Abstract: A system and method for applying a convolutional neural network (CNN) to speech
recognition. The CNN may provide input to a hidden Markov model and has at least one pair
of a convolution layer and a pooling layer. The CNN operates along the frequency axis. The
CNN has units that operate upon one or more local frequency bands of an acoustic signal.
The CNN mitigates acoustic variation.