ISCA Archive Interspeech 2014

Unfolded recurrent neural networks for speech recognition

George Saon, Hagen Soltau, Ahmad Emami, Michael Picheny

We introduce recurrent neural networks (RNNs) for acoustic modeling which are unfolded in time for a fixed number of time steps. The proposed models are feedforward networks with the property that the unfolded layers which correspond to the recurrent layer have time-shifted inputs and tied weight matrices. Besides the temporal depth due to unfolding, hierarchical processing depth is added by means of several non-recurrent hidden layers inserted between the unfolded layers and the output layer. The training of these models: (a) has a complexity that is comparable to deep neural networks (DNNs) with the same number of layers; (b) can be done on frame-randomized minibatches; (c) can be implemented efficiently through matrix-matrix operations on GPU architectures, which makes it scalable for large tasks. Experimental results on the Switchboard 300-hour English conversational telephony task show a 5% relative improvement in word error rate over state-of-the-art DNNs trained on FMLLR features with i-vector speaker adaptation and Hessian-free sequence discriminative training.


doi: 10.21437/Interspeech.2014-81

Cite as: Saon, G., Soltau, H., Emami, A., Picheny, M. (2014) Unfolded recurrent neural networks for speech recognition. Proc. Interspeech 2014, 343-347, doi: 10.21437/Interspeech.2014-81

@inproceedings{saon14_interspeech,
  author={George Saon and Hagen Soltau and Ahmad Emami and Michael Picheny},
  title={{Unfolded recurrent neural networks for speech recognition}},
  year=2014,
  booktitle={Proc. Interspeech 2014},
  pages={343--347},
  doi={10.21437/Interspeech.2014-81}
}