Static and dynamic modulation spectrum for speech recognition

Ganapathy, Sriram; Thomas, Samuel; Hermansky, Hynek

doi:10.21437/Interspeech.2009-721

We present a feature extraction technique based on the static and dynamic modulation spectrum derived from long-term temporal envelopes in sub-bands. The sub-band temporal envelopes are estimated using Frequency Domain Linear Prediction (FDLP). These envelopes are then compressed in two ways: a static (logarithmic) compression and a dynamic (adaptive loops) compression. The compressed sub-band envelopes are transformed into modulation spectral components, which are used as features for speech recognition. Experiments are performed on a phoneme recognition task using a hybrid HMM-ANN phoneme recognition system and on an ASR task using the TANDEM speech recognition system. The proposed features provide relative improvements of 3.8% and 11.5% in phoneme recognition accuracy for TIMIT and conversational telephone speech (CTS), respectively. These improvements are consistent with results on an ASR task using the OGI-Digits database (relative improvement of 13.5%).
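
The processing chain described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The rectangular sub-band window, the model order, the number of envelope points, the adaptive-loop time constants, the number of modulation coefficients, and all function names are assumptions chosen for illustration only.

```python
import numpy as np

def dct_ii(x):
    """Orthonormal DCT-II via an explicit cosine matrix (adequate for short signals)."""
    n = len(x)
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    scale = np.full(n, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return scale * (np.cos(np.pi * (t + 0.5) * k / n) @ x)

def fdlp_envelope(band_dct, order=20, n_points=100):
    """Fit an all-pole model to sub-band DCT coefficients (autocorrelation method).
    The model's power response, sampled on [0, pi), is read as the temporal envelope."""
    r = np.correlate(band_dct, band_dct, mode="full")[len(band_dct) - 1:][:order + 1]
    idx = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    R = r[idx] + 1e-9 * r[0] * np.eye(order)   # Toeplitz matrix, light diagonal loading
    a = np.concatenate(([1.0], np.linalg.solve(R, -r[1:])))
    gain = r[0] + a[1:] @ r[1:]                # prediction error energy
    A = np.fft.rfft(a, 2 * n_points)[:n_points]
    return gain / np.abs(A) ** 2

def adaptive_loops(env, rate_hz, taus=(0.005, 0.05, 0.129, 0.253, 0.5), floor=1e-5):
    """Simplified dynamic compression: a chain of divisive feedback loops, each
    dividing its input by a low-pass filtered copy of its own output.
    The time constants (seconds) are assumed values, not taken from the abstract."""
    y = np.maximum(env / env.max(), floor)
    for tau in taus:
        alpha = np.exp(-1.0 / (tau * rate_hz))
        state = np.sqrt(y[0])                  # steady state for a constant input
        out = np.empty_like(y)
        for i, v in enumerate(y):
            out[i] = v / state
            state = max(alpha * state + (1.0 - alpha) * out[i], floor)
        y = out
    return y

def modulation_features(env, rate_hz, n_mod=14):
    """Static (log) and dynamic (adaptive loops) compression, each followed by a
    DCT whose first n_mod coefficients serve as modulation spectral features."""
    static = np.log(env)
    dynamic = adaptive_loops(env, rate_hz)
    return np.concatenate((dct_ii(static)[:n_mod], dct_ii(dynamic)[:n_mod]))

# Toy input: a noise burst centred at 70% of a 100 ms segment sampled at 8 kHz.
rng = np.random.default_rng(0)
fs, n = 8000, 800
t = np.arange(n)
signal = rng.standard_normal(n) * np.exp(-0.5 * ((t - 0.7 * n) / (0.05 * n)) ** 2)

coeffs = dct_ii(signal)
env = fdlp_envelope(coeffs[100:300])           # one hypothetical sub-band (about 500-1500 Hz)
feats = modulation_features(env, rate_hz=100 * fs / n)
```

In this sketch the estimated envelope should concentrate around the burst position, and `feats` holds one static and one dynamic set of modulation coefficients for a single sub-band. A full front end would repeat this for every sub-band and every analysis segment.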

Cite as: Ganapathy, S., Thomas, S., Hermansky, H. (2009) Static and dynamic modulation spectrum for speech recognition. Proc. Interspeech 2009, 2823-2826, doi: 10.21437/Interspeech.2009-721

@inproceedings{ganapathy09_interspeech,
  author={Sriram Ganapathy and Samuel Thomas and Hynek Hermansky},
  title={{Static and dynamic modulation spectrum for speech recognition}},
  year=2009,
  booktitle={Proc. Interspeech 2009},
  pages={2823--2826},
  doi={10.21437/Interspeech.2009-721}
}