Auditory Teager energy cepstrum coefficients for robust speech recognition

Dimitriadis, Dimitrios; Maragos, Petros; Potamianos, Alexandros

doi:10.21437/Interspeech.2005-142

This paper introduces a feature extraction algorithm for robust speech recognition. The algorithm is motivated by human auditory processing and by the nonlinear Teager-Kaiser energy operator, which estimates the true energy of the source of a resonance. The proposed features are termed Teager Energy Cepstrum Coefficients (TECCs). TECCs are computed by first filtering the speech signal through a dense, non-constant-Q Gammatone filterbank and then estimating the "true" energy of the signal's source, i.e., the short-time average of the output of the Teager-Kaiser energy operator. Error analysis and speech recognition experiments show that TECCs and mel-frequency cepstrum coefficients (MFCCs) perform similarly in clean recording conditions, while TECCs perform significantly better than MFCCs on noisy recognition tasks. Specifically, a relative word error rate improvement of 60% over the MFCC baseline is shown on the Aurora-3 database for the high-mismatch condition. Absolute error rate improvements ranging from 5% to 20% are shown for a phone recognition task in various types of additive noise.
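
The energy and cepstrum steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the band-passed signals are already available (the Gammatone filterbank itself is left out), and it uses the standard discrete Teager-Kaiser operator, psi[n] = x[n]^2 - x[n-1]*x[n+1]. The function names, the frame length and hop, the energy floor, and the unnormalized DCT-II are all illustrative assumptions, not values taken from the paper.

```python
import numpy as np


def teager_kaiser(x):
    """Discrete Teager-Kaiser energy: psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    x = np.asarray(x, dtype=float)
    psi = np.empty_like(x)
    psi[1:-1] = x[1:-1] ** 2 - x[:-2] * x[2:]
    # Edge samples have no two-sided neighbours; copy the nearest interior value.
    psi[0], psi[-1] = psi[1], psi[-2]
    return psi


def short_time_mean(x, frame_len, hop):
    """Mean of x over frames of frame_len samples, advanced by hop samples."""
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.array(
        [x[i * hop : i * hop + frame_len].mean() for i in range(n_frames)]
    )


def dct2(v, n_coeffs):
    """Unnormalized DCT-II of v, keeping the first n_coeffs coefficients."""
    n = len(v)
    k = np.arange(n_coeffs)[:, None]
    m = np.arange(n)[None, :]
    return np.cos(np.pi * k * (m + 0.5) / n) @ v


def tecc_like(band_signals, frame_len=400, hop=160, n_coeffs=13, floor=1e-12):
    """Cepstrum-style features from short-time mean Teager energies.

    band_signals: array of shape (n_bands, n_samples) holding signals that
    were already band-passed (hypothetically by a Gammatone filterbank).
    The defaults (400/160 samples, 13 coefficients, the floor) are assumptions.
    Returns an array of shape (n_frames, n_coeffs).
    """
    energies = np.array(
        [short_time_mean(teager_kaiser(b), frame_len, hop) for b in band_signals]
    )
    # The operator output can dip below zero on real signals; floor it before the log.
    log_e = np.log(np.maximum(energies, floor))  # shape (n_bands, n_frames)
    return np.array([dct2(log_e[:, t], n_coeffs) for t in range(log_e.shape[1])])
```

For a pure sinusoid A*cos(W*n + phi), the operator returns the constant A^2 * sin^2(W), which depends on both amplitude and frequency. This is the sense in which it tracks the energy of the source producing the oscillation rather than the squared amplitude alone.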

Cite as: Dimitriadis, D., Maragos, P., Potamianos, A. (2005) Auditory Teager energy cepstrum coefficients for robust speech recognition. Proc. Interspeech 2005, 3013-3016, doi: 10.21437/Interspeech.2005-142

@inproceedings{dimitriadis05_interspeech,
  author={Dimitrios Dimitriadis and Petros Maragos and Alexandros Potamianos},
  title={{Auditory Teager energy cepstrum coefficients for robust speech recognition}},
  year=2005,
  booktitle={Proc. Interspeech 2005},
  pages={3013--3016},
  doi={10.21437/Interspeech.2005-142}
}