
Recognition of Human Speech Emotion Using Variants of Mel-Frequency Cepstral Coefficients

Chapter in: Advances in Systems, Control and Automation

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 442)

Abstract

In this chapter, different variants of Mel-frequency cepstral coefficients (MFCCs) describing human speech emotions are investigated. These features are tested and compared for their robustness in terms of classification accuracy and mean square error. Although MFCC is a reliable feature for speech emotion recognition, it does not capture the temporal dynamics between features, which are crucial for such analysis. To address this issue, delta MFCC, the first derivative of MFCC, is extracted for comparison. Because MFCC performs poorly under noisy conditions, both MFCC and delta MFCC features are extracted in the wavelet domain in the second phase. Combining the time–frequency characterization of emotions provided by wavelet analysis with the energy and amplitude information of MFCC-based features enriches the available information. Wavelet-based MFCCs (WMFCCs) and wavelet-based delta MFCCs (WDMFCCs) outperformed standard MFCCs, delta MFCCs, and wavelets in recognizing the Berlin emotional speech utterances. A probabilistic neural network (PNN) has been chosen to model the emotions, as this classifier is simple to train, much faster, and allows more flexible selection of the smoothing parameter than other neural network (NN) models. The highest accuracy of 80.79% was observed with WDMFCCs, compared to 60.97% and 62.76% with MFCCs and wavelets, respectively.
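The two core ingredients of the pipeline described above can be sketched in a few lines of numpy: the delta (first-derivative) features computed over a matrix of per-frame MFCCs, and a Gaussian-kernel PNN in the spirit of Specht's formulation. This is a minimal illustrative sketch, assuming the standard delta regression formula with window N = 2; the function names, σ value, and toy data are assumptions for illustration, not taken from the chapter.

```python
import numpy as np

def delta(feats, N=2):
    """Delta features over a (frames x coeffs) matrix via the standard
    regression formula d_t = sum_n n*(c_{t+n} - c_{t-n}) / (2*sum_n n^2),
    with edge frames handled by repeating the first/last frame."""
    T = feats.shape[0]
    denom = 2 * sum(n * n for n in range(1, N + 1))
    padded = np.pad(np.asarray(feats, float), ((N, N), (0, 0)), mode="edge")
    d = np.zeros((T, feats.shape[1]))
    for t in range(T):
        d[t] = sum(n * (padded[t + N + n] - padded[t + N - n])
                   for n in range(1, N + 1)) / denom
    return d

class PNN:
    """Probabilistic neural network (after Specht, 1990): one Gaussian
    kernel per training pattern, one smoothing parameter sigma; a class
    score is the mean kernel response over that class's patterns."""
    def __init__(self, sigma=0.5):
        self.sigma = sigma

    def fit(self, X, y):
        self.X = np.asarray(X, float)
        self.y = np.asarray(y)
        self.classes = np.unique(self.y)
        return self

    def predict(self, X):
        preds = []
        for x in np.asarray(X, float):
            sq = np.sum((self.X - x) ** 2, axis=1)
            k = np.exp(-sq / (2 * self.sigma ** 2))
            scores = [k[self.y == c].mean() for c in self.classes]
            preds.append(self.classes[np.argmax(scores)])
        return np.array(preds)
```

In this setup the delta matrix would simply be concatenated with the static MFCCs before being fed to the PNN; changing `sigma` trades off smoothness of the estimated class densities against sensitivity to individual training patterns, which is the flexibility the abstract alludes to.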




Author information

Correspondence to Mihir Narayan Mohanty.


Copyright information

© 2018 Springer Nature Singapore Pte Ltd.

About this chapter


Cite this chapter

Palo, H.K., Chandra, M., Mohanty, M.N. (2018). Recognition of Human Speech Emotion Using Variants of Mel-Frequency Cepstral Coefficients. In: Konkani, A., Bera, R., Paul, S. (eds) Advances in Systems, Control and Automation. Lecture Notes in Electrical Engineering, vol 442. Springer, Singapore. https://doi.org/10.1007/978-981-10-4762-6_47


  • DOI: https://doi.org/10.1007/978-981-10-4762-6_47

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-4761-9

  • Online ISBN: 978-981-10-4762-6

  • eBook Packages: Engineering, Engineering (R0)
