Abstract
In this chapter, variants of Mel-frequency cepstral coefficients (MFCCs) for describing human speech emotions are investigated. These features are tested and compared for robustness in terms of classification accuracy and mean square error. Although the MFCC is a reliable feature for speech emotion recognition, it does not capture the temporal dynamics between features, which are crucial for such analysis. To address this, the delta MFCC, the first derivative of the MFCC, is extracted for comparison. Because MFCCs perform poorly under noisy conditions, both MFCC and delta MFCC features are extracted in the wavelet domain in the second phase. Combining the time–frequency characterization of emotions provided by wavelet analysis with the energy and amplitude information of MFCC-based features enriches the available information. Wavelet-based MFCCs (WMFCCs) and wavelet-based delta MFCCs (WDMFCCs) outperformed standard MFCCs, delta MFCCs, and wavelets in recognizing the Berlin emotional speech utterances. A probabilistic neural network (PNN) was chosen to model the emotions, as this classifier is simple to train, much faster, and allows more flexible selection of the smoothing parameter than other neural network (NN) models. The highest accuracy, 80.79%, was observed with WDMFCCs, compared with 60.97% for MFCCs and 62.76% for wavelets.
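The delta MFCCs referred to above are conventionally computed with a regression over neighboring frames, d_t = Σ_{n=1..N} n·(c_{t+n} − c_{t−n}) / (2·Σ_{n=1..N} n²). As a minimal pure-Python sketch of that standard formula (the function name `delta`, the window width `N`, and the edge-clamping behavior are illustrative assumptions, not details taken from the chapter):

```python
def delta(frames, N=2):
    """Delta (first-derivative) coefficients from a sequence of MFCC frames.

    Uses the standard regression formula
        d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2)
    `frames` is a list of equal-length coefficient vectors (lists of floats).
    Frame indices beyond the ends are clamped to the first/last frame,
    a common edge-handling convention (an assumption here).
    """
    T = len(frames)
    denom = 2 * sum(n * n for n in range(1, N + 1))
    deltas = []
    for t in range(T):
        d = [0.0] * len(frames[0])
        for n in range(1, N + 1):
            c_plus = frames[min(t + n, T - 1)]   # frame t+n, clamped
            c_minus = frames[max(t - n, 0)]      # frame t-n, clamped
            for k in range(len(d)):
                d[k] += n * (c_plus[k] - c_minus[k]) / denom
        deltas.append(d)
    return deltas
```

For example, frames whose coefficients grow linearly over time yield interior deltas equal to the slope, and constant frames yield all-zero deltas, matching the intuition of a first derivative. The wavelet-domain variants (WMFCC, WDMFCC) would apply the same extraction to wavelet subband signals rather than the raw waveform.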
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
Cite this chapter
Palo, H.K., Chandra, M., Mohanty, M.N. (2018). Recognition of Human Speech Emotion Using Variants of Mel-Frequency Cepstral Coefficients. In: Konkani, A., Bera, R., Paul, S. (eds) Advances in Systems, Control and Automation. Lecture Notes in Electrical Engineering, vol 442. Springer, Singapore. https://doi.org/10.1007/978-981-10-4762-6_47
Print ISBN: 978-981-10-4761-9
Online ISBN: 978-981-10-4762-6