Abstract
In this chapter, variants of Mel-frequency cepstral coefficients (MFCCs) for describing human speech emotions are investigated. These features are tested and compared for robustness in terms of classification accuracy and mean square error. Although the MFCC is a reliable feature for speech emotion recognition, it does not capture the temporal dynamics between features, which are crucial for such analysis. To address this, the delta MFCC, the first derivative of the MFCC, is extracted for comparison. Because MFCCs perform poorly under noisy conditions, both MFCC and delta MFCC features are extracted in the wavelet domain in the second phase. Combining the time–frequency characterization of emotions provided by wavelet analysis with the energy and amplitude information of MFCC-based features enriches the available information. Wavelet-based MFCCs (WMFCCs) and wavelet-based delta MFCCs (WDMFCCs) outperformed standard MFCCs, delta MFCCs, and wavelets in recognizing the Berlin emotional speech utterances. A probabilistic neural network (PNN) was chosen to model the emotions, as this classifier is simple to train, much faster, and allows more flexible selection of the smoothing parameter than other neural network (NN) models. The highest accuracy, 80.79%, was observed with WDMFCCs, compared with 60.97% for MFCCs and 62.76% for wavelets.
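The delta MFCCs referred to above are conventionally computed with a regression over neighboring frames, d_t = Σ_{n=1..N} n·(c_{t+n} − c_{t−n}) / (2·Σ_{n=1..N} n²). As a minimal pure-Python sketch of that standard formula (the function name `delta`, the window width `N`, and the edge-clamping behavior are illustrative assumptions, not details taken from the chapter):

```python
def delta(frames, N=2):
    """Delta (first-derivative) coefficients from a sequence of MFCC frames.

    Uses the standard regression formula
        d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2)
    `frames` is a list of equal-length coefficient vectors (lists of floats).
    Frame indices beyond the ends are clamped to the first/last frame,
    a common edge-handling convention (an assumption here).
    """
    T = len(frames)
    denom = 2 * sum(n * n for n in range(1, N + 1))
    deltas = []
    for t in range(T):
        d = [0.0] * len(frames[0])
        for n in range(1, N + 1):
            c_plus = frames[min(t + n, T - 1)]   # frame t+n, clamped
            c_minus = frames[max(t - n, 0)]      # frame t-n, clamped
            for k in range(len(d)):
                d[k] += n * (c_plus[k] - c_minus[k]) / denom
        deltas.append(d)
    return deltas
```

For example, frames whose coefficients grow linearly over time yield interior deltas equal to the slope, and constant frames yield all-zero deltas, matching the intuition of a first derivative. The wavelet-domain variants (WMFCC, WDMFCC) would apply the same extraction to wavelet subband signals rather than the raw waveform.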
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
Cite this chapter
Palo, H.K., Chandra, M., Mohanty, M.N. (2018). Recognition of Human Speech Emotion Using Variants of Mel-Frequency Cepstral Coefficients. In: Konkani, A., Bera, R., Paul, S. (eds) Advances in Systems, Control and Automation. Lecture Notes in Electrical Engineering, vol 442. Springer, Singapore. https://doi.org/10.1007/978-981-10-4762-6_47
Print ISBN: 978-981-10-4761-9
Online ISBN: 978-981-10-4762-6