Abstract
In recent years, speech emotion recognition (SER) has attracted increasing attention in speech processing because of its potential in various speech-based intelligent systems. Deep learning algorithms typically require a large number of features to capture the discriminative characteristics of emotional audio samples, which increases the computational complexity of the network. This paper presents a three-layered sequential deep convolutional neural network (DCNN) based on the mel frequency log spectrogram (MFLS) for emotion recognition. The mel frequency log spectrogram confines the salient information of the emotional speech corpus, which is then classified by the two-dimensional DCNN. Experimental results on the Berlin Emo-DB dataset show that the proposed method achieves 95.68% and 96.07% accuracy for the speaker-dependent and speaker-independent approaches, respectively. The performance of the proposed method is compared with CNN and CNN-LSTM on the Berlin Emo-DB dataset and yields improved accuracy.
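The mel frequency log spectrogram front end described in the abstract can be sketched as follows. This is a minimal NumPy-only illustration of the standard MFLS computation (framing, windowing, power spectrum, triangular mel filterbank, log compression), not the authors' exact implementation; all frame sizes and filter counts below are assumed illustrative values, not parameters taken from the paper.

```python
import numpy as np

def hz_to_mel(f):
    # Standard HTK-style mel scale conversion
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_log_spectrogram(signal, sr=16000, n_fft=512, hop=256, n_mels=40):
    # Frame the signal and apply a Hamming window to each frame
    frames = [signal[s:s + n_fft] * np.hamming(n_fft)
              for s in range(0, len(signal) - n_fft + 1, hop)]
    frames = np.asarray(frames)
    # Power spectrum of each windowed frame
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Triangular mel filterbank spanning 0 .. sr/2, spaced evenly on the mel scale
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fbank[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fbank[m - 1, k] = (right - k) / max(right - center, 1)
    # Apply filterbank and log-compress the mel band energies
    mel_energy = power @ fbank.T
    return np.log(mel_energy + 1e-10)  # shape: (num_frames, n_mels)
```

The resulting (frames × mel bands) matrix is a 2-D time-frequency image, which is what makes a two-dimensional CNN a natural classifier for it.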
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Bhangale, K., Mohanaprasad, K. (2022). Speech Emotion Recognition Using Mel Frequency Log Spectrogram and Deep Convolutional Neural Network. In: Sivasubramanian, A., Shastry, P.N., Hong, P.C. (eds) Futuristic Communication and Network Technologies. VICFCNT 2020. Lecture Notes in Electrical Engineering, vol 792. Springer, Singapore. https://doi.org/10.1007/978-981-16-4625-6_24
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-4624-9
Online ISBN: 978-981-16-4625-6