Speech Emotion Recognition Using Mel Frequency Log Spectrogram and Deep Convolutional Neural Network

  • Conference paper
  • First Online:
Futuristic Communication and Network Technologies (VICFCNT 2020)

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 792)

Abstract

In recent years, speech emotion recognition (SER) has attracted increasing attention in speech processing because of its potential in various speech-based intelligent systems. Deep learning algorithms require a large number of features to capture the discriminative characteristics of emotional speech samples, which increases the computational complexity of the network. This paper presents a three-layered sequential deep convolutional neural network (DCNN) based on the mel frequency log spectrogram (MFLS) for emotion recognition. The mel frequency log spectrogram captures the salient information in the emotional speech corpus and is fed to the two-dimensional DCNN. Experimental results on the Berlin Emo-DB dataset show that the proposed method achieves 95.68% and 96.07% accuracy for the speaker-dependent and speaker-independent approaches, respectively. The performance of the proposed method is compared with CNN and CNN-LSTM models on the Berlin Emo-DB dataset and shows improved accuracy.
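The pipeline described in the abstract (MFLS features followed by a three-layered sequential 2D DCNN) can be sketched roughly as below. This is a minimal illustration, not the authors' implementation: the mel-band configuration, filter counts, kernel sizes, and training settings are assumptions, and the helper names (mel_log_spectrogram, build_dcnn) and the file path are hypothetical. The sketch assumes librosa for feature extraction, Keras for the network, and the seven emotion classes of Berlin Emo-DB.

```python
# Minimal sketch of an MFLS + three-layer 2D CNN pipeline for SER.
# Hyperparameters (n_mels, filter counts, kernel sizes) are illustrative
# assumptions, not the values used in the paper.
import librosa
import numpy as np
import tensorflow as tf


def mel_log_spectrogram(wav_path, sr=16000, n_fft=512, hop_length=256, n_mels=64):
    """Compute a log-compressed mel spectrogram (MFLS) for one utterance."""
    y, sr = librosa.load(wav_path, sr=sr)
    mel = librosa.feature.melspectrogram(
        y=y, sr=sr, n_fft=n_fft, hop_length=hop_length, n_mels=n_mels
    )
    return librosa.power_to_db(mel, ref=np.max)  # shape: (n_mels, frames)


def build_dcnn(input_shape, num_classes=7):
    """Three-layered sequential 2D CNN; Berlin Emo-DB has seven emotion classes."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=input_shape),          # (n_mels, frames, 1)
        tf.keras.layers.Conv2D(16, (3, 3), padding="same", activation="relu"),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.Conv2D(32, (3, 3), padding="same", activation="relu"),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model


if __name__ == "__main__":
    # Featurize one utterance (placeholder path) and build the network.
    mfls = mel_log_spectrogram("emodb/wav/example.wav")
    x = mfls[np.newaxis, ..., np.newaxis]                  # add batch and channel dims
    model = build_dcnn(input_shape=x.shape[1:])
    model.summary()
```

In practice, variable-length utterances would need to be padded or cropped to a fixed number of frames before batching; the paper's abstract does not detail this step, so it is omitted from the sketch.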



Author information

Corresponding author

Correspondence to K. Mohanaprasad.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Bhangale, K., Mohanaprasad, K. (2022). Speech Emotion Recognition Using Mel Frequency Log Spectrogram and Deep Convolutional Neural Network. In: Sivasubramanian, A., Shastry, P.N., Hong, P.C. (eds) Futuristic Communication and Network Technologies. VICFCNT 2020. Lecture Notes in Electrical Engineering, vol 792. Springer, Singapore. https://doi.org/10.1007/978-981-16-4625-6_24

  • DOI: https://doi.org/10.1007/978-981-16-4625-6_24

  • Published:

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-16-4624-9

  • Online ISBN: 978-981-16-4625-6

  • eBook Packages: Engineering, Engineering (R0)
