
Emotion recognition from speech using global and local prosodic features

Published in: International Journal of Speech Technology

Abstract

In this paper, global and local prosodic features extracted at the sentence, word, and syllable levels are proposed for speech emotion (affect) recognition. Duration, pitch, and energy values are used to represent the prosodic information for recognizing emotions from speech. Global prosodic features represent gross statistics of the prosodic contours, such as mean, minimum, maximum, standard deviation, and slope; local prosodic features represent the temporal dynamics of the prosody. Global and local prosodic features are analyzed separately and in combination, at the different levels, for the recognition of emotions. Words and syllables at different positions (initial, middle, and final) are also explored separately to analyze their contribution to emotion recognition. All studies are carried out on the simulated Telugu emotional speech corpus IITKGP-SESC, and the results are compared with those on the internationally known Berlin emotional speech corpus (Emo-DB). Support vector machines are used to develop the emotion recognition models. The results indicate that recognition performance using local prosodic features is better than that using global prosodic features. Words in the final position of a sentence, and syllables in the final position of a word, carry more emotion-discriminative information than words and syllables in other positions.
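To make the global-feature idea concrete, the sketch below is a minimal illustration, not the authors' implementation. It assumes frame-level pitch and energy contours have already been extracted by some front end, computes the five gross statistics named in the abstract (mean, minimum, maximum, standard deviation, slope) per contour, and fits an SVM in the style the paper describes. All function names, the stand-in contours, and the labels are hypothetical; only numpy and scikit-learn are assumed.

```python
# A minimal sketch (not the authors' code) of global prosodic features + SVM.
import numpy as np
from sklearn.svm import SVC

def global_prosodic_features(contour):
    """Gross statistics of one prosodic contour (e.g., frame-level F0
    or energy): mean, minimum, maximum, standard deviation, and the
    slope of a least-squares line fit, as listed in the abstract."""
    contour = np.asarray(contour, dtype=float)
    frames = np.arange(len(contour))
    slope = np.polyfit(frames, contour, deg=1)[0]  # slope of fitted line
    return np.array([contour.mean(), contour.min(), contour.max(),
                     contour.std(), slope])

# Hypothetical usage with stand-in contours (real input would come from a
# pitch/energy extractor): one feature vector per utterance, built by
# concatenating the statistics of the pitch and energy contours.
rng = np.random.default_rng(0)
X = np.stack([
    np.concatenate([
        global_prosodic_features(rng.normal(200.0, 30.0, 120)),  # stand-in F0 contour (Hz)
        global_prosodic_features(rng.normal(0.5, 0.1, 120)),     # stand-in energy contour
    ])
    for _ in range(40)
])
y = rng.integers(0, 4, size=40)  # stand-in labels for four emotion classes

model = SVC(kernel="rbf").fit(X, y)  # SVMs are the emotion models in the paper
print(model.predict(X[:5]))
```

Local prosodic features, by contrast, would retain the contour's temporal shape (for example, the sequence of frame-level values itself, resampled to a fixed length) rather than collapsing it to a handful of statistics.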



References

  • Bänziger, T., & Scherer, K. R. (2005). The role of intonation in emotional expressions. Speech Communication, 46, 252–267.

  • Benesty, J., Sondhi, M. M., & Huang, Y. (Eds.) (2008). Springer handbook of speech processing. Berlin: Springer.

  • Burkhardt, F., Paeschke, A., Rolfes, M., Sendlmeier, W., & Weiss, B. (2005). A database of German emotional speech. In Interspeech.

  • Cahn, J. E. (1990). The generation of affect in synthesized speech. In JAVIOS (pp. 1–19), July 1990.

  • Cowie, R., & Cornelius, R. R. (2003). Describing the emotional states that are expressed in speech. Speech Communication, 40, 5–32.

  • Dellaert, F., Polzin, T., & Waibel, A. (1996). Recognizing emotion in speech. In 4th international conference on spoken language processing (pp. 1970–1973), Philadelphia, PA, USA, Oct. 1996.

  • Iida, A., Campbell, N., Higuchi, F., & Yasumura, M. (2003). A corpus-based speech synthesis system with emotion. Speech Communication, 40, 161–187.

  • Iliou, T., & Anagnostopoulos, C. N. (2009). Statistical evaluation of speech features for emotion recognition. In Fourth international conference on digital telecommunications (pp. 121–126), Colmar, France, July 2009.

  • Kao, Y.-H., & Lee, L.-S. (2006). Feature analysis for emotion recognition from Mandarin speech considering the special characteristics of Chinese language. In INTERSPEECH–ICSLP (pp. 1814–1817), Pittsburgh, Pennsylvania, Sept. 2006.

  • Koolagudi, S. G., & Rao, K. S. (2011). Two stage emotion recognition based on speaking rate. International Journal of Speech Technology, 14, 35–48.

  • Koolagudi, S. G., & Rao, K. S. (2012a). Emotion recognition from speech: a review. International Journal of Speech Technology, 15(2), 99–117.

  • Koolagudi, S. G., & Rao, K. S. (2012b). Emotion recognition from speech using source, system and prosodic features. International Journal of Speech Technology, 15(2), 265–289.

  • Koolagudi, S. G., & Rao, K. S. (2012c). Emotion recognition from speech using sub-syllabic and pitch synchronous spectral features. International Journal of Speech Technology. doi:10.1007/s10772-012-9150-8.

  • Koolagudi, S. G., Maity, S., Kumar, V. A., Chakrabarti, S., & Rao, K. S. (2009). IITKGP-SESC: speech database for emotion analysis. In Communications in computer and information science. Berlin: Springer, Aug. 2009.

  • Lee, C. M., & Narayanan, S. S. (2005). Toward detecting emotions in spoken dialogs. IEEE Transactions on Speech and Audio Processing, 13, 293–303.

  • Luengo, I., Navas, E., Hernáez, I., & Sánchez, J. (2005). Automatic emotion recognition using prosodic parameters. In INTERSPEECH (pp. 493–496), Lisbon, Portugal, Sept. 2005.

  • Lugger, M., & Yang, B. (2007). The relevance of voice quality features in speaker independent emotion recognition. In ICASSP (pp. IV17–IV20), Honolulu, Hawaii, USA, May 2007. New York: IEEE Press.

  • McGilloway, S., Cowie, R., Douglas-Cowie, E., Gielen, S., Westerdijk, M., & Stroeve, S. (2000). Approaching automatic recognition of emotion from voice: a rough benchmark. In ISCA workshop on speech and emotion, Belfast.

  • Murray, I. R., & Arnott, J. L. (1995). Implementation and testing of a system for producing emotion by rule in synthetic speech. Speech Communication, 16, 369–390.

  • Murray, I. R., Arnott, J. L., & Rohwer, E. A. (1996). Emotional stress in synthetic speech: progress and future directions. Speech Communication, 20, 85–91.

  • Murty, K. S. R., & Yegnanarayana, B. (2008). Epoch extraction from speech signals. IEEE Transactions on Audio, Speech, and Language Processing, 16, 1602–1613.

  • Nwe, T. L., Foo, S. W., & Silva, L. C. D. (2003). Speech emotion recognition using hidden Markov models. Speech Communication, 41, 603–623.

  • Prasanna, S. R. M. (2004). Event-based analysis of speech. PhD thesis, Dept. of Computer Science and Engineering, Indian Institute of Technology Madras, Chennai, India, Mar. 2004.

  • Prasanna, S. R. M., & Zachariah, J. M. (2002). Detection of vowel onset point in speech. In Proc. IEEE int. conf. acoust., speech, signal processing, Orlando, Florida, USA, May 2002.

  • Prasanna, S. R. M., Reddy, B. V. S., & Krishnamoorthy, P. (2009). Vowel onset point detection using source, spectral peaks, and modulation spectrum energies. IEEE Transactions on Audio, Speech, and Language Processing, 17, 556–565.

  • Rao, K. S. (2005). Acquisition and incorporation of prosody knowledge for speech systems in Indian languages. PhD thesis, Dept. of Computer Science and Engineering, Indian Institute of Technology Madras, Chennai, India, May 2005.

  • Rao, K. S. (2011a). Application of prosody models for developing speech systems in Indian languages. International Journal of Speech Technology, 14, 19–33.

  • Rao, K. S. (2011b). Role of neural network models for developing speech systems. Sadhana, 36, 783–836.

  • Rao, K. S., & Koolagudi, S. G. (2011). Identification of Hindi dialects and emotions using spectral and prosodic features of speech. IJSCI: International Journal of Systemics, Cybernetics and Informatics, 9(4), 24–33.

  • Rao, K. S., & Yegnanarayana, B. (2006). Prosody modification using instants of significant excitation. IEEE Transactions on Speech and Audio Processing, 14, 972–980.

  • Rao, K. S., Prasanna, S. R. M., & Sagar, T. V. (2007). Emotion recognition using multilevel prosodic information. In Workshop on image and signal processing (WISP-2007), Guwahati, India, Dec. 2007. Guwahati: IIT Guwahati.

  • Rao, K. S., Reddy, R., Maity, S., & Koolagudi, S. G. (2010). Characterization of emotions using the dynamics of prosodic features. In International conference on speech prosody, Chicago, USA, May 2010.

  • Rao, K. S., Saroj, V. K., Maity, S., & Koolagudi, S. G. (2011). Recognition of emotions from video using neural network models. Expert Systems with Applications, 38, 13181–13185.

  • Scherer, K. R. (2003). Vocal communication of emotion: a review of research paradigms. Speech Communication, 40, 227–256.

  • Schröder, M. (2001). Emotional speech synthesis: a review. In Seventh European conference on speech communication and technology, Eurospeech, Aalborg, Denmark, Sept. 2001.

  • Schröder, M., & Cowie, R. (2006). Issues in emotion-oriented computing: toward a shared understanding. In Workshop on emotion and computing, HUMAINE.

  • Schuller, B. (2012). The computational paralinguistics challenge. IEEE Signal Processing Magazine, 29, 97–101.

  • Schuller, B., Batliner, A., Steidl, S., & Seppi, D. (2011). Recognising realistic emotions and affect in speech: state of the art and lessons learnt from the first challenge. Speech Communication, 53, 1062–1087.

  • Ververidis, D., & Kotropoulos, C. (2006). A state of the art review on emotional speech databases. In Eleventh Australasian international conference on speech science and technology, Auckland, New Zealand, Dec. 2006.

  • Ververidis, D., Kotropoulos, C., & Pitas, I. (2004). Automatic emotional speech classification. In ICASSP (pp. I593–I596). New York: IEEE Press.

  • Vuppala, A. K., Yadav, J., Chakrabarti, S., & Rao, K. S. (2012). Vowel onset point detection for low bit rate coded speech. IEEE Transactions on Audio, Speech, and Language Processing, 20, 1894–1903.

  • Wang, Y., Du, S., & Zhan, Y. (2008). Adaptive and optimal classification of speech emotion recognition. In Fourth international conference on natural computation (pp. 407–411).

  • Werner, S., & Keller, E. (1994). Prosodic aspects of speech. In E. Keller (Ed.), Fundamentals of speech synthesis and speech recognition: basic concepts, state of the art, the future challenges (pp. 23–40). Chichester: Wiley.

  • Zhang, S. (2008). Emotion recognition in Chinese natural speech by combining prosody and voice quality features. In Sun et al. (Eds.), Advances in neural networks (pp. 457–464). Lecture notes in computer science. Berlin: Springer.

  • Zhu, A., & Luo, Q. (2007). Study on speech emotion recognition system in E-learning. In J. Jacko (Ed.), Human computer interaction, Part III, HCII (pp. 544–552). Lecture notes in computer science. Berlin: Springer.


Author information

Corresponding author

Correspondence to K. Sreenivasa Rao.



Cite this article

Rao, K.S., Koolagudi, S.G. & Vempada, R.R. Emotion recognition from speech using global and local prosodic features. Int J Speech Technol 16, 143–160 (2013). https://doi.org/10.1007/s10772-012-9172-2

