Abstract
This paper describes the development of a simulated emotion database designed specifically for excitation source analysis. Simultaneous electroglottogram (EGG) recordings accompanying each emotion utterance allow variations in the source parameters across emotions to be analyzed accurately. The database is comparatively large, covering three emotions (anger, happiness and sadness) along with neutrally spoken utterances in three languages (Tamil, Malayalam and Indian English). Utterances in each language are recorded from 10 speakers, in multiple sessions for Tamil and Malayalam. Unlike existing simulated emotion databases, emotionally biased rather than emotionally neutral utterances are used for recording. Emotion recognition experiments show that emotions elicited from emotionally biased utterances are more discriminable than those elicited from emotionally neutral utterances. Comparative experimental analysis further shows that the speech and EGG utterances of the proposed database preserve the same general trend in excitation source characteristics (instantaneous F0 and strength of excitation) across emotions as the classical German speech-EGG emotion database (EmoDb). Finally, the emotion recognition rates obtained for the proposed speech-EGG emotion database using a conventional mel frequency cepstral coefficient (MFCC) and Gaussian mixture model (GMM) based recognition system are comparable with those reported for the existing German (EmoDb) and IITKGP-SESC Telugu speech emotion databases.
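The MFCC-GMM recognition scheme mentioned above follows the standard pattern: one GMM is trained per emotion on MFCC frames, and a test utterance is assigned to the emotion whose model gives the highest total log-likelihood. The following is a minimal numpy sketch of the scoring and decision step only (it assumes already-trained diagonal-covariance mixture parameters; the function and variable names are illustrative, not from the paper):

```python
import numpy as np

def gmm_loglik(frames, weights, means, variances):
    """Total log-likelihood of MFCC frames under a diagonal-covariance GMM.

    frames    : (T, D) array of MFCC vectors
    weights   : (K,)   mixture weights, summing to 1
    means     : (K, D) component means
    variances : (K, D) component (diagonal) variances
    """
    T, D = frames.shape
    K = len(weights)
    comp_ll = np.empty((T, K))
    for k in range(K):
        diff = frames - means[k]
        comp_ll[:, k] = (np.log(weights[k])
                         - 0.5 * np.sum(np.log(2.0 * np.pi * variances[k]))
                         - 0.5 * np.sum(diff ** 2 / variances[k], axis=1))
    # log-sum-exp over mixture components, then sum over frames
    m = comp_ll.max(axis=1, keepdims=True)
    frame_ll = m[:, 0] + np.log(np.exp(comp_ll - m).sum(axis=1))
    return frame_ll.sum()

def classify_emotion(frames, models):
    """Pick the emotion whose GMM scores the frames highest.

    models : dict mapping emotion label -> (weights, means, variances)
    """
    scores = {emo: gmm_loglik(frames, *params) for emo, params in models.items()}
    return max(scores, key=scores.get)
```

In practice each per-emotion GMM would be trained with EM on MFCCs extracted from that emotion's training utterances; only the maximum-likelihood decision rule is shown here.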
References
Ayadi, M. E., Kamel, M. S., & Karray, F. (2011). Survey on speech emotion recognition: Features, classification schemes and databases. Pattern Recognition, 44, 572–587.
Bulut, M., & Narayanan, S. (2008). On the robustness of overall f0 only modifications to the perception of emotions in speech. The Journal of the Acoustical Society of America, 123, 4547–4558.
Burkhardt, F., Paeschke, A., Rolfes, M., Sendlmeier, W. F., & Weiss, B. (2005). A database of German emotional speech. In Proceedings of INTERSPEECH (pp. 1517–1520).
Burkhardt, F., & Sendlmeier, W. F. (2000). Verification of acoustical correlates of emotional speech using formant synthesis. In Proceedings of the ISCA Workshop on Speech & Emotion (pp. 151–156).
Cabral, J. P., & Oliveira, L. C. (2006). EmoVoice: A system to generate emotions in speech. In Proceedings of INTERSPEECH (pp. 1798–1801).
Cahn, J. E. (1989). Generation of affect in synthesized speech. In Proceedings of the American Voice I/O Society (pp. 1–19).
Campbell, N. (2004). Developments in corpus-based speech synthesis: Approaching natural conversational speech. IEICE Transactions on Information and Systems, 87, 497–500.
Douglas-Cowie, E., Cowie, R., Sneddon, I., Cox, C., Lowry, O., McRorie, M., Martin, J.-C., Devillers, L., Abrilian, S., Batliner, A., Amir, N., & Karpouzis, K. (2007). The HUMAINE database: Addressing the collection and annotation of naturalistic and induced emotional data. In Proceedings of the Second International Conference on Affective Computing and Intelligent Interaction (pp. 488–500).
Erickson, D. (2005). Expressive speech: Production, perception and application to speech synthesis. Acoustical Science and Technology, 26(4), 317–325.
Fairbanks, G., & Hoaglin, L. W. (1939). An experimental study of pitch characteristics of voice during the expression of emotion. Speech Monographs, 6, 87–104.
Govind, D. (2013). Epoch based dynamic prosody modification for neutral to expressive speech conversion. Ph.D. dissertation, Indian Institute of Technology Guwahati.
Govind, D., Prasanna, S. R. M., & Yegnanarayana, B. (2011). Neutral to target emotion conversion using source and suprasegmental information. In Proceedings of INTERSPEECH.
Govind, D., & Prasanna, S. R. M. (2012). Epoch extraction from emotional speech. In Proceedings of Signal Processing & Communications (SPCOM) (pp. 1–5).
Hansen, J. H. L., & Bou-Ghazale, S. E. (1997). Getting started with SUSAS: A speech under simulated and actual stress database. In Proceedings of EUROSPEECH (pp. 1743–1746).
Hashizawa, Y., Hamzah, S. T. M. D., & Ohyama, G. (2004). On the differences in prosodic features of emotional expressions in Japanese speech according to the degree of the emotion. In Proceedings of Speech Prosody (pp. 655–658).
Hofer, G., Richmond, K., & Clark, R. (2005). Informed blending of databases for emotional speech synthesis. In Proceedings of INTERSPEECH.
Johnstone, T., & Scherer, K. R. (1999). The effects of emotions on voice quality. In Proceedings of the International Congress of Phonetic Sciences, San Francisco (pp. 2029–2031).
Kadiri, S.R., Gangamohan, P., Gangashetty, S.V., & Yegnanarayana, B. (2015). Analysis of excitation source features of speech for emotion recognition. In Proceedings of INTERSPEECH (pp. 1324–1328).
Koolagudi, S. G., Maity, S., Kumar, V. A., Chakrabarti, S., & Rao, K. S. (2009). IITKGP-SESC: Speech database for emotion analysis. In Contemporary Computing (pp. 485–492).
Koolagudi, S. G., & Rao, K. S. (2011). Two stage emotion recognition based on speaking rate. International Journal of Speech Technology, 14, 35–48.
Kwon, O., Chan, K., Hao, J., & Lee, T.-W. (2003). Emotion recognition by speech signal. In Proceedings of EUROSPEECH (pp. 125–128).
Liberman, M., Davis, K., Grossman, M., Martey, N., & Bell, J. (2002). LDC Emotional Prosody Speech Transcripts database. University of Pennsylvania, Linguistic Data Consortium.
Lugger, M., & Yang, B. (2009). Combining classifiers with diverse feature sets for robust speaker independent emotion recognition. In Proceedings of EUSIPCO.
McKeown, G., Valstar, M., Pantic, M., & Schroder, M. (2012). The SEMAINE database: Annotated multimodal records of emotionally colored conversations between a person and a limited agent. IEEE Transactions on Affective Computing, 3(1), 5–17.
Murray, I. R., & Arnott, J. L. (1993). Towards the simulation of emotion in synthetic speech: A review of the literature on human vocal emotion. The Journal of the Acoustical Society of America, 93, 1097–1108.
Murty, K. S. R., & Yegnanarayana, B. (2008). Epoch extraction from speech signals. IEEE Transactions on Audio, Speech, and Language Processing, 16(8), 1602–1614.
Murty, K. S. R., Yegnanarayana, B., & Joseph, M. A. (2009). Characterization of glottal activity from speech signals. IEEE Signal Processing Letters, 16(6), 469–472.
Nwe, T., Foo, S., & Silva, L. D. (2003). Emotion recognition using hidden Markov models. Speech Communication, 41, 603–623.
Prasanna, S. R. M., Govind, D., Rao, K. S., & Yegnanarayana, B. (2010). Fast prosody modification using instants of significant excitation. In Proceedings of Speech Prosody.
Prasanna, S. R. M., & Govind, D. (2010). Analysis of excitation source information in emotional speech. In Proceedings of INTERSPEECH (pp. 781–784).
Ringeval, F., Sonderegger, A., Sauer, J., & Lalanne, D. (2013). Introducing the RECOLA multimodal corpus of remote collaborative and affective interactions. In Proceedings of the IEEE Face & Gestures 2nd International Workshop on Emotion Representation, Analysis and Synthesis in Continuous Time and Space (EmoSPACE).
Scherer, K. R. (1986). Vocal affect expression: A review and a model for future research. Psychological Bulletin, 99, 143–165.
Schroder, M. (2009). Expressive speech synthesis: Past, present and possible futures. In Affective information processing (Vol. 2, pp. 111–126). Springer.
Slaney, M., & McRoberts, G. (2003). BabyEars: A recognition system for affective vocalizations. Speech Communication, 39, 367–384.
Vroomen, J., Collier, R., & Mozziconacci, S. J. L. (1993). Duration and intonation in emotional speech. In Proceedings of EUROSPEECH (pp. 577–580).
Vydana, H. K., Kadiri, S. R., & Vuppala, A. K. (2016). Vowel-based non-uniform prosody modification for emotion conversion. Circuits, Systems and Signal Processing, 35(5), 1643–1663.
Whiteside, S. P. (1998). Simulated emotions: An acoustic study of voice and perturbation measures. In Proceedings of ICSLP, Sydney, Australia (pp. 699–703).
Williams, C. E., & Stevens, K. (1972). Emotions and speech: Some acoustic correlates. The Journal of the Acoustical Society of America, 52, 1238–1250.
Acknowledgements
The present work was financially supported by the completed DST-SERB project titled "Analysis, Processing and Synthesis of Emotions in Speech" (Ref. No. SB/FTP/ETA-370/2012). The project ran from 4-7-2013 to 3-7-2016 (3 years). The authors would like to thank the funding agency for supporting this part of the work.
Cite this article
Pravena, D., Govind, D. Development of simulated emotion speech database for excitation source analysis. Int J Speech Technol 20, 327–338 (2017). https://doi.org/10.1007/s10772-017-9407-3