
Development of simulated emotion speech database for excitation source analysis

International Journal of Speech Technology

Abstract

The work presented in this paper focuses on the development of a simulated emotion database, particularly for excitation source analysis. The presence of simultaneous electroglottogram (EGG) recordings for each emotion utterance helps to accurately analyze the variations in the source parameters across different emotions. The paper describes the development of a comparatively large simulated emotion database for three emotions (Anger, Happy and Sad) along with neutrally spoken utterances in three languages (Tamil, Malayalam and Indian English). Emotion utterances in each language are recorded from 10 speakers, with multiple recording sessions for Tamil and Malayalam. Unlike the existing simulated emotion databases, emotionally biased utterances are used for recording instead of emotionally neutral ones. Based on emotion recognition experiments, the emotions elicited from emotionally biased utterances are found to provide better emotion discrimination than those elicited from emotionally neutral utterances. Also, based on a comparative experimental analysis, the speech and EGG utterances of the proposed simulated emotion database are found to preserve the same general trend in the excitation source characteristics (instantaneous F0 and strength of excitation) across emotions as the classical German emotion speech-EGG database (EmoDb). Finally, the emotion recognition rates obtained for the proposed speech-EGG emotion database using a conventional mel frequency cepstral coefficient (MFCC) and Gaussian mixture model (GMM) based emotion recognition system are found to be comparable with those of the existing German (EmoDb) and IITKGP-SESC Telugu speech emotion databases.
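
The two signal-processing components named above can be made concrete. First, the excitation source parameters analyzed in the paper, instantaneous F0 and strength of excitation, are typically derived from epoch (glottal closure instant) locations obtained by zero-frequency filtering, the method of Murty and Yegnanarayana (2008) cited in the references. The sketch below is illustrative only: the helper function is hypothetical, and the 10 ms trend-removal window and three trend-removal passes are assumed settings, not the authors' reported configuration.

```python
import numpy as np
from scipy.signal import lfilter

def zff_source_params(speech, fs, trend_win_ms=10.0):
    """Instantaneous F0 and strength of excitation via zero-frequency filtering.

    Illustrative sketch of the ZFF method; the window length and the number
    of trend-removal passes are assumptions, not the paper's settings.
    """
    # Difference the signal to suppress any slowly varying bias.
    x = np.diff(speech.astype(np.float64), prepend=float(speech[0]))

    # Pass twice through a zero-frequency resonator (both poles at z = 1).
    y = lfilter([1.0], [1.0, -2.0, 1.0], x)
    y = lfilter([1.0], [1.0, -2.0, 1.0], y)

    # Remove the resulting polynomial trend by repeated local-mean subtraction.
    win = int(round(trend_win_ms * 1e-3 * fs)) | 1  # force an odd window length
    kernel = np.ones(win) / win
    for _ in range(3):
        y = y - np.convolve(y, kernel, mode="same")

    # Epochs are the positive-going zero crossings of the trend-removed signal.
    epochs = np.where((y[:-1] < 0) & (y[1:] >= 0))[0]

    # Instantaneous F0 from successive epoch intervals.
    f0 = fs / np.diff(epochs)

    # Strength of excitation: slope of the ZFF signal around each epoch.
    soe = np.abs(y[epochs + 1] - y[np.maximum(epochs - 1, 0)]) / 2.0

    return epochs, f0, soe
```

Second, the baseline recognition system mentioned above (MFCC features scored against one GMM per emotion) can be sketched as follows. librosa and scikit-learn are assumed for feature extraction and modelling, the function names are hypothetical, and the settings (13 MFCCs, 32 diagonal-covariance mixtures) are illustrative choices rather than the configuration reported in the paper.

```python
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

def train_emotion_gmms(wav_lists, sr=16000, n_mfcc=13, n_mix=32):
    """Fit one GMM per emotion on pooled MFCC frames.

    wav_lists maps an emotion label to a list of wav paths (hypothetical layout).
    """
    models = {}
    for emotion, paths in wav_lists.items():
        frames = []
        for path in paths:
            y, _ = librosa.load(path, sr=sr)
            # Frame-level MFCCs, shaped (frames, coefficients).
            frames.append(librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T)
        models[emotion] = GaussianMixture(
            n_components=n_mix, covariance_type="diag", random_state=0
        ).fit(np.vstack(frames))
    return models

def classify_emotion(path, models, sr=16000, n_mfcc=13):
    """Return the emotion whose GMM gives the highest average log-likelihood."""
    y, _ = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T
    return max(models, key=lambda emotion: models[emotion].score(mfcc))
```

In this setup a test utterance is assigned the label of the best-scoring model, which is the usual decision rule for GMM-based emotion recognition baselines.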

References

  • Ayadi, M. E., Kamel, M. S., & Karray, F. (2011). Survey on speech emotion recognition: Features, classification schemes and databases. Pattern Recognition, 44, 572–587.

  • Bulut, M., & Narayanan, S. (2008). On the robustness of overall f0 only modifications to the perception of emotions in speech. The Journal of the Acoustical Society of America, 123, 4547–4558.

  • Burkhardt, F., Paeschke, A., Rolfes, M., Sendlmeier, W., & Weiss, B. (2005). A database of German emotional speech. In Proceedings of INTERSPEECH (pp. 1517–1520).

  • Burkhardt, F., & Sendlmeier, W. F. (2000). Verification of acoustical correlates of emotional speech using formant synthesis. In Proceedings of the ISCA Workshop on Speech & Emotion (pp. 151–156).

  • Cabral, J. P., & Oliveira, L. C. (2006). EmoVoice: A system to generate emotions in speech. In Proceedings of INTERSPEECH (pp. 1798–1801).

  • Cahn, J. E. (1989). Generation of affect in synthesized speech. In Proceedings of the American Voice I/O Society (pp. 1–19).

  • Campbell, N. (2004). Developments in corpus-based speech synthesis: Approaching natural conversational speech. IEICE Transactions on Information and Systems, 87, 497–500.

  • Cowie, E. D., Cowie, R., Sneddon, I., Cox, C., Lowry, O., McRorie, M., Martin, J.-C., Devillers, L., Abrilian, S., Batliner, A., Amir, N., & Karpouzis, K. (2007). The HUMAINE database: Addressing the collection and annotation of naturalistic and induced emotional data. In Proceedings of the Second International Conference on Affective Computing and Intelligent Interaction (pp. 488–500).

  • Erickson, D. (2005). Expressive speech: Production, perception and application to speech synthesis. Acoustical Science and Technology, 26(4), 317–325.

  • Fairbanks, G., & Hoaglin, L. W. (1939). An experimental study of pitch characteristics of voice during the expression of emotion. Speech Monographs, 6, 87–104.

  • Govind, D. (2013). Epoch based dynamic prosody modification for neutral to expressive speech conversion. Ph.D. dissertation, Indian Institute of Technology Guwahati.

  • Govind, D., Prasanna, S. R. M., & Yegnanarayana, B. (2011). Neutral to target emotion conversion using source and suprasegmental information. In Proceedings of INTERSPEECH.

  • Govind, D., & Prasanna, S. R. M. (2012). Epoch extraction from emotional speech. In Proceedings of Signal Processing & Communications (SPCOM) (pp. 1–5).

  • Hansen, J. H. L., & Bou-Ghazale, S. E. (1997). Getting started with SUSAS: A speech under simulated and actual stress database. In Proceedings of EUROSPEECH (pp. 1743–1746).

  • Hashizawa, Y., Hamzah, S. T. M. D., & Ohyama, G. (2004). On the differences in prosodic features of emotional expressions in Japanese speech according to the degree of the emotion. In Proceedings of Speech Prosody (pp. 655–658).

  • Hofer, G., Richmond, K., & Clark, R. (2005). Informed blending of databases for emotional speech synthesis. In Proceedings of INTERSPEECH.

  • Johnstone, T., & Scherer, K. R. (1999). The effects of emotions on voice quality. In Proceedings of the International Congress of Phonetic Sciences, San Francisco (pp. 2029–2031).

  • Kadiri, S. R., Gangamohan, P., Gangashetty, S. V., & Yegnanarayana, B. (2015). Analysis of excitation source features of speech for emotion recognition. In Proceedings of INTERSPEECH (pp. 1324–1328).

  • Koolagudi, S. G., Maity, S., Kumar, V. A., Chakrabarti, S., & Rao, K. S. (2009). IITKGP-SESC: Speech database for emotion analysis. In Contemporary Computing (pp. 485–492). Springer.

  • Koolagudi, S. G., & Rao, K. S. (2011). Two stage emotion recognition based on speaking rate. International Journal of Speech Technology, 14, 35–48.

  • Kwon, O., Chan, K., Hao, J., & Lee, S. T. (2003). Emotion recognition by speech signal. In Proceedings of EUROSPEECH (pp. 125–128).

  • Liberman, M., Davis, K., Grossman, M., Martey, N., & Bell, J. (2002). LDC Emotional Prosody Speech and Transcripts database. Linguistic Data Consortium, University of Pennsylvania.

  • Lugger, M., & Yang, B. (2009). Combining classifiers with diverse feature sets for robust speaker independent emotion recognition. In Proceedings of EUSIPCO.

  • McKeown, G., Valstar, M., Pantic, M., & Schroder, M. (2012). The SEMAINE database: Annotated multimodal records of emotionally colored conversations between a person and a limited agent. IEEE Transactions on Affective Computing, 3(1), 5–17.

  • Murray, I. R., & Arnott, J. L. (1993). Towards the simulation of emotion in synthetic speech: A review of the literature on human vocal emotion. The Journal of the Acoustical Society of America, 93, 1097–1108.

  • Murty, K. S. R., & Yegnanarayana, B. (2008). Epoch extraction from speech signals. IEEE Transactions on Audio, Speech, and Language Processing, 16(8), 1602–1614.

  • Murty, K. S. R., Yegnanarayana, B., & Joseph, M. A. (2009). Characterization of glottal activity from speech signals. IEEE Signal Processing Letters, 16(6), 469–472.

  • Nwe, T., Foo, S., & Silva, L. D. (2003). Emotion recognition using hidden Markov models. Speech Communication, 41, 603–623.

  • Prasanna, S. R. M., Govind, D., Rao, K. S., & Yegnanarayana, B. (2010). Fast prosody modification using instants of significant excitation. In Proceedings of Speech Prosody.

  • Prasanna, S. R. M., & Govind, D. (2010). Analysis of excitation source information in emotional speech. In Proceedings of INTERSPEECH (pp. 781–784).

  • Ringeval, F., Sonderegger, A., Sauer, J., & Lalanne, D. (2013). Introducing the RECOLA multimodal corpus of remote collaborative and affective interactions. In Proceedings of the IEEE Face & Gestures 2nd International Workshop on Emotion Representation, Analysis and Synthesis in Continuous Time and Space (EmoSPACE).

  • Scherer, K. R. (1986). Vocal affect expression: A review and a model for future research. Psychological Bulletin, 99, 143–165.

  • Schroder, M. (2009). Expressive speech synthesis: Past, present and possible futures. In Affective information processing (pp. 111–126). Springer.

  • Slaney, M., & McRoberts, G. (2003). BabyEars: A recognition system for affective vocalizations. Speech Communication, 39, 367–384.

  • Vroomen, J., Collier, R., & Mozziconacci, S. J. L. (1993). Duration and intonation in emotional speech. In Proceedings of EUROSPEECH (pp. 577–580).

  • Vydana, H. K., Kadiri, S. R., & Vuppala, A. K. (2016). Vowel-based non-uniform prosody modification for emotion conversion. Circuits, Systems and Signal Processing, 35(5), 1643–1663.

  • Whiteside, S. P. (1998). Simulated emotions: An acoustic study of voice and perturbation measures. In Proceedings of ICSLP, Sydney, Australia (pp. 699–703).

  • Williams, C. E., & Stevens, K. (1972). Emotions and speech: Some acoustic correlates. The Journal of the Acoustical Society of America, 52, 1238–1250.

Acknowledgements

The present work was financially supported by the DST-SERB project titled “Analysis, Processing and Synthesis of Emotions in Speech” (Ref. No. SB/FTP/ETA-370/2012). The project ran from 4-7-2013 to 3-7-2016 (3 years). The authors would like to thank the funding agency for supporting this part of the work.

Author information

Corresponding author

Correspondence to D. Pravena.

About this article

Cite this article

Pravena, D., Govind, D. Development of simulated emotion speech database for excitation source analysis. Int J Speech Technol 20, 327–338 (2017). https://doi.org/10.1007/s10772-017-9407-3
