Abstract
This paper describes the development of a simulated emotion database designed specifically for excitation source analysis. Simultaneous electroglottogram (EGG) recordings accompanying each emotion utterance allow variations in the source parameters across emotions to be analyzed accurately. The database is comparatively large, covering three emotions (anger, happiness and sadness) along with neutrally spoken utterances in three languages (Tamil, Malayalam and Indian English). Utterances in each language are recorded from 10 speakers, in multiple sessions for Tamil and Malayalam. Unlike existing simulated emotion databases, emotionally biased rather than emotionally neutral utterances are used for recording. Emotion recognition experiments show that emotions elicited from emotionally biased utterances are more discriminable than those elicited from emotionally neutral utterances. Comparative experimental analysis further shows that the speech and EGG utterances of the proposed database preserve the same general trend in excitation source characteristics (instantaneous F0 and strength of excitation) across emotions as the classical German speech-EGG emotion database (EmoDb). Finally, the emotion recognition rates obtained for the proposed speech-EGG emotion database using a conventional mel frequency cepstral coefficient (MFCC) and Gaussian mixture model (GMM) based recognition system are comparable with those reported for the existing German (EmoDb) and IITKGP-SESC Telugu speech emotion databases.
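The MFCC-GMM recognition scheme mentioned above follows the standard pattern: one GMM is trained per emotion on MFCC frames, and a test utterance is assigned to the emotion whose model gives the highest total log-likelihood. The following is a minimal numpy sketch of the scoring and decision step only (it assumes already-trained diagonal-covariance mixture parameters; the function and variable names are illustrative, not from the paper):

```python
import numpy as np

def gmm_loglik(frames, weights, means, variances):
    """Total log-likelihood of MFCC frames under a diagonal-covariance GMM.

    frames    : (T, D) array of MFCC vectors
    weights   : (K,)   mixture weights, summing to 1
    means     : (K, D) component means
    variances : (K, D) component (diagonal) variances
    """
    T, D = frames.shape
    K = len(weights)
    comp_ll = np.empty((T, K))
    for k in range(K):
        diff = frames - means[k]
        comp_ll[:, k] = (np.log(weights[k])
                         - 0.5 * np.sum(np.log(2.0 * np.pi * variances[k]))
                         - 0.5 * np.sum(diff ** 2 / variances[k], axis=1))
    # log-sum-exp over mixture components, then sum over frames
    m = comp_ll.max(axis=1, keepdims=True)
    frame_ll = m[:, 0] + np.log(np.exp(comp_ll - m).sum(axis=1))
    return frame_ll.sum()

def classify_emotion(frames, models):
    """Pick the emotion whose GMM scores the frames highest.

    models : dict mapping emotion label -> (weights, means, variances)
    """
    scores = {emo: gmm_loglik(frames, *params) for emo, params in models.items()}
    return max(scores, key=scores.get)
```

In practice each per-emotion GMM would be trained with EM on MFCCs extracted from that emotion's training utterances; only the maximum-likelihood decision rule is shown here.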
References
Ayadi, M. E., Kamel, M. S., & Karray, F. (2011). Survey on speech emotion recognition: Features, classification schemes and databases. Pattern Recognition, 44, 572–587.
Bulut, M., & Narayanan, S. (2008). On the robustness of overall f0 only modifications to the perception of emotions in speech. The Journal of the Acoustical Society of America, 123, 4547–4558.
Burkhardt, F., Paeschke, A., Rolfes, M., Sendlmeier, W. F., & Weiss, B. (2005). A database of German emotional speech. In Proceedings of INTERSPEECH (pp. 1517–1520).
Burkhardt, F., & Sendlmeier, W. F. (2000). Verification of acoustical correlates of emotional speech using formant synthesis. In Proceedings of the ISCA Workshop on Speech & Emotion (pp. 151–156).
Cabral, J. P., & Oliveira, L. C. (2006). EmoVoice: A system to generate emotions in speech. In Proceedings of INTERSPEECH (pp. 1798–1801).
Cahn, J. E. (1989). Generation of affect in synthesized speech. In Proceedings of the American Voice I/O Society (pp. 1–19).
Campbell, N. (2004). Developments in corpus-based speech synthesis: Approaching natural conversational speech. IEICE Transactions on Information and Systems, 87, 497–500.
Douglas-Cowie, E., Cowie, R., Sneddon, I., Cox, C., Lowry, O., McRorie, M., Martin, J.-C., Devillers, L., Abrilian, S., Batliner, A., Amir, N., & Karpouzis, K. (2007). The HUMAINE database: Addressing the collection and annotation of naturalistic and induced emotional data. In Proceedings of the Second International Conference on Affective Computing and Intelligent Interaction (pp. 488–500).
Erickson, D. (2005). Expressive speech: Production, perception and application to speech synthesis. Acoustical Science and Technology, 26(4), 317–325.
Fairbanks, G., & Hoaglin, L. W. (1939). An experimental study of pitch characteristics of voice during the expression of emotion. Speech Monographs, 6, 87–104.
Govind, D. (2013). Epoch based dynamic prosody modification for neutral to expressive speech conversion. Ph.D. dissertation, Indian Institute of Technology Guwahati.
Govind, D., Prasanna, S. R. M., & Yegnanarayana, B. (2011). Neutral to target emotion conversion using source and suprasegmental information. In Proceedings of INTERSPEECH.
Govind, D., & Prasanna, S. R. M. (2012). Epoch extraction from emotional speech. In Proceedings of Signal Processing & Communications (SPCOM) (pp. 1–5).
Hansen, J. H. L., & Bou-Ghazale, S. E. (1997). Getting started with SUSAS: A speech under simulated and actual stress database. In Proceedings of EUROSPEECH (pp. 1743–1746).
Hashizawa, Y., Hamzah, S. T. M. D., & Ohyama, G. (2004). On the differences in prosodic features of emotional expressions in Japanese speech according to the degree of the emotion. In Proceedings of Speech Prosody (pp. 655–658).
Hofer, G., Richmond, K., & Clark, R. (2005). Informed blending of databases for emotional speech synthesis. In Proceedings of INTERSPEECH.
Johnstone, T., & Scherer, K. R. (1999). The effects of emotions on voice quality. In Proceedings of the International Congress of Phonetic Sciences, San Francisco (pp. 2029–2031).
Kadiri, S.R., Gangamohan, P., Gangashetty, S.V., & Yegnanarayana, B. (2015). Analysis of excitation source features of speech for emotion recognition. In Proceedings of INTERSPEECH (pp. 1324–1328).
Koolagudi, S. G., Maity, S., Kumar, V. A., Chakrabarti, S., & Rao, K. S. (2009). IITKGP-SESC: Speech database for emotion analysis. In Contemporary Computing (pp. 485–492).
Koolagudi, S. G., & Rao, K. S. (2011). Two stage emotion recognition based on speaking rate. International Journal of Speech Technology, 14, 35–48.
Kwon, O., Chan, K., Hao, J., & Lee, T.-W. (2003). Emotion recognition by speech signal. In Proceedings of EUROSPEECH (pp. 125–128).
Liberman, M., Davis, K., Grossman, M., Martey, N., & Bell, J. (2002). LDC Emotional Prosody Speech Transcripts database. University of Pennsylvania, Linguistic Data Consortium.
Lugger, M., & Yang, B. (2009). Combining classifiers with diverse feature sets for robust speaker independent emotion recognition. In Proceedings of EUSIPCO.
McKeown, G., Valstar, M., Pantic, M., & Schroder, M. (2012). The SEMAINE database: Annotated multimodal records of emotionally colored conversations between a person and a limited agent. IEEE Transactions on Affective Computing, 3(1), 5–17.
Murray, I. R., & Arnott, J. L. (1993). Towards the simulation of emotion in synthetic speech: A review of the literature on human vocal emotion. The Journal of the Acoustical Society of America, 93, 1097–1108.
Murty, K. S. R., & Yegnanarayana, B. (2008). Epoch extraction from speech signals. IEEE Transactions on Audio, Speech, and Language Processing, 16(8), 1602–1614.
Murty, K. S. R., Yegnanarayana, B., & Joseph, M. A. (2009). Characterization of glottal activity from speech signals. IEEE Signal Processing Letters, 16(6), 469–472.
Nwe, T., Foo, S., & Silva, L. D. (2003). Emotion recognition using hidden Markov models. Speech Communication, 41, 603–623.
Prasanna, S. R. M., Govind, D., Rao, K. S., & Yegnanarayana, B. (2010). Fast prosody modification using instants of significant excitation. In Proceedings of Speech Prosody.
Prasanna, S. R. M., & Govind, D. (2010). Analysis of excitation source information in emotional speech. In Proceedings of INTERSPEECH (pp. 781–784).
Ringeval, F., Sonderegger, A., Sauer, J., & Lalanne, D. (2013). Introducing the RECOLA multimodal corpus of remote collaborative and affective interactions. In Proceedings of the IEEE Face & Gestures 2nd International Workshop on Emotion Representation, Analysis and Synthesis in Continuous Time and Space (EmoSPACE).
Scherer, K. R. (1986). Vocal affect expression: A review and a model for future research. Psychological Bulletin, 99, 143–165.
Schroder, M. (2009). Expressive speech synthesis: Past, present and possible futures. In Affective information processing (Vol. 2, pp. 111–126). Springer.
Slaney, M., & McRoberts, G. (2003). BabyEars: A recognition system for affective vocalizations. Speech Communication, 39, 367–384.
Vroomen, J., Collier, R., & Mozziconacci, S. J. L. (1993). Duration and intonation in emotional speech. In Proceedings of EUROSPEECH (pp. 577–580).
Vydana, H. K., Kadiri, S. R., & Vuppala, A. K. (2016). Vowel-based non-uniform prosody modification for emotion conversion. Circuits, Systems and Signal Processing, 35(5), 1643–1663.
Whiteside, S. P. (1998). Simulated emotions: An acoustic study of voice and perturbation measures. In Proceedings of ICSLP, Sydney, Australia (pp. 699–703).
Williams, C. E., & Stevens, K. (1972). Emotions and speech: Some acoustic correlates. The Journal of the Acoustical Society of America, 52, 1238–1250.
Acknowledgements
The present work was financially supported by the completed DST-SERB project titled "Analysis, Processing and Synthesis of Emotions in Speech" (Ref. No. SB/FTP/ETA-370/2012). The project ran from 4-7-2013 to 3-7-2016 (3 years). The authors would like to thank the funding agency for supporting this part of the work.
Cite this article
Pravena, D., Govind, D. Development of simulated emotion speech database for excitation source analysis. Int J Speech Technol 20, 327–338 (2017). https://doi.org/10.1007/s10772-017-9407-3