
Anticipating the User: Acoustic Disposition Recognition in Intelligent Interactions

Chapter in: Innovations in Big Data Mining and Embedded Knowledge

Abstract

Contemporary technical devices follow the paradigm of naturalistic multimodal interaction and user-centric individualisation. Users expect devices to interact intelligently, to anticipate their needs, and to adapt to their behaviour. To do so, companion-like solutions have to take into account the affective and dispositional state of the user, and must therefore be trained and adapted using interaction data and corpora. We argue that, in this context, big data alone serves little purpose, since important effects are obscured and high-quality annotation is too costly. We instead encourage the collection and use of enriched data. We report on recent trends in this field, presenting methodologies for collecting data with rich disposition variety and predictable classifications, based on careful design and standardised psychological assessments. Besides socio-demographic information and personality traits, we also use speech events to improve user state models. Furthermore, we present possibilities to increase the amount of enriched data in a cross-corpus or intra-corpus manner based on recent learning approaches. Finally, we highlight recent neural recognition approaches that are feasible for smaller datasets and cover temporal aspects.
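The abstract's final point, temporal neural approaches that remain feasible for smaller datasets, can be illustrated with a minimal sketch. The following Python example assumes librosa and PyTorch; the MFCC front end, the file name "utterance.wav", and all hyperparameters are illustrative choices, not the chapter's actual setup. It shows a compact recurrent classifier over frame-wise acoustic features:

    # Minimal sketch of a temporal (recurrent) disposition classifier for
    # small datasets. Feature choice, file name and hyperparameters are
    # illustrative assumptions, not the configuration used in the chapter.
    import librosa
    import torch
    import torch.nn as nn

    def frame_features(wav_path, sr=16000):
        """Load audio and return a (time, n_mfcc) feature sequence."""
        y, _ = librosa.load(wav_path, sr=sr)
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # shape (13, time)
        return torch.tensor(mfcc.T, dtype=torch.float32)    # shape (time, 13)

    class DispositionLSTM(nn.Module):
        """LSTM over acoustic frames; the last hidden state yields logits."""
        def __init__(self, n_feats=13, hidden=64, n_classes=4):
            super().__init__()
            self.rnn = nn.LSTM(n_feats, hidden, batch_first=True)
            self.out = nn.Linear(hidden, n_classes)

        def forward(self, x):           # x: (batch, time, n_feats)
            _, (h, _) = self.rnn(x)     # h: (num_layers, batch, hidden)
            return self.out(h[-1])      # (batch, n_classes)

    model = DispositionLSTM()
    feats = frame_features("utterance.wav").unsqueeze(0)  # add batch dim
    logits = model(feats)  # one score per hypothetical disposition class

At roughly 20k trainable parameters, a model of this size can plausibly be trained on small, enriched corpora of the kind advocated above rather than on web-scale data.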


Notes

  1. Poria et al. state that some authors define early fusion as fusion directly on the signal level, and introduce mid-level fusion to denote fusion on the feature level [89].
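To make this distinction concrete, here is a small numeric sketch; the two modalities, the array shapes, and the toy functionals are assumptions for illustration only, not taken from the chapter:

    # Illustrative sketch of the fusion levels discussed in the note above;
    # modalities, shapes and features are assumptions, not from the chapter.
    import numpy as np

    rng = np.random.default_rng(0)
    audio = rng.normal(size=(16000,))   # stand-in: 1 s of audio at 16 kHz
    physio = rng.normal(size=(16000,))  # stand-in: a synchronised biosignal

    # Early fusion on the signal level: stack raw, time-aligned signals
    # before any feature extraction.
    early = np.stack([audio, physio], axis=-1)            # (16000, 2)

    # Mid-level fusion on the feature level: extract features per modality,
    # then concatenate the feature vectors for a joint classifier.
    audio_feats = np.array([audio.mean(), audio.std()])   # toy functionals
    physio_feats = np.array([physio.mean(), physio.std()])
    mid = np.concatenate([audio_feats, physio_feats])     # (4,)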

References

  1. Abraham, W.: Multilingua. J. Cross-Cult. Interlang. Commun. 10(1/2) (1991). s.p
  2. Allwood, J., Nivre, J., Ahlsén, E.: On the semantics and pragmatics of linguistic feedback. J. Semant. 9(1), 1–26 (1992)
  3. Anguera, X., Bozonnet, S., Evans, N., Fredouille, C., Friedland, G., Vinyals, O.: Speaker diarization: a review of recent research. IEEE Trans. Audio Speech Lang. Process. 20(2), 356–370 (2012)
  4. Bachorowski, J.A., Owren, M.J.: Vocal expression of emotion: acoustic properties of speech are associated with emotional intensity and context. Psychol. Sci. 6(4), 219–224 (1995)
  5. Badshah, A.M., Ahmad, J., Rahim, N., Baik, S.W.: Speech emotion recognition from spectrograms with deep convolutional neural network. In: Proceedings of the 2017 International Conference on Platform Technology and Service, pp. 1–5. IEEE, Busan, South Korea (2017)
  6. Baimbetov, Y., Khalil, I., Steinbauer, M., Anderst-Kotsis, G.: Using big data for emotionally intelligent mobile services through multi-modal emotion recognition. In: Proceedings of the 13th International Conference on Smart Homes and Health Telematics, pp. 127–138. Springer, Geneva, Switzerland (2015)
  7. Batliner, A., Fischer, K., Huber, R., Spilker, J., Nöth, E.: Desperately seeking emotions: actors, wizards and human beings. In: Proceedings of the ISCA Workshop on Speech and Emotion: A Conceptual Framework for Research, pp. 195–200. Textflow, Belfast, UK (2000)
  8. Batliner, A., Nöth, E., Buckow, J., Huber, R., Warnke, V., Niemann, H.: Whence and whither prosody in automatic speech understanding: a case study. In: Proceedings of the Workshop on Prosody and Speech Recognition 2001, pp. 3–12. ISCA, Red Bank, USA (2001)
  9. Bazzanella, C.: Phatic connectives as interactional cues in contemporary spoken Italian. J. Pragmat. 14(4), 629–647 (1990)
  10. Biundo, S., Wendemuth, A.: Companion-technology for cognitive technical systems. KI – Künstliche Intell. 30(1), 71–75 (2016)
  11. Biundo, S., Wendemuth, A. (eds.): Companion Technology—A Paradigm Shift in Human-Technology Interaction. Springer, Cham, Switzerland (2017)
  12. Böck, R.: Multimodal automatic user disposition recognition in human-machine interaction. Ph.D. thesis, Otto von Guericke University Magdeburg (2013)
  13. Böck, R., Egorow, O., Siegert, I., Wendemuth, A.: Comparative study on normalisation in emotion recognition from speech. In: Intelligent Human Computer Interaction, pp. 189–201. Springer, Cham, Switzerland (2017)
  14. Böck, R., Egorow, O., Wendemuth, A.: Speaker-group specific acoustic differences in consecutive stages of spoken interaction. In: Proceedings of the 28. Konferenz Elektronische Sprachsignalverarbeitung, pp. 211–218. TUDpress (2017)
  15. Böck, R., Egorow, O., Wendemuth, A.: Acoustic detection of consecutive stages of spoken interaction based on speaker-group specific features. In: Proceedings of the 29. Konferenz Elektronische Sprachsignalverarbeitung, pp. 247–254. TUDpress (2018)
  16. Böck, R., Hübner, D., Wendemuth, A.: Determining optimal signal features and parameters for HMM-based emotion classification. In: Proceedings of the 15th IEEE Mediterranean Electrotechnical Conference, pp. 1586–1590. IEEE, Valletta, Malta (2010)
  17. Böck, R., Siegert, I.: Recognising emotional evolution from speech. In: Proceedings of the International Workshop on Emotion Representations and Modelling for Companion Technologies, pp. 13–18. ACM, Seattle, USA (2015)
  18. Bolinger, D.: Intonation and Its Uses: Melody in Grammar and Discourse. Stanford University Press, Stanford, CA (1989)
  19. Bonin, F.: Content and context in conversations: the role of social and situational signals in conversation structure. Ph.D. thesis, Trinity College Dublin (2016)
  20. Butler, L.D., Nolen-Hoeksema, S.: Gender differences in responses to depressed mood in a college sample. Sex Roles 30, 331–346 (1994)
  21. Byrne, C., Foulkes, P.: The mobile phone effect on vowel formants. Int. J. Speech Lang. Law 11, 83–102 (2004)
  22. Carroll, J.M.: Human computer interaction—brief intro. The Interaction Design Foundation, Aarhus, Denmark, 2nd edn. (2013). s.p
  23. Chen, J., Chaudhari, N.: Segmented-memory recurrent neural networks. IEEE Trans. Neural Netw. 20(8), 1267–1280 (2009)
  24. Chowdhury, S.A., Riccardi, G.: A deep learning approach to modeling competitiveness in spoken conversations. In: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5680–5684. IEEE (2017)
  25. Costa, P., McCrae, R.: NEO-PI-R Professional Manual. Revised NEO Personality Inventory (NEO-PI-R) and NEO Five Factor Inventory (NEO-FFI). Psychological Assessment Resources, Odessa, USA (1992)
  26. Cowie, R.: Perceiving emotion: towards a realistic understanding of the task. Philos. Trans. R. Soc. Lond. B: Biol. Sci. 364(1535), 3515–3525 (2009)
  27. Cowie, R., Douglas-Cowie, E., Savvidou, S., McMahon, E., Sawey, M., Schröder, M.: 'Feeltrace': an instrument for recording perceived emotion in real time. In: Proceedings of the ISCA Workshop on Speech and Emotion: A Conceptual Framework for Research, pp. 19–24. Textflow, Belfast, UK (2000)
  28. Crispim-Junior, C.F., Ma, Q., Fosty, B., Romdhane, R., Bremond, F., Thonnat, M.: Combining multiple sensors for event recognition of older people. In: Proceedings of the 1st Workshop on Multimedia Indexing and Information Retrieval for Healthcare, pp. 15–22. ACM, Barcelona, Spain (2013)
  29. Cuperman, R., Ickes, W.: Big five predictors of behavior and perceptions in initial dyadic interactions: personality similarity helps extraverts and introverts, but hurts 'disagreeables'. J. Personal. Soc. Psychol. 97, 667–684 (2009)
  30. Dobrišek, S., Gajšek, R., Mihelič, F., Pavešić, N., Štruc, V.: Towards efficient multi-modal emotion recognition. Int. J. Adv. Robot. Syst. 10 (2013). s.p
  31. Egorow, O., Lotz, A., Siegert, I., Böck, R., Krüger, J., Wendemuth, A.: Accelerating manual annotation of filled pauses by automatic pre-selection. In: Proceedings of the 2017 International Conference on Companion Technology (ICCT), pp. 1–6 (2017)
  32. Egorow, O., Siegert, I., Wendemuth, A.: Prediction of user satisfaction in naturalistic human-computer interaction. Kognitive Syst. 2017(1) (2017). s.p
  33. Egorow, O., Wendemuth, A.: Detection of challenging dialogue stages using acoustic signals and biosignals. In: Proceedings of the WSCG 2016, pp. 137–143. Springer, Plzen, Czech Republic (2016)
  34. Egorow, O., Wendemuth, A.: Emotional features for speech overlaps classification. In: INTERSPEECH 2017, pp. 2356–2360. ISCA, Stockholm, Sweden (2017)
  35. Etemadpour, R., Murray, P., Forbes, A.G.: Evaluating density-based motion for big data visual analytics. In: IEEE International Conference on Big Data, pp. 451–460. IEEE, Washington, USA (2014)
  36. Eyben, F., Scherer, K.R., Schuller, B.W., Sundberg, J., André, E., Busso, C., Devillers, L.Y., Epps, J., Laukka, P., Narayanan, S.S., et al.: The Geneva minimalistic acoustic parameter set (GeMAPS) for voice research and affective computing. IEEE Trans. Affect. Comput. 7(2), 190–202 (2016)
  37. Eyben, F., Weninger, F., Gross, F., Schuller, B.: Recent developments in openSMILE, the Munich open-source multimedia feature extractor. In: Proceedings of the 21st ACM International Conference on Multimedia, pp. 835–838. ACM, Barcelona, Spain (2013)
  38. Eyben, F., Wöllmer, M., Schuller, B.: openEAR – introducing the Munich open-source emotion and affect recognition toolkit. In: Proceedings of the 2009 ACII, pp. 1–6. IEEE, Amsterdam, Netherlands (2009)
  39. Forgas, J.P.: Feeling and doing: affective influences on interpersonal behavior. Psychol. Inq. 13, 1–28 (2002)
  40. Frommer, J., Rösner, D., Haase, M., Lange, J., Friesen, R., Otto, M.: Detection and Avoidance of Failures in Dialogues – Wizard of Oz Experiment Operator's Manual. Pabst Science Publishers (2012)
  41. Gill, A., French, R.: Level of representation and semantic distance: rating author personality from texts. In: Proceedings of the Second European Cognitive Science Conference. Taylor & Francis, Delphi, Greece (2007). s.p
  42. Glüge, S., Böck, R., Ott, T.: Emotion recognition from speech using representation learning in extreme learning machines. In: Proceedings of the 9th IJCCI, pp. 1–6. INSTICC, Funchal, Madeira, Portugal (2017)
  43. Glüge, S., Böck, R., Wendemuth, A.: Segmented-memory recurrent neural networks versus hidden Markov models in emotion recognition from speech. In: Proceedings of the 3rd IJCCI, pp. 308–315. SCITEPRESS, Paris, France (2011)
  44. Goldberg, J.A.: Interrupting the discourse on interruptions: an analysis in terms of relationally neutral, power- and rapport-oriented acts. J. Pragmat. 14(6), 883–903 (1990)
  45. Goldberg, L.R.: The development of markers for the Big-Five factor structure. J. Pers. Soc. Psychol. 59(6), 1216–1229 (1992)
  46. Gosztolya, G.: Optimized time series filters for detecting laughter and filler events. In: INTERSPEECH 2017, pp. 2376–2380. ISCA, Stockholm, Sweden (2017)
  47. Goto, M., Itou, K., Hayamizu, S.: A real-time filled pause detection system for spontaneous speech recognition. In: EUROSPEECH 1999, pp. 227–230. ISCA, Budapest, Hungary (1999)
  48. Gross, J.J., Carstensen, L.L., Pasupathi, M., Tsai, J., Skorpen, C.G., Hsu, A.Y.: Emotion and aging: experience, expression, and control. Psychol. Aging 12, 590–599 (1997)
  49. Hamacher, D., Hamacher, D., Müller, R., Schega, L., Zech, A.: Exploring phase dependent functional gait variability. Hum. Mov. Sci. 52(Supplement C), 191–196 (2017)
  50. Hamzah, R., Jamil, N., Seman, N., Ardi, N., Doraisamy, S.C.: Impact of acoustical voice activity detection on spontaneous filled pause classification. In: Proceedings of the IEEE ICOS-2014, pp. 1–6. IEEE, Subang, Malaysia (2014)
  51. Hattie, J.: Visible Learning. A Bradford Book, Routledge, London, UK (2009)
  52. Hölker, K.: Zur Analyse von Markern: Korrektur- und Schlußmarker des Französischen. Steiner, Stuttgart, Germany (1988)
  53. Hölker, K.: Französisch: Partikelforschung. Lexikon der Romanistischen Linguistik 5, 77–88 (1991)
  54. Honold, F., Bercher, P., Richter, F., Nothdurft, F., Geier, T., Barth, R., Hoernle, T., Schüssel, F., Reuter, S., Rau, M., Bertrand, G., Seegebarth, B., Kurzok, P., Schattenberg, B., Minker, W., Weber, M., Biundo-Stephan, S.: Companion-technology: towards user- and situation-adaptive functionality of technical systems. In: 2014 International Conference on Intelligent Environments, pp. 378–381. IEEE, Shanghai, China (2014)
  55. Honold, F., Schüssel, F., Weber, M.: The automated interplay of multimodal fission and fusion in adaptive HCI. In: 2014 International Conference on Intelligent Environments, pp. 170–177. IEEE, Shanghai, China (2014)
  56. Horowitz, L., Alden, L., Wiggins, J., Pincus, A.: Inventory of Interpersonal Problems Manual. The Psychological Corporation, Odessa, USA (2000)
  57. Hossain, M.S., Muhammad, G., Alhamid, M.F., Song, B., Al-Mutib, K.: Audio-visual emotion recognition using big data towards 5G. Mob. Netw. Appl. 21(5), 753–763 (2016)
  58. Huang, G.B., Zhou, H., Ding, X., Zhang, R.: Extreme learning machine for regression and multiclass classification. IEEE Trans. Syst. Man Cybern. Part B (Cybernetics) 42(2), 513–529 (2012)
  59. Huang, Y., Hu, M., Yu, X., Wang, T., Yang, C.: Transfer learning of deep neural network for speech emotion recognition. In: Pattern Recognition—Part 2, pp. 721–729. Springer, Singapore (2016)
  60. Huang, Z., Epps, J.: Detecting the instant of emotion change from speech using a martingale framework. In: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 5195–5199. IEEE, Shanghai, China (2016)
  61. Huang, Z., Epps, J., Ambikairajah, E.: An investigation of emotion change detection from speech. In: INTERSPEECH 2015, pp. 1329–1333. ISCA, Dresden, Germany (2015)
  62. Izard, C.E., Libero, D.Z., Putnam, P., Haynes, O.M.: Stability of emotion experiences and their relations to traits of personality. J. Pers. Soc. Psychol. 64, 847–860 (1993)
  63. Jahnke, W., Erdmann, G., Kallus, K.: Stressverarbeitungsfragebogen mit SVF 120 und SVF 78, 3rd edn. Hogrefe, Göttingen, Germany (2002)
  64. Jiang, A., Yang, J., Yang, Y.: General Change Detection Explains the Early Emotion Effect in Implicit Speech Perception, pp. 66–74. Springer, Heidelberg, Germany (2013)
  65. Jucker, A.H., Ziv, Y.: Discourse Markers: Introduction, pp. 1–12. John Benjamins Publishing Company, Amsterdam, The Netherlands (1998)
  66. Kächele, M., Schels, M., Meudt, S., Kessler, V., Glodek, M., Thiam, P., Tschechne, S., Palm, G., Schwenker, F.: On annotation and evaluation of multi-modal corpora in affective human-computer interaction. In: Multimodal Analyses Enabling Artificial Agents in Human-Machine Interaction, pp. 35–44. Springer, Cham (2015)
  67. Kindsvater, D., Meudt, S., Schwenker, F.: Fusion architectures for multimodal cognitive load recognition. In: Schwenker, F., Scherer, S. (eds.) Multimodal Pattern Recognition of Social Signals in Human-Computer-Interaction, pp. 36–47. Springer, Cham (2017)
  68. Kohrs, C., Angenstein, N., Brechmann, A.: Delays in human-computer interaction and their effects on brain activity. PLOS ONE 11(1), 1–14 (2016)
  69. Kohrs, C., Hrabal, D., Angenstein, N., Brechmann, A.: Delayed system response times affect immediate physiology and the dynamics of subsequent button press behavior. Psychophysiology 51(11), 1178–1184 (2014)
  70. Kollias, D., Nicolaou, M.A., Kotsia, I., Zhao, G., Zafeiriou, S.: Recognition of affect in the wild using deep neural networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 1972–1979. IEEE (2017)
  71. Krüger, J., Wahl, M., Frommer, J.: Making the system a relational partner: users' ascriptions in individualization-focused interactions with companion-systems. In: Proceedings of the 8th CENTRIC 2015, pp. 48–54. Barcelona, Spain (2015)
  72. Lange, J., Frommer, J.: Subjektives Erleben und intentionale Einstellung in Interviews zur Nutzer-Companion-Interaktion. In: Proceedings der 41. GI-Jahrestagung, pp. 240–254. Bonner Köllen Verlag, Berlin, Germany (2011)
  73. Laukka, P., Neiberg, D., Forsell, M., Karlsson, I., Elenius, K.: Expression of affect in spontaneous speech: acoustic correlates and automatic detection of irritation and resignation. Comput. Speech Lang. 25(1), 84–104 (2011)
  74. Lee, C.C., Lee, S., Narayanan, S.S.: An analysis of multimodal cues of interruption in dyadic spoken interactions. In: INTERSPEECH 2008, pp. 1678–1681. ISCA, Brisbane, Australia (2008)
  75. Lee, C.M., Narayanan, S.S.: Toward detecting emotions in spoken dialogs. IEEE Trans. Speech Audio Process. 13(2), 293–303 (2005)
  76. Lefter, I., Jonker, C.M.: Aggression recognition using overlapping speech. In: Proceedings of the 2017 ACII, pp. 299–304 (2017)
  77. Lim, W., Jang, D., Lee, T.: Speech emotion recognition using convolutional and recurrent neural networks. In: Proceedings of the 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, pp. 1–4. IEEE, Jeju, South Korea (2016)
  78. Linville, S.E.: Vocal Aging. Singular Publishing Group, San Diego, USA (2001)
  79. Lotz, A.F., Siegert, I., Wendemuth, A.: Automatic differentiation of form-function-relations of the discourse particle "hm" in a naturalistic human-computer interaction. In: Proceedings of the 26. Konferenz Elektronische Sprachsignalverarbeitung, vol. 78, pp. 172–179. TUDpress, Eichstätt, Germany (2015)
  80. Lotz, A.F., Siegert, I., Wendemuth, A.: Classification of functional meanings of non-isolated discourse particles in human-human-interaction. In: Human-Computer Interaction. Theory, Design, Development and Practice, pp. 53–64. Springer (2016)
  81. Lotz, A.F., Siegert, I., Wendemuth, A.: Comparison of different modeling techniques for robust prototype matching of speech pitch-contours. Kognitive Syst. 2016(1) (2016). s.p
  82. Luengo, I., Navas, E., Hernáez, I.: Feature analysis and evaluation for automatic emotion identification in speech. IEEE Trans. Multimed. 12(6), 490–501 (2010)
  83. Mairesse, F., Walker, M.A., Mehl, M.R., Moore, R.K.: Using linguistic cues for the automatic recognition of personality in conversation and text. J. Artif. Intell. Res. 30, 457–500 (2007)
  84. Matsumoto, D., LeRoux, J., Wilson-Cohn, C., Raroque, J., Kooken, K., Ekman, P., Yrizarry, N., Loewinger, S., Uchida, H., Yee, A., Amo, L., Goh, A.: A new test to measure emotion recognition ability: Matsumoto and Ekman's Japanese and Caucasian Brief Affect Recognition Test (JACBART). J. Nonverbal Behav. 24(3), 179–209 (2000)
  85. Moattar, M., Homayounpour, M.: A review on speaker diarization systems and approaches. Speech Commun. 54(10), 1065–1103 (2012)
  86. Murino, V., Gong, S., Loy, C.C., Bazzani, L.: Image and video understanding in big data. Comput. Vis. Image Underst. 156, 1–3 (2017)
  87. Murray, I.R., Arnott, J.L.: Toward the simulation of emotion in synthetic speech: a review of the literature on human vocal emotion. J. Acoust. Soc. Am. 93(2), 1097–1108 (1993)
  88. Pantic, M., Cowie, R., D'Errico, F., Heylen, D., Mehu, M., Pelachaud, C., Poggi, I., Schroeder, M., Vinciarelli, A.: Social signal processing: the research agenda. In: Visual Analysis of Humans: Looking at People, pp. 511–538. Springer, London, UK (2011)
  89. Poria, S., Cambria, E., Bajpai, R., Hussain, A.: A review of affective computing: from unimodal analysis to multimodal fusion. Inf. Fusion 37(Supplement C), 98–125 (2017)
  90. Prylipko, D., Egorow, O., Siegert, I., Wendemuth, A.: Application of image processing methods to filled pauses detection from spontaneous speech. In: INTERSPEECH 2014, pp. 1816–1820. ISCA, Singapore (2014)
  91. Resseguier, B., Léger, P.M., Sénécal, S., Bastarache-Roberge, M.C., Courtemanche, F.: The influence of personality on users' emotional reactions. In: Proceedings of the Third International Conference on HCI in Business, Government, and Organizations: Information Systems, pp. 91–98. Springer, Toronto, Canada (2016)
  92. Ringeval, F., Amiriparian, S., Eyben, F., Scherer, K., Schuller, B.: Emotion recognition in the wild: incorporating voice and lip activity in multimodal decision-level fusion. In: Proceedings of the 16th ICMI, pp. 473–480. ACM, Istanbul, Turkey (2014)
  93. Rösner, D., Frommer, J., Andrich, R., Friesen, R., Haase, M., Kunze, M., Lange, J., Otto, M.: Last minute: a novel corpus to support emotion, sentiment and social signal processing. In: Proceedings of the Eighth LREC, pp. 82–89. ELRA, Istanbul, Turkey (2012)
  94. Rösner, D., Haase, M., Bauer, T., Günther, S., Krüger, J., Frommer, J.: Desiderata for the design of companion systems. KI – Künstliche Intell. 30(1), 53–61 (2016)
  95. Rösner, D., Hazer-Rau, D., Kohrs, C., Bauer, T., Günther, S., Hoffmann, H., Zhang, L., Brechmann, A.: Is there a biological basis for success in human companion interaction? In: Proceedings of the 18th International Conference on Human-Computer Interaction, pp. 77–88. Springer, Toronto, Canada (2016)
  96. Sani, A., Lestari, D.P., Purwarianti, A.: Filled pause detection in Indonesian spontaneous speech. In: Proceedings of the PACLING-2016, pp. 54–64. Springer, Bali, Indonesia (2016)
  97. Schels, M., Kächele, M., Glodek, M., Hrabal, D., Walter, S., Schwenker, F.: Using unlabeled data to improve classification of emotional states in human computer interaction. J. Multimodal User Interfaces 8(1), 5–16 (2014)
  98. Scherer, K.R.: Vocal affect expression: a review and a model for future research. Psychol. Bull. 99(2), 143 (1986)
  99. Schmidt, J.E.: Bausteine der Intonation. In: Neue Wege der Intonationsforschung, Germanistische Linguistik, vol. 157–158, pp. 9–32. Georg Olms Verlag (2001)
  100. Schneider, T.R., Rench, T.A., Lyons, J.B., Riffle, R.: The influence of neuroticism, extraversion and openness on stress responses. Stress Health: J. Int. Soc. Investig. Stress 28, 102–110 (2012)
  101. Schuller, B., Steidl, S., Batliner, A., Nöth, E., Vinciarelli, A., Burkhardt, F., van Son, R., Weninger, F., Eyben, F., Bocklet, T., Mohammadi, G., Weiss, B.: The INTERSPEECH 2012 Speaker Trait Challenge. In: INTERSPEECH 2012. ISCA, Portland, USA (2012). s.p
  102. Schuller, B., Vlasenko, B., Eyben, F., Rigoll, G., Wendemuth, A.: Acoustic emotion recognition: a benchmark comparison of performances. In: Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, pp. 552–557. IEEE, Merano, Italy (2009)
  103. Schuller, B., Batliner, A., Steidl, S., Seppi, D.: Recognising realistic emotions and affect in speech: state of the art and lessons learnt from the first challenge. Speech Commun. 53(9–10), 1062–1087 (2011)
  104. Schuller, B., Steidl, S., Batliner, A., Vinciarelli, A., Scherer, K., Ringeval, F., Chetouani, M., Weninger, F., Eyben, F., Marchi, E., et al.: The INTERSPEECH 2013 computational paralinguistics challenge: social signals, conflict, emotion, autism. In: INTERSPEECH 2013. ISCA, Lyon, France (2013). s.p
  105. Schuller, B.W.: Speech analysis in the big data era. In: Proceedings of the 18th International Conference on Text, Speech, and Dialogue, pp. 3–11. Springer, Plzen, Czech Republic (2015)
  106. Schulz von Thun, F.: Miteinander reden 1 – Störungen und Klärungen. Rowohlt, Reinbek, Germany (1981)
  107. Shahin, I.M.A.: Gender-dependent emotion recognition based on HMMs and SPHMMs. Int. J. Speech Technol. 16, 133–141 (2013)
  108. Shriberg, E., Stolcke, A., Baron, D.: Observations on overlap: findings and implications for automatic processing of multi-party conversation. In: INTERSPEECH 2001, pp. 1359–1362 (2001)
  109. Sidorov, M., Brester, C., Minker, W., Semenkin, E.: Speech-based emotion recognition: feature selection by self-adaptive multi-criteria genetic algorithm. In: Proceedings of the Ninth LREC. ELRA, Reykjavik, Iceland (2014)
  110. Sidorov, M., Schmitt, A., Semenkin, E., Minker, W.: Could speaker, gender or age awareness be beneficial in speech-based emotion recognition? In: Proceedings of the Tenth LREC, pp. 61–68. ELRA, Portorož, Slovenia (2016)
  111. Siegert, I., Böck, R., Vlasenko, B., Ohnemus, K., Wendemuth, A.: Overlapping speech, utterance duration and affective content in HHI and HCI—a comparison. In: Proceedings of the 6th Conference on Cognitive Infocommunications, pp. 83–88. IEEE, Györ, Hungary (2015)
  112. Siegert, I., Böck, R., Vlasenko, B., Wendemuth, A.: Exploring dataset similarities using PCA-based feature selection. In: Proceedings of the 2015 ACII, pp. 387–393. IEEE, Xi'an, China (2015)
  113. Siegert, I., Böck, R., Wendemuth, A.: Modeling users' mood state to improve human-machine-interaction. In: Cognitive Behavioural Systems, pp. 273–279. Springer (2012)
  114. Siegert, I., Böck, R., Wendemuth, A.: Inter-rater reliability for emotion annotation in human-computer interaction—comparison and methodological improvements. J. Multimodal User Interfaces 8, 17–28 (2014)
  115. Siegert, I., Böck, R., Wendemuth, A.: Using the PCA-based dataset similarity measure to improve cross-corpus emotion recognition. Comput. Speech Lang. 1–12 (2018)
  116. Siegert, I., Hartmann, K., Philippou-Hübner, D., Wendemuth, A.: Human behaviour in HCI: complex emotion detection through sparse speech features. In: Human Behavior Understanding, Lecture Notes in Computer Science, vol. 8212, pp. 246–257. Springer (2013)
  117. Siegert, I., Krüger, J., Haase, M., Lotz, A.F., Günther, S., Frommer, J., Rösner, D., Wendemuth, A.: Discourse particles in human-human and human-computer interaction—analysis and evaluation. In: Proceedings of the 18th International Conference on Human-Computer Interaction, pp. 105–117. Springer, Toronto, Canada (2016)
  118. Siegert, I., Lotz, A.F., Duong, L.L., Wendemuth, A.: Measuring the impact of audio compression on the spectral quality of speech data. In: Proceedings of the 27. Konferenz Elektronische Sprachsignalverarbeitung, pp. 229–236 (2016)
  119. Siegert, I., Lotz, A.F., Egorow, O., Böck, R., Schega, L., Tornow, M., Thiers, A., Wendemuth, A.: Akustische Marker für eine verbesserte Situations- und Intentionserkennung von technischen Assistenzsystemen. In: Proceedings of the Zweite transdisziplinäre Konferenz "Technische Unterstützungssysteme, die die Menschen wirklich wollen", pp. 465–474. University Hamburg, Hamburg, Germany (2016)
  120. Siegert, I., Philippou-Hübner, D., Hartmann, K., Böck, R., Wendemuth, A.: Investigation of speaker group-dependent modelling for recognition of affective states from speech. Cogn. Comput. 6(4), 892–913 (2014)
  121. Siegert, I., Philippou-Hübner, D., Tornow, M., Heinemann, R., Wendemuth, A., Ohnemus, K., Fischer, S., Schreiber, G.: Ein Datenset zur Untersuchung emotionaler Sprache in Kundenbindungsdialogen. In: Proceedings of the 26. Konferenz Elektronische Sprachsignalverarbeitung, pp. 180–187. TUDpress, Eichstätt, Germany (2015)
  122. Siegert, I., Prylipko, D., Hartmann, K., Böck, R., Wendemuth, A.: Investigating the form-function-relation of the discourse particle "hm" in a naturalistic human-computer interaction. In: Recent Advances of Neural Network Models and Applications, Smart Innovation, Systems and Technologies, vol. 26, pp. 387–394. Springer, Berlin (2014)
  123. Song, P., Jin, Y., Zhao, L., Xin, M.: Speech emotion recognition using transfer learning. IEICE Trans. Inf. Syst. E97.D(9), 2530–2532 (2014)
  124. Stuhlsatz, A., Meyer, C., Eyben, F., Zielke, T., Meier, H.G., Schuller, B.W.: Deep neural networks for acoustic emotion recognition: raising the benchmarks. In: Proceedings of the ICASSP, pp. 5688–5691. IEEE (2011)
  125. Tahon, M., Devillers, L.: Towards a small set of robust acoustic features for emotion recognition: challenges. IEEE/ACM Trans. Audio Speech Lang. Process. 24(1), 16–28 (2016)
  126. Tamir, M.: Differential preferences for happiness: extraversion and trait-consistent emotion regulation. J. Pers. 77, 447–470 (2009)
  127. Terracciano, A., Merritt, M., Zonderman, A.B., Evans, M.K.: Personality traits and sex differences in emotion recognition among African Americans and Caucasians. Ann. New York Acad. Sci. 1000, 309–312 (2003)
  128. Thiam, P., Meudt, S., Kächele, M., Palm, G., Schwenker, F.: Detection of emotional events utilizing support vector methods in an active learning HCI scenario. In: Proceedings of the 2014 Workshop on Emotion Representation and Modelling in Human-Computer-Interaction-Systems, pp. 31–36. ACM, Istanbul, Turkey (2014)
  129. Thiam, P., Meudt, S., Schwenker, F., Palm, G.: Active learning for speech event detection in HCI. In: Proceedings of the 7th IAPR TC3 Workshop on Artificial Neural Networks in Pattern Recognition, pp. 285–297. Springer, Ulm, Germany (2016)
  130. Thiers, A., Hamacher, D., Tornow, M., Heinemann, R., Siegert, I., Wendemuth, A., Schega, L.: Kennzeichnung von Nutzerprofilen zur Interaktionssteuerung beim Gehen. In: Proceedings of the Zweite transdisziplinäre Konferenz "Technische Unterstützungssysteme, die die Menschen wirklich wollen", pp. 475–484. University Hamburg, Hamburg, Germany (2016)
  131. Tighe, H.: Emotion recognition and personality traits: a pilot study. Summer Res. (2012). s.p
  132. Tornow, M., Krippl, M., Bade, S., Thiers, A., Siegert, I., Handrich, S., Krüger, J., Schega, L., Wendemuth, A.: Integrated health and fitness (iGF)-corpus – ten-modal highly synchronized subject dispositional and emotional human machine interactions. In: Proceedings of Multimodal Corpora: Computer Vision and Language Processing, pp. 21–24. ELRA, Portorož, Slovenia (2016)
  133. Uzair, M., Shafait, F., Ghanem, B., Mian, A.: Representation learning with deep extreme learning machines for efficient image set classification. Neural Comput. Appl. 1–13 (2016)
  134. Valente, F., Kim, S., Motlicek, P.: Annotation and recognition of personality traits in spoken conversations from the AMI meetings corpus. In: INTERSPEECH 2012. ISCA, Portland, USA (2012). s.p
  135. Valli, A.: The design of natural interaction. Multimed. Tools Appl. 38(3), 295–305 (2008)
  136. van der Veer, G.C., Tauber, M.J., Waern, Y., van Muylwijk, B.: On the interaction between system and user characteristics. Behav. Inf. Technol. 4, 289–308 (1985)
  137. Verkhodanova, V., Shapranov, V.: Multi-factor method for detection of filled pauses and lengthenings in Russian spontaneous speech. In: Proceedings of the SPECOM-2015, pp. 285–292. Springer, Athens, Greece (2015)
  138. Vinciarelli, A., Esposito, A., André, E., Bonin, F., Chetouani, M., Cohn, J.F., Cristani, M., Fuhrmann, F., Gilmartin, E., Hammal, Z., Heylen, D., Kaiser, R., Koutsombogera, M., Potamianos, A., Renals, S., Riccardi, G., Salah, A.A.: Open challenges in modelling, analysis and synthesis of human behaviour in human-human and human-machine interactions. Cogn. Comput. 7(4), 397–413 (2015)
  139. Vinciarelli, A., Pantic, M., Bourlard, H.: Social signal processing: survey of an emerging domain. Image Vis. Comput. 27(12), 1743–1759 (2009)
  140. Vlasenko, B., Philippou-Hübner, D., Prylipko, D., Böck, R., Siegert, I., Wendemuth, A.: Vowels formants analysis allows straightforward detection of high arousal emotions. In: Proceedings of the ICME. IEEE, Barcelona, Spain (2011). s.p
  141. Vlasenko, B., Prylipko, D., Böck, R., Wendemuth, A.: Modeling phonetic pattern variability in favor of the creation of robust emotion classifiers for real-life applications. Comput. Speech Lang. 28(2), 483–500 (2014)
  142. Vogt, T., André, E.: Comparing feature sets for acted and spontaneous speech in view of automatic emotion recognition. In: Proceedings of the ICME, pp. 474–477. IEEE, Amsterdam, The Netherlands (2005)
  143. Vogt, T., André, E.: Improving automatic emotion recognition from speech via gender differentiation. In: Proceedings of the Fifth LREC. ELRA, Genoa, Italy (2006). s.p
  144. Walter, S., Kim, J., Hrabal, D., Crawcour, S.C., Kessler, H., Traue, H.C.: Transsituational individual-specific biopsychological classification of emotions. IEEE Trans. Syst. Man Cybern.: Syst. 43(4), 988–995 (2013)
  145. Walter, S., Scherer, S., Schels, M., Glodek, M., Hrabal, D., Schmidt, M., Böck, R., Limbrecht, K., Traue, H., Schwenker, F.: Multimodal emotion classification in naturalistic user behavior. In: Human-Computer Interaction. Towards Mobile and Intelligent Interaction Environments, pp. 603–611. Springer (2011)
  146. Watzlawick, P., Beavin, J.H., Jackson, D.D.: Menschliche Kommunikation: Formen, Störungen, Paradoxien. Verlag Hans Huber, Bern, Switzerland (2007)
  147. Weinberg, G.M.: The Psychology of Computer Programming. Van Nostrand Reinhold, New York, USA (1971)
  148. Weißkirchen, N., Böck, R., Wendemuth, A.: Recognition of emotional speech with convolutional neural networks by means of spectral estimates. In: Proceedings of the 2017 ACII, pp. 1–6. IEEE, San Antonio, USA (2017)
  149. White, S.: Backchannels across cultures: a study of Americans and Japanese. Lang. Soc. 18(1), 59–76 (1989)
  150. Wilks, Y.: Artificial companions. Interdiscip. Sci. Rev. 30(2), 145–152 (2005)
  151. Wolff, S., Brechmann, A.: Carrot and stick 2.0: the benefits of natural and motivational prosody in computer-assisted learning. Comput. Hum. Behav. 43(Supplement C), 76–84 (2015)
  152. Yang, L.C.: Visualizing spoken discourse: prosodic form and discourse functions of interruptions. In: Proceedings of the Second SIGdial Workshop on Discourse and Dialogue, pp. 1–10. Association for Computational Linguistics, Aalborg, Denmark (2001)


Acknowledgements

We acknowledge support by the project "Intention-based Anticipatory Interactive Systems" (IAIS), funded by the European Funds for Regional Development (EFRE) and by the Federal State of Sachsen-Anhalt, Germany, under grant number ZS/2017/10/88785. Further, we thank the projects "Mova3D" (grant number 03ZZ0431H) and "Mod3D" (grant number 03ZZ0414), funded by 3Dsensation within the Zwanzig20 funding programme of the German Federal Ministry of Education and Research (BMBF). Moreover, this work has received funding from the European Union's Horizon 2020 research and innovation programme within the ADAS&ME consortium, grant agreement No. 688900.

Author information

Correspondence to Ronald Böck.


Copyright information

© 2019 Springer Nature Switzerland AG


Cite this chapter

Böck, R., Egorow, O., Höbel-Müller, J., Requardt, A.F., Siegert, I., Wendemuth, A. (2019). Anticipating the User: Acoustic Disposition Recognition in Intelligent Interactions. In: Esposito, A., Esposito, A., Jain, L. (eds) Innovations in Big Data Mining and Embedded Knowledge. Intelligent Systems Reference Library, vol 159. Springer, Cham. https://doi.org/10.1007/978-3-030-15939-9_11
