
Anticipating the User: Acoustic Disposition Recognition in Intelligent Interactions

Chapter in: Innovations in Big Data Mining and Embedded Knowledge

Abstract

Contemporary technical devices follow the paradigm of naturalistic multimodal interaction and user-centric individualisation. Users expect devices to interact intelligently, to anticipate their needs, and to adapt to their behaviour. To do so, companion-like solutions have to take into account the affective and dispositional state of the user, and must therefore be trained and adapted using interaction data and corpora. We argue that, in this context, big data alone serves little purpose, since important effects are obscured and high-quality annotation is too costly. We instead encourage the collection and use of enriched data. We report on recent trends in this field, presenting methodologies for collecting data with rich disposition variety and predictable classifications, based on careful design and standardised psychological assessments. Besides socio-demographic information and personality traits, we also use speech events to improve user state models. Furthermore, we present possibilities to increase the amount of enriched data in a cross-corpus or intra-corpus manner based on recent learning approaches. Finally, we highlight recent neural recognition approaches that are feasible for smaller datasets and cover temporal aspects.
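The abstract's final point, temporal neural approaches that remain feasible for smaller datasets, can be illustrated with a minimal sketch. The following Python example assumes librosa and PyTorch; the MFCC front end, the file name "utterance.wav", and all hyperparameters are illustrative choices, not the chapter's actual setup. It shows a compact recurrent classifier over frame-wise acoustic features:

    # Minimal sketch of a temporal (recurrent) disposition classifier for
    # small datasets. Feature choice, file name and hyperparameters are
    # illustrative assumptions, not the configuration used in the chapter.
    import librosa
    import torch
    import torch.nn as nn

    def frame_features(wav_path, sr=16000):
        """Load audio and return a (time, n_mfcc) feature sequence."""
        y, _ = librosa.load(wav_path, sr=sr)
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # shape (13, time)
        return torch.tensor(mfcc.T, dtype=torch.float32)    # shape (time, 13)

    class DispositionLSTM(nn.Module):
        """LSTM over acoustic frames; the last hidden state yields logits."""
        def __init__(self, n_feats=13, hidden=64, n_classes=4):
            super().__init__()
            self.rnn = nn.LSTM(n_feats, hidden, batch_first=True)
            self.out = nn.Linear(hidden, n_classes)

        def forward(self, x):           # x: (batch, time, n_feats)
            _, (h, _) = self.rnn(x)     # h: (num_layers, batch, hidden)
            return self.out(h[-1])      # (batch, n_classes)

    model = DispositionLSTM()
    feats = frame_features("utterance.wav").unsqueeze(0)  # add batch dim
    logits = model(feats)  # one score per hypothetical disposition class

At roughly 20k trainable parameters, a model of this size can plausibly be trained on small, enriched corpora of the kind advocated above rather than on web-scale data.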


Notes

  1. Poria et al. state that some authors define early fusion as fusion directly on the signal level, and introduce mid-level fusion to denote fusion on the feature level [89].
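To make this distinction concrete, here is a small numeric sketch; the two modalities, the array shapes, and the toy functionals are assumptions for illustration only, not taken from the chapter:

    # Illustrative sketch of the fusion levels discussed in the note above;
    # modalities, shapes and features are assumptions, not from the chapter.
    import numpy as np

    rng = np.random.default_rng(0)
    audio = rng.normal(size=(16000,))   # stand-in: 1 s of audio at 16 kHz
    physio = rng.normal(size=(16000,))  # stand-in: a synchronised biosignal

    # Early fusion on the signal level: stack raw, time-aligned signals
    # before any feature extraction.
    early = np.stack([audio, physio], axis=-1)            # (16000, 2)

    # Mid-level fusion on the feature level: extract features per modality,
    # then concatenate the feature vectors for a joint classifier.
    audio_feats = np.array([audio.mean(), audio.std()])   # toy functionals
    physio_feats = np.array([physio.mean(), physio.std()])
    mid = np.concatenate([audio_feats, physio_feats])     # (4,)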

References

  1. Abraham, W.: Multilingua. J. Cross-Cult. Interlang. Commun. 10(1/2) (1991). s.p
  2. Allwood, J., Nivre, J., Ahlsén, E.: On the semantics and pragmatics of linguistic feedback. J. Semant. 9(1), 1–26 (1992)
  3. Anguera, X., Bozonnet, S., Evans, N., Fredouille, C., Friedland, G., Vinyals, O.: Speaker diarization: a review of recent research. IEEE Trans. Audio Speech Lang. Process. 20(2), 356–370 (2012)
  4. Bachorowski, J.A., Owren, M.J.: Vocal expression of emotion: acoustic properties of speech are associated with emotional intensity and context. Psychol. Sci. 6(4), 219–224 (1995)
  5. Badshah, A.M., Ahmad, J., Rahim, N., Baik, S.W.: Speech emotion recognition from spectrograms with deep convolutional neural network. In: Proceedings of the 2017 International Conference on Platform Technology and Service, pp. 1–5. IEEE, Busan, South Korea (2017)
  6. Baimbetov, Y., Khalil, I., Steinbauer, M., Anderst-Kotsis, G.: Using big data for emotionally intelligent mobile services through multi-modal emotion recognition. In: Proceedings of the 13th International Conference on Smart Homes and Health Telematics, pp. 127–138. Springer, Geneva, Switzerland (2015)
  7. Batliner, A., Fischer, K., Huber, R., Spilker, J., Nöth, E.: Desperately seeking emotions: actors, wizards and human beings. In: Proceedings of the ISCA Workshop on Speech and Emotion: A Conceptual Framework for Research, pp. 195–200. Textflow, Belfast, UK (2000)
  8. Batliner, A., Nöth, E., Buckow, J., Huber, R., Warnke, V., Niemann, H.: Whence and whither prosody in automatic speech understanding: a case study. In: Proceedings of the Workshop on Prosody and Speech Recognition 2001, pp. 3–12. ISCA, Red Bank, USA (2001)
  9. Bazzanella, C.: Phatic connectives as interactional cues in contemporary spoken Italian. J. Pragmat. 14(4), 629–647 (1990)
  10. Biundo, S., Wendemuth, A.: Companion-technology for cognitive technical systems. KI – Künstliche Intell. 30(1), 71–75 (2016)
  11. Biundo, S., Wendemuth, A. (eds.): Companion Technology—A Paradigm Shift in Human-Technology Interaction. Springer, Cham, Switzerland (2017)
  12. Böck, R.: Multimodal automatic user disposition recognition in human-machine interaction. Ph.D. thesis, Otto von Guericke University Magdeburg (2013)
  13. Böck, R., Egorow, O., Siegert, I., Wendemuth, A.: Comparative study on normalisation in emotion recognition from speech. In: Intelligent Human Computer Interaction, pp. 189–201. Springer, Cham, Switzerland (2017)
  14. Böck, R., Egorow, O., Wendemuth, A.: Speaker-group specific acoustic differences in consecutive stages of spoken interaction. In: Proceedings of the 28. Konferenz Elektronische Sprachsignalverarbeitung, pp. 211–218. TUDpress (2017)
  15. Böck, R., Egorow, O., Wendemuth, A.: Acoustic detection of consecutive stages of spoken interaction based on speaker-group specific features. In: Proceedings of the 29. Konferenz Elektronische Sprachsignalverarbeitung, pp. 247–254. TUDpress (2018)
  16. Böck, R., Hübner, D., Wendemuth, A.: Determining optimal signal features and parameters for HMM-based emotion classification. In: Proceedings of the 15th IEEE Mediterranean Electrotechnical Conference, pp. 1586–1590. IEEE, Valletta, Malta (2010)
  17. Böck, R., Siegert, I.: Recognising emotional evolution from speech. In: Proceedings of the International Workshop on Emotion Representations and Modelling for Companion Technologies, pp. 13–18. ACM, Seattle, USA (2015)
  18. Bolinger, D.: Intonation and Its Uses: Melody in Grammar and Discourse. Stanford University Press, Stanford, CA (1989)
  19. Bonin, F.: Content and context in conversations: the role of social and situational signals in conversation structure. Ph.D. thesis, Trinity College Dublin (2016)
  20. Butler, L.D., Nolen-Hoeksema, S.: Gender differences in responses to depressed mood in a college sample. Sex Roles 30, 331–346 (1994)
  21. Byrne, C., Foulkes, P.: The mobile phone effect on vowel formants. Int. J. Speech Lang. Law 11, 83–102 (2004)
  22. Carroll, J.M.: Human computer interaction—brief intro. The Interaction Design Foundation, Aarhus, Denmark, 2nd edn. (2013). s.p
  23. Chen, J., Chaudhari, N.: Segmented-memory recurrent neural networks. IEEE Trans. Neural Netw. 20(8), 1267–1280 (2009)
  24. Chowdhury, S.A., Riccardi, G.: A deep learning approach to modeling competitiveness in spoken conversations. In: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5680–5684. IEEE (2017)
  25. Costa, P., McCrae, R.: NEO-PI-R Professional Manual. Revised NEO Personality Inventory (NEO-PI-R) and NEO Five Factor Inventory (NEO-FFI). Psychological Assessment Resources, Odessa, USA (1992)
  26. Cowie, R.: Perceiving emotion: towards a realistic understanding of the task. Philos. Trans. R. Soc. Lond. B: Biol. Sci. 364(1535), 3515–3525 (2009)
  27. Cowie, R., Douglas-Cowie, E., Savvidou, S., McMahon, E., Sawey, M., Schröder, M.: 'Feeltrace': an instrument for recording perceived emotion in real time. In: Proceedings of the ISCA Workshop on Speech and Emotion: A Conceptual Framework for Research, pp. 19–24. Textflow, Belfast, UK (2000)
  28. Crispim-Junior, C.F., Ma, Q., Fosty, B., Romdhane, R., Bremond, F., Thonnat, M.: Combining multiple sensors for event recognition of older people. In: Proceedings of the 1st Workshop on Multimedia Indexing and Information Retrieval for Healthcare, pp. 15–22. ACM, Barcelona, Spain (2013)
  29. Cuperman, R., Ickes, W.: Big five predictors of behavior and perceptions in initial dyadic interactions: personality similarity helps extraverts and introverts, but hurts 'disagreeables'. J. Personal. Soc. Psychol. 97, 667–684 (2009)
  30. Dobrišek, S., Gajšek, R., Mihelič, F., Pavešić, N., Štruc, V.: Towards efficient multi-modal emotion recognition. Int. J. Adv. Robot. Syst. 10 (2013). s.p
  31. Egorow, O., Lotz, A., Siegert, I., Böck, R., Krüger, J., Wendemuth, A.: Accelerating manual annotation of filled pauses by automatic pre-selection. In: Proceedings of the 2017 International Conference on Companion Technology (ICCT), pp. 1–6 (2017)
  32. Egorow, O., Siegert, I., Wendemuth, A.: Prediction of user satisfaction in naturalistic human-computer interaction. Kognitive Syst. 2017(1) (2017). s.p
  33. Egorow, O., Wendemuth, A.: Detection of challenging dialogue stages using acoustic signals and biosignals. In: Proceedings of the WSCG 2016, pp. 137–143. Springer, Plzen, Czech Republic (2016)
  34. Egorow, O., Wendemuth, A.: Emotional features for speech overlaps classification. In: INTERSPEECH 2017, pp. 2356–2360. ISCA, Stockholm, Sweden (2017)
  35. Etemadpour, R., Murray, P., Forbes, A.G.: Evaluating density-based motion for big data visual analytics. In: IEEE International Conference on Big Data, pp. 451–460. IEEE, Washington, USA (2014)
  36. Eyben, F., Scherer, K.R., Schuller, B.W., Sundberg, J., André, E., Busso, C., Devillers, L.Y., Epps, J., Laukka, P., Narayanan, S.S., et al.: The Geneva minimalistic acoustic parameter set (GeMAPS) for voice research and affective computing. IEEE Trans. Affect. Comput. 7(2), 190–202 (2016)
  37. Eyben, F., Weninger, F., Gross, F., Schuller, B.: Recent developments in openSMILE, the Munich open-source multimedia feature extractor. In: Proceedings of the 21st ACM International Conference on Multimedia, pp. 835–838. ACM, Barcelona, Spain (2013)
  38. Eyben, F., Wöllmer, M., Schuller, B.: openEAR – introducing the Munich open-source emotion and affect recognition toolkit. In: Proceedings of the 2009 ACII, pp. 1–6. IEEE, Amsterdam, Netherlands (2009)
  39. Forgas, J.P.: Feeling and doing: affective influences on interpersonal behavior. Psychol. Inq. 13, 1–28 (2002)
  40. Frommer, J., Rösner, D., Haase, M., Lange, J., Friesen, R., Otto, M.: Detection and Avoidance of Failures in Dialogues – Wizard of Oz Experiment Operator's Manual. Pabst Science Publishers (2012)
  41. Gill, A., French, R.: Level of representation and semantic distance: rating author personality from texts. In: Proceedings of the Second European Cognitive Science Conference. Taylor & Francis, Delphi, Greece (2007). s.p
  42. Glüge, S., Böck, R., Ott, T.: Emotion recognition from speech using representation learning in extreme learning machines. In: Proceedings of the 9th IJCCI, pp. 1–6. INSTICC, Funchal, Madeira, Portugal (2017)
  43. Glüge, S., Böck, R., Wendemuth, A.: Segmented-memory recurrent neural networks versus hidden Markov models in emotion recognition from speech. In: Proceedings of the 3rd IJCCI, pp. 308–315. SCITEPRESS, Paris, France (2011)
  44. Goldberg, J.A.: Interrupting the discourse on interruptions: an analysis in terms of relationally neutral, power- and rapport-oriented acts. J. Pragmat. 14(6), 883–903 (1990)
  45. Goldberg, L.R.: The development of markers for the Big-Five factor structure. J. Pers. Soc. Psychol. 59(6), 1216–1229 (1992)
  46. Gosztolya, G.: Optimized time series filters for detecting laughter and filler events. In: INTERSPEECH 2017, pp. 2376–2380. ISCA, Stockholm, Sweden (2017)
  47. Goto, M., Itou, K., Hayamizu, S.: A real-time filled pause detection system for spontaneous speech recognition. In: EUROSPEECH 1999, pp. 227–230. ISCA, Budapest, Hungary (1999)
  48. Gross, J.J., Carstensen, L.L., Pasupathi, M., Tsai, J., Skorpen, C.G., Hsu, A.Y.: Emotion and aging: experience, expression, and control. Psychol. Aging 12, 590–599 (1997)
  49. Hamacher, D., Hamacher, D., Müller, R., Schega, L., Zech, A.: Exploring phase dependent functional gait variability. Hum. Mov. Sci. 52(Supplement C), 191–196 (2017)
  50. Hamzah, R., Jamil, N., Seman, N., Ardi, N., Doraisamy, S.C.: Impact of acoustical voice activity detection on spontaneous filled pause classification. In: Proceedings of the IEEE ICOS-2014, pp. 1–6. IEEE, Subang, Malaysia (2014)
  51. Hattie, J.: Visible Learning. A Bradford Book, Routledge, London, UK (2009)
  52. Hölker, K.: Zur Analyse von Markern: Korrektur- und Schlußmarker des Französischen. Steiner, Stuttgart, Germany (1988)
  53. Hölker, K.: Französisch: Partikelforschung. Lexikon der Romanistischen Linguistik 5, 77–88 (1991)
  54. Honold, F., Bercher, P., Richter, F., Nothdurft, F., Geier, T., Barth, R., Hoernle, T., Schüssel, F., Reuter, S., Rau, M., Bertrand, G., Seegebarth, B., Kurzok, P., Schattenberg, B., Minker, W., Weber, M., Biundo-Stephan, S.: Companion-technology: towards user- and situation-adaptive functionality of technical systems. In: 2014 International Conference on Intelligent Environments, pp. 378–381. IEEE, Shanghai, China (2014)
  55. Honold, F., Schüssel, F., Weber, M.: The automated interplay of multimodal fission and fusion in adaptive HCI. In: 2014 International Conference on Intelligent Environments, pp. 170–177. IEEE, Shanghai, China (2014)
  56. Horowitz, L., Alden, L., Wiggins, J., Pincus, A.: Inventory of Interpersonal Problems Manual. The Psychological Corporation, Odessa, USA (2000)
  57. Hossain, M.S., Muhammad, G., Alhamid, M.F., Song, B., Al-Mutib, K.: Audio-visual emotion recognition using big data towards 5G. Mob. Netw. Appl. 21(5), 753–763 (2016)
  58. Huang, G.B., Zhou, H., Ding, X., Zhang, R.: Extreme learning machine for regression and multiclass classification. IEEE Trans. Syst. Man Cybern. Part B (Cybernetics) 42(2), 513–529 (2012)
  59. Huang, Y., Hu, M., Yu, X., Wang, T., Yang, C.: Transfer learning of deep neural network for speech emotion recognition. In: Pattern Recognition—Part 2, pp. 721–729. Springer, Singapore (2016)
  60. Huang, Z., Epps, J.: Detecting the instant of emotion change from speech using a martingale framework. In: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 5195–5199. IEEE, Shanghai, China (2016)
  61. Huang, Z., Epps, J., Ambikairajah, E.: An investigation of emotion change detection from speech. In: INTERSPEECH 2015, pp. 1329–1333. ISCA, Dresden, Germany (2015)
  62. Izard, C.E., Libero, D.Z., Putnam, P., Haynes, O.M.: Stability of emotion experiences and their relations to traits of personality. J. Pers. Soc. Psychol. 64, 847–860 (1993)
  63. Jahnke, W., Erdmann, G., Kallus, K.: Stressverarbeitungsfragebogen mit SVF 120 und SVF 78, 3rd edn. Hogrefe, Göttingen, Germany (2002)
  64. Jiang, A., Yang, J., Yang, Y.: General Change Detection Explains the Early Emotion Effect in Implicit Speech Perception, pp. 66–74. Springer, Heidelberg, Germany (2013)
  65. Jucker, A.H., Ziv, Y.: Discourse Markers: Introduction, pp. 1–12. John Benjamins Publishing Company, Amsterdam, The Netherlands (1998)
  66. Kächele, M., Schels, M., Meudt, S., Kessler, V., Glodek, M., Thiam, P., Tschechne, S., Palm, G., Schwenker, F.: On annotation and evaluation of multi-modal corpora in affective human-computer interaction. In: Multimodal Analyses Enabling Artificial Agents in Human-Machine Interaction, pp. 35–44. Springer, Cham (2015)
  67. Kindsvater, D., Meudt, S., Schwenker, F.: Fusion architectures for multimodal cognitive load recognition. In: Schwenker, F., Scherer, S. (eds.) Multimodal Pattern Recognition of Social Signals in Human-Computer-Interaction, pp. 36–47. Springer, Cham (2017)
  68. Kohrs, C., Angenstein, N., Brechmann, A.: Delays in human-computer interaction and their effects on brain activity. PLOS ONE 11(1), 1–14 (2016)
  69. Kohrs, C., Hrabal, D., Angenstein, N., Brechmann, A.: Delayed system response times affect immediate physiology and the dynamics of subsequent button press behavior. Psychophysiology 51(11), 1178–1184 (2014)
  70. Kollias, D., Nicolaou, M.A., Kotsia, I., Zhao, G., Zafeiriou, S.: Recognition of affect in the wild using deep neural networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 1972–1979. IEEE (2017)
  71. Krüger, J., Wahl, M., Frommer, J.: Making the system a relational partner: users' ascriptions in individualization-focused interactions with companion-systems. In: Proceedings of the 8th CENTRIC 2015, pp. 48–54. Barcelona, Spain (2015)
  72. Lange, J., Frommer, J.: Subjektives Erleben und intentionale Einstellung in Interviews zur Nutzer-Companion-Interaktion. In: Proceedings der 41. GI-Jahrestagung, pp. 240–254. Bonner Köllen Verlag, Berlin, Germany (2011)
  73. Laukka, P., Neiberg, D., Forsell, M., Karlsson, I., Elenius, K.: Expression of affect in spontaneous speech: acoustic correlates and automatic detection of irritation and resignation. Comput. Speech Lang. 25(1), 84–104 (2011)
  74. Lee, C.C., Lee, S., Narayanan, S.S.: An analysis of multimodal cues of interruption in dyadic spoken interactions. In: INTERSPEECH 2008, pp. 1678–1681. ISCA, Brisbane, Australia (2008)
  75. Lee, C.M., Narayanan, S.S.: Toward detecting emotions in spoken dialogs. IEEE Trans. Speech Audio Process. 13(2), 293–303 (2005)
  76. Lefter, I., Jonker, C.M.: Aggression recognition using overlapping speech. In: Proceedings of the 2017 ACII, pp. 299–304 (2017)
  77. Lim, W., Jang, D., Lee, T.: Speech emotion recognition using convolutional and recurrent neural networks. In: Proceedings of the 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, pp. 1–4. IEEE, Jeju, South Korea (2016)
  78. Linville, S.E.: Vocal Aging. Singular Publishing Group, San Diego, USA (2001)
  79. Lotz, A.F., Siegert, I., Wendemuth, A.: Automatic differentiation of form-function-relations of the discourse particle "hm" in a naturalistic human-computer interaction. In: Proceedings of the 26. Konferenz Elektronische Sprachsignalverarbeitung, vol. 78, pp. 172–179. TUDpress, Eichstätt, Germany (2015)
  80. Lotz, A.F., Siegert, I., Wendemuth, A.: Classification of functional meanings of non-isolated discourse particles in human-human-interaction. In: Human-Computer Interaction. Theory, Design, Development and Practice, pp. 53–64. Springer (2016)
  81. Lotz, A.F., Siegert, I., Wendemuth, A.: Comparison of different modeling techniques for robust prototype matching of speech pitch-contours. Kognitive Syst. 2016(1) (2016). s.p
  82. Luengo, I., Navas, E., Hernáez, I.: Feature analysis and evaluation for automatic emotion identification in speech. IEEE Trans. Multimed. 12(6), 490–501 (2010)
  83. Mairesse, F., Walker, M.A., Mehl, M.R., Moore, R.K.: Using linguistic cues for the automatic recognition of personality in conversation and text. J. Artif. Intell. Res. 30, 457–500 (2007)
  84. Matsumoto, D., LeRoux, J., Wilson-Cohn, C., Raroque, J., Kooken, K., Ekman, P., Yrizarry, N., Loewinger, S., Uchida, H., Yee, A., Amo, L., Goh, A.: A new test to measure emotion recognition ability: Matsumoto and Ekman's Japanese and Caucasian Brief Affect Recognition Test (JACBART). J. Nonverbal Behav. 24(3), 179–209 (2000)
  85. Moattar, M., Homayounpour, M.: A review on speaker diarization systems and approaches. Speech Commun. 54(10), 1065–1103 (2012)
  86. Murino, V., Gong, S., Loy, C.C., Bazzani, L.: Image and video understanding in big data. Comput. Vis. Image Underst. 156, 1–3 (2017)
  87. Murray, I.R., Arnott, J.L.: Toward the simulation of emotion in synthetic speech: a review of the literature on human vocal emotion. J. Acoust. Soc. Am. 93(2), 1097–1108 (1993)
  88. Pantic, M., Cowie, R., D'Errico, F., Heylen, D., Mehu, M., Pelachaud, C., Poggi, I., Schroeder, M., Vinciarelli, A.: Social signal processing: the research agenda. In: Visual Analysis of Humans: Looking at People, pp. 511–538. Springer, London, UK (2011)
  89. Poria, S., Cambria, E., Bajpai, R., Hussain, A.: A review of affective computing: from unimodal analysis to multimodal fusion. Inf. Fusion 37(Supplement C), 98–125 (2017)
  90. Prylipko, D., Egorow, O., Siegert, I., Wendemuth, A.: Application of image processing methods to filled pauses detection from spontaneous speech. In: INTERSPEECH 2014, pp. 1816–1820. ISCA, Singapore (2014)
  91. Resseguier, B., Léger, P.M., Sénécal, S., Bastarache-Roberge, M.C., Courtemanche, F.: The influence of personality on users' emotional reactions. In: Proceedings of the Third International Conference on HCI in Business, Government, and Organizations: Information Systems, pp. 91–98. Springer, Toronto, Canada (2016)
  92. Ringeval, F., Amiriparian, S., Eyben, F., Scherer, K., Schuller, B.: Emotion recognition in the wild: incorporating voice and lip activity in multimodal decision-level fusion. In: Proceedings of the 16th ICMI, pp. 473–480. ACM, Istanbul, Turkey (2014)
  93. Rösner, D., Frommer, J., Andrich, R., Friesen, R., Haase, M., Kunze, M., Lange, J., Otto, M.: Last minute: a novel corpus to support emotion, sentiment and social signal processing. In: Proceedings of the Eighth LREC, pp. 82–89. ELRA, Istanbul, Turkey (2012)
  94. Rösner, D., Haase, M., Bauer, T., Günther, S., Krüger, J., Frommer, J.: Desiderata for the design of companion systems. KI – Künstliche Intell. 30(1), 53–61 (2016)
  95. Rösner, D., Hazer-Rau, D., Kohrs, C., Bauer, T., Günther, S., Hoffmann, H., Zhang, L., Brechmann, A.: Is there a biological basis for success in human companion interaction? In: Proceedings of the 18th International Conference on Human-Computer Interaction, pp. 77–88. Springer, Toronto, Canada (2016)
  96. Sani, A., Lestari, D.P., Purwarianti, A.: Filled pause detection in Indonesian spontaneous speech. In: Proceedings of the PACLING-2016, pp. 54–64. Springer, Bali, Indonesia (2016)
  97. Schels, M., Kächele, M., Glodek, M., Hrabal, D., Walter, S., Schwenker, F.: Using unlabeled data to improve classification of emotional states in human computer interaction. J. Multimodal User Interfaces 8(1), 5–16 (2014)
  98. Scherer, K.R.: Vocal affect expression: a review and a model for future research. Psychol. Bull. 99(2), 143 (1986)
  99. Schmidt, J.E.: Bausteine der Intonation. In: Neue Wege der Intonationsforschung, Germanistische Linguistik, vol. 157–158, pp. 9–32. Georg Olms Verlag (2001)
  100. Schneider, T.R., Rench, T.A., Lyons, J.B., Riffle, R.: The influence of neuroticism, extraversion and openness on stress responses. Stress Health: J. Int. Soc. Investig. Stress 28, 102–110 (2012)
  101. Schuller, B., Steidl, S., Batliner, A., Nöth, E., Vinciarelli, A., Burkhardt, F., van Son, R., Weninger, F., Eyben, F., Bocklet, T., Mohammadi, G., Weiss, B.: The INTERSPEECH 2012 Speaker Trait Challenge. In: INTERSPEECH 2012. ISCA, Portland, USA (2012). s.p
  102. Schuller, B., Vlasenko, B., Eyben, F., Rigoll, G., Wendemuth, A.: Acoustic emotion recognition: a benchmark comparison of performances. In: Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, pp. 552–557. IEEE, Merano, Italy (2009)
  103. Schuller, B., Batliner, A., Steidl, S., Seppi, D.: Recognising realistic emotions and affect in speech: state of the art and lessons learnt from the first challenge. Speech Commun. 53(9–10), 1062–1087 (2011)
  104. Schuller, B., Steidl, S., Batliner, A., Vinciarelli, A., Scherer, K., Ringeval, F., Chetouani, M., Weninger, F., Eyben, F., Marchi, E., et al.: The INTERSPEECH 2013 computational paralinguistics challenge: social signals, conflict, emotion, autism. In: INTERSPEECH 2013. ISCA, Lyon, France (2013). s.p
  105. Schuller, B.W.: Speech analysis in the big data era. In: Proceedings of the 18th International Conference on Text, Speech, and Dialogue, pp. 3–11. Springer, Plzen, Czech Republic (2015)
  106. Schulz von Thun, F.: Miteinander reden 1 – Störungen und Klärungen. Rowohlt, Reinbek, Germany (1981)
  107. Shahin, I.M.A.: Gender-dependent emotion recognition based on HMMs and SPHMMs. Int. J. Speech Technol. 16, 133–141 (2013)
  108. Shriberg, E., Stolcke, A., Baron, D.: Observations on overlap: findings and implications for automatic processing of multi-party conversation. In: INTERSPEECH 2001, pp. 1359–1362 (2001)
  109. Sidorov, M., Brester, C., Minker, W., Semenkin, E.: Speech-based emotion recognition: feature selection by self-adaptive multi-criteria genetic algorithm. In: Proceedings of the Ninth LREC. ELRA, Reykjavik, Iceland (2014)
  110. Sidorov, M., Schmitt, A., Semenkin, E., Minker, W.: Could speaker, gender or age awareness be beneficial in speech-based emotion recognition? In: Proceedings of the Tenth LREC, pp. 61–68. ELRA, Portorož, Slovenia (2016)
  111. Siegert, I., Böck, R., Vlasenko, B., Ohnemus, K., Wendemuth, A.: Overlapping speech, utterance duration and affective content in HHI and HCI—a comparison. In: Proceedings of the 6th Conference on Cognitive Infocommunications, pp. 83–88. IEEE, Györ, Hungary (2015)
  112. Siegert, I., Böck, R., Vlasenko, B., Wendemuth, A.: Exploring dataset similarities using PCA-based feature selection. In: Proceedings of the 2015 ACII, pp. 387–393. IEEE, Xi'an, China (2015)
  113. Siegert, I., Böck, R., Wendemuth, A.: Modeling users' mood state to improve human-machine-interaction. In: Cognitive Behavioural Systems, pp. 273–279. Springer (2012)
  114. Siegert, I., Böck, R., Wendemuth, A.: Inter-rater reliability for emotion annotation in human-computer interaction—comparison and methodological improvements. J. Multimodal User Interfaces 8, 17–28 (2014)
  115. Siegert, I., Böck, R., Wendemuth, A.: Using the PCA-based dataset similarity measure to improve cross-corpus emotion recognition. Comput. Speech Lang. 1–12 (2018)
  116. Siegert, I., Hartmann, K., Philippou-Hübner, D., Wendemuth, A.: Human behaviour in HCI: complex emotion detection through sparse speech features. In: Human Behavior Understanding, Lecture Notes in Computer Science, vol. 8212, pp. 246–257. Springer (2013)
  117. Siegert, I., Krüger, J., Haase, M., Lotz, A.F., Günther, S., Frommer, J., Rösner, D., Wendemuth, A.: Discourse particles in human-human and human-computer interaction—analysis and evaluation. In: Proceedings of the 18th International Conference on Human-Computer Interaction, pp. 105–117. Springer, Toronto, Canada (2016)
  118. Siegert, I., Lotz, A.F., Duong, L.L., Wendemuth, A.: Measuring the impact of audio compression on the spectral quality of speech data. In: Proceedings of the 27. Konferenz Elektronische Sprachsignalverarbeitung, pp. 229–236 (2016)
  119. Siegert, I., Lotz, A.F., Egorow, O., Böck, R., Schega, L., Tornow, M., Thiers, A., Wendemuth, A.: Akustische Marker für eine verbesserte Situations- und Intentionserkennung von technischen Assistenzsystemen. In: Proceedings of the Zweite transdisziplinäre Konferenz "Technische Unterstützungssysteme, die die Menschen wirklich wollen", pp. 465–474. University Hamburg, Hamburg, Germany (2016)
  120. Siegert, I., Philippou-Hübner, D., Hartmann, K., Böck, R., Wendemuth, A.: Investigation of speaker group-dependent modelling for recognition of affective states from speech. Cogn. Comput. 6(4), 892–913 (2014)
  121. Siegert, I., Philippou-Hübner, D., Tornow, M., Heinemann, R., Wendemuth, A., Ohnemus, K., Fischer, S., Schreiber, G.: Ein Datenset zur Untersuchung emotionaler Sprache in Kundenbindungsdialogen. In: Proceedings of the 26. Konferenz Elektronische Sprachsignalverarbeitung, pp. 180–187. TUDpress, Eichstätt, Germany (2015)
  122. Siegert, I., Prylipko, D., Hartmann, K., Böck, R., Wendemuth, A.: Investigating the form-function-relation of the discourse particle "hm" in a naturalistic human-computer interaction. In: Recent Advances of Neural Network Models and Applications, Smart Innovation, Systems and Technologies, vol. 26, pp. 387–394. Springer, Berlin (2014)
  123. Song, P., Jin, Y., Zhao, L., Xin, M.: Speech emotion recognition using transfer learning. IEICE Trans. Inf. Syst. E97.D(9), 2530–2532 (2014)
  124. Stuhlsatz, A., Meyer, C., Eyben, F., Zielke, T., Meier, H.G., Schuller, B.W.: Deep neural networks for acoustic emotion recognition: raising the benchmarks. In: Proceedings of the ICASSP, pp. 5688–5691. IEEE (2011)
  125. Tahon, M., Devillers, L.: Towards a small set of robust acoustic features for emotion recognition: challenges. IEEE/ACM Trans. Audio Speech Lang. Process. 24(1), 16–28 (2016)
  126. Tamir, M.: Differential preferences for happiness: extraversion and trait-consistent emotion regulation. J. Pers. 77, 447–470 (2009)
  127. Terracciano, A., Merritt, M., Zonderman, A.B., Evans, M.K.: Personality traits and sex differences in emotion recognition among African Americans and Caucasians. Ann. New York Acad. Sci. 1000, 309–312 (2003)
  128. Thiam, P., Meudt, S., Kächele, M., Palm, G., Schwenker, F.: Detection of emotional events utilizing support vector methods in an active learning HCI scenario. In: Proceedings of the 2014 Workshop on Emotion Representation and Modelling in Human-Computer-Interaction-Systems, pp. 31–36. ACM, Istanbul, Turkey (2014)
  129. Thiam, P., Meudt, S., Schwenker, F., Palm, G.: Active learning for speech event detection in HCI. In: Proceedings of the 7th IAPR TC3 Workshop on Artificial Neural Networks in Pattern Recognition, pp. 285–297. Springer, Ulm, Germany (2016)
  130. Thiers, A., Hamacher, D., Tornow, M., Heinemann, R., Siegert, I., Wendemuth, A., Schega, L.: Kennzeichnung von Nutzerprofilen zur Interaktionssteuerung beim Gehen. In: Proceedings of the Zweite transdisziplinäre Konferenz "Technische Unterstützungssysteme, die die Menschen wirklich wollen", pp. 475–484. University Hamburg, Hamburg, Germany (2016)
  131. Tighe, H.: Emotion recognition and personality traits: a pilot study. Summer Res. (2012). s.p
  132. Tornow, M., Krippl, M., Bade, S., Thiers, A., Siegert, I., Handrich, S., Krüger, J., Schega, L., Wendemuth, A.: Integrated health and fitness (iGF)-corpus – ten-modal highly synchronized subject dispositional and emotional human machine interactions. In: Proceedings of Multimodal Corpora: Computer Vision and Language Processing, pp. 21–24. ELRA, Portorož, Slovenia (2016)
  133. Uzair, M., Shafait, F., Ghanem, B., Mian, A.: Representation learning with deep extreme learning machines for efficient image set classification. Neural Comput. Appl. 1–13 (2016)
  134. Valente, F., Kim, S., Motlicek, P.: Annotation and recognition of personality traits in spoken conversations from the AMI meetings corpus. In: INTERSPEECH 2012. ISCA, Portland, USA (2012). s.p
  135. Valli, A.: The design of natural interaction. Multimed. Tools Appl. 38(3), 295–305 (2008)
  136. van der Veer, G.C., Tauber, M.J., Waern, Y., van Muylwijk, B.: On the interaction between system and user characteristics. Behav. Inf. Technol. 4, 289–308 (1985)
  137. Verkhodanova, V., Shapranov, V.: Multi-factor method for detection of filled pauses and lengthenings in Russian spontaneous speech. In: Proceedings of the SPECOM-2015, pp. 285–292. Springer, Athens, Greece (2015)
  138. Vinciarelli, A., Esposito, A., André, E., Bonin, F., Chetouani, M., Cohn, J.F., Cristani, M., Fuhrmann, F., Gilmartin, E., Hammal, Z., Heylen, D., Kaiser, R., Koutsombogera, M., Potamianos, A., Renals, S., Riccardi, G., Salah, A.A.: Open challenges in modelling, analysis and synthesis of human behaviour in human-human and human-machine interactions. Cogn. Comput. 7(4), 397–413 (2015)
  139. Vinciarelli, A., Pantic, M., Bourlard, H.: Social signal processing: survey of an emerging domain. Image Vis. Comput. 27(12), 1743–1759 (2009)
  140. Vlasenko, B., Philippou-Hübner, D., Prylipko, D., Böck, R., Siegert, I., Wendemuth, A.: Vowels formants analysis allows straightforward detection of high arousal emotions. In: Proceedings of the ICME. IEEE, Barcelona, Spain (2011). s.p
  141. Vlasenko, B., Prylipko, D., Böck, R., Wendemuth, A.: Modeling phonetic pattern variability in favor of the creation of robust emotion classifiers for real-life applications. Comput. Speech Lang. 28(2), 483–500 (2014)
  142. Vogt, T., André, E.: Comparing feature sets for acted and spontaneous speech in view of automatic emotion recognition. In: Proceedings of the ICME, pp. 474–477. IEEE, Amsterdam, The Netherlands (2005)
  143. Vogt, T., André, E.: Improving automatic emotion recognition from speech via gender differentiation. In: Proceedings of the Fifth LREC. ELRA, Genoa, Italy (2006). s.p
  144. Walter, S., Kim, J., Hrabal, D., Crawcour, S.C., Kessler, H., Traue, H.C.: Transsituational individual-specific biopsychological classification of emotions. IEEE Trans. Syst. Man Cybern.: Syst. 43(4), 988–995 (2013)
  145. Walter, S., Scherer, S., Schels, M., Glodek, M., Hrabal, D., Schmidt, M., Böck, R., Limbrecht, K., Traue, H., Schwenker, F.: Multimodal emotion classification in naturalistic user behavior. In: Human-Computer Interaction. Towards Mobile and Intelligent Interaction Environments, pp. 603–611. Springer (2011)
  146. Watzlawick, P., Beavin, J.H., Jackson, D.D.: Menschliche Kommunikation: Formen, Störungen, Paradoxien. Verlag Hans Huber, Bern, Switzerland (2007)
  147. Weinberg, G.M.: The Psychology of Computer Programming. Van Nostrand Reinhold, New York, USA (1971)
  148. Weißkirchen, N., Böck, R., Wendemuth, A.: Recognition of emotional speech with convolutional neural networks by means of spectral estimates. In: Proceedings of the 2017 ACII, pp. 1–6. IEEE, San Antonio, USA (2017)
  149. White, S.: Backchannels across cultures: a study of Americans and Japanese. Lang. Soc. 18(1), 59–76 (1989)
  150. Wilks, Y.: Artificial companions. Interdiscip. Sci. Rev. 30(2), 145–152 (2005)
  151. Wolff, S., Brechmann, A.: Carrot and stick 2.0: the benefits of natural and motivational prosody in computer-assisted learning. Comput. Hum. Behav. 43(Supplement C), 76–84 (2015)
  152. Yang, L.C.: Visualizing spoken discourse: prosodic form and discourse functions of interruptions. In: Proceedings of the Second SIGdial Workshop on Discourse and Dialogue, pp. 1–10. Association for Computational Linguistics, Aalborg, Denmark (2001)


Acknowledgements

We acknowledge support by the project "Intention-based Anticipatory Interactive Systems" (IAIS), funded by the European Funds for Regional Development (EFRE) and by the Federal State of Sachsen-Anhalt, Germany, under grant number ZS/2017/10/88785. Further, we thank the projects "Mova3D" (grant number 03ZZ0431H) and "Mod3D" (grant number 03ZZ0414), funded by 3Dsensation within the Zwanzig20 funding programme of the German Federal Ministry of Education and Research (BMBF). Moreover, this work has received funding from the European Union's Horizon 2020 research and innovation programme within the ADAS&ME consortium, grant agreement No. 688900.

Author information

Correspondence to Ronald Böck.


Copyright information

© 2019 Springer Nature Switzerland AG


Cite this chapter

Böck, R., Egorow, O., Höbel-Müller, J., Requardt, A.F., Siegert, I., Wendemuth, A. (2019). Anticipating the User: Acoustic Disposition Recognition in Intelligent Interactions. In: Esposito, A., Esposito, A., Jain, L. (eds) Innovations in Big Data Mining and Embedded Knowledge. Intelligent Systems Reference Library, vol 159. Springer, Cham. https://doi.org/10.1007/978-3-030-15939-9_11
