Abstract
In this paper, we review the major approaches to multimodal human-computer interaction from a computer vision perspective. In particular, we focus on body, gesture, gaze, and affective interaction (facial expression recognition and emotion recognition in audio). We discuss user and task modeling and multimodal fusion, highlighting challenges, open issues, and emerging applications for Multimodal Human-Computer Interaction (MMHCI) research.
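The multimodal fusion the abstract mentions is often done at the decision level: each modality produces its own class probabilities, which are then combined. The sketch below is a minimal illustration of this general idea, not the method of the paper; the modality names, class labels, and weights are illustrative assumptions.

```python
# Hedged sketch of decision-level ("late") multimodal fusion:
# each modality independently outputs class probabilities, and a
# weighted average combines them into a single fused estimate.

def late_fusion(modality_probs, weights):
    """Weighted average of per-modality class-probability lists."""
    num_classes = len(next(iter(modality_probs.values())))
    total_w = sum(weights[m] for m in modality_probs)
    return [
        sum(weights[m] * modality_probs[m][c] for m in modality_probs) / total_w
        for c in range(num_classes)
    ]

# Hypothetical example: facial analysis leans toward "happy",
# while speech prosody is uncertain between "happy" and "neutral".
probs = {
    "face":  [0.7, 0.2, 0.1],   # P(happy, neutral, sad) from vision
    "audio": [0.4, 0.4, 0.2],   # same classes from audio
}
weights = {"face": 0.6, "audio": 0.4}
fused = late_fusion(probs, weights)  # fused distribution over the 3 classes
```

A design note: late fusion is easy to engineer because each modality can fail or be retrained independently, whereas feature-level (early) fusion can capture cross-modal dependencies at the cost of a joint model.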
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
Cite this paper
Jaimes, A., Sebe, N. (2005). Multimodal Human Computer Interaction: A Survey. In: Sebe, N., Lew, M., Huang, T.S. (eds) Computer Vision in Human-Computer Interaction. HCI 2005. Lecture Notes in Computer Science, vol 3766. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11573425_1
DOI: https://doi.org/10.1007/11573425_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29620-1
Online ISBN: 978-3-540-32129-3
eBook Packages: Computer Science (R0)