Multimodal Human Computer Interaction: A Survey

  • Conference paper
Computer Vision in Human-Computer Interaction (HCI 2005)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3766)

Abstract

In this paper, we review the major approaches to multimodal human-computer interaction from a computer vision perspective. In particular, we focus on body, gesture, gaze, and affective interaction (facial expression recognition and emotion in audio). We discuss user and task modeling and multimodal fusion, highlighting challenges, open issues, and emerging applications for Multimodal Human-Computer Interaction (MMHCI) research.

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jaimes, A., Sebe, N. (2005). Multimodal Human Computer Interaction: A Survey. In: Sebe, N., Lew, M., Huang, T.S. (eds) Computer Vision in Human-Computer Interaction. HCI 2005. Lecture Notes in Computer Science, vol 3766. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11573425_1

  • DOI: https://doi.org/10.1007/11573425_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29620-1

  • Online ISBN: 978-3-540-32129-3

  • eBook Packages: Computer Science, Computer Science (R0)
