Abstract
In recent years, advances in computer vision research have made it possible to explore free hand gestures as a means of human-computer interaction (HCI). Together with improved speech processing technology, this is an important step toward natural multimodal HCI. However, incorporating non-predefined continuous gestures into a multimodal framework is a challenging problem. In this paper, we propose a structured approach for studying patterns of multimodal language in the context of 2D-display control. We systematically analyze gestures from observable kinematical primitives to their semantics as part of a linguistic structure. The proposed semantic classification of co-verbal gestures distinguishes six categories based on their spatio-temporal deixis. We discuss the evolution of a computational framework for gesture and speech integration, which was used to develop an interactive testbed (iMAP). The testbed enabled elicitation of adequate, non-sequential, multimodal patterns in a narrative mode of HCI. User studies illustrate the significance of accounting for the temporal alignment of gesture and speech parts in semantic mapping. Furthermore, co-occurrence analysis of gesture/speech production suggests syntactic organization of gestures at the lexical level.
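As a toy illustration of the kind of temporal alignment the abstract refers to (not the authors' iMAP implementation), the sketch below pairs a gesture stroke with the spoken words whose time intervals overlap it. All names, and the `min_overlap` threshold, are hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Interval:
    label: str    # gesture category or spoken word
    start: float  # seconds
    end: float

def overlap(a: Interval, b: Interval) -> float:
    """Length of the temporal overlap between two intervals (0 if disjoint)."""
    return max(0.0, min(a.end, b.end) - max(a.start, b.start))

def co_occurring_words(gesture: Interval, words: List[Interval],
                       min_overlap: float = 0.05) -> List[str]:
    """Words whose utterance overlaps the gesture stroke by at least min_overlap s."""
    return [w.label for w in words if overlap(gesture, w) >= min_overlap]

# Toy transcript: a pointing stroke aligned with the deictic word "there".
words = [Interval("put", 0.0, 0.3), Interval("that", 0.3, 0.6),
         Interval("there", 0.7, 1.1)]
stroke = Interval("pointing", 0.65, 1.15)
print(co_occurring_words(stroke, words))  # -> ['there']
```

Co-occurrence statistics of this kind (which gesture categories align with which word classes) are what the abstract's lexical-level analysis builds on.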
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
Cite this paper
Kettebekov, S., Sharma, R. (2001). Toward Natural Gesture/Speech Control of a Large Display. In: Little, M.R., Nigay, L. (eds) Engineering for Human-Computer Interaction. EHCI 2001. Lecture Notes in Computer Science, vol 2254. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45348-2_20
Print ISBN: 978-3-540-43044-5
Online ISBN: 978-3-540-45348-2