ABSTRACT
Most computational spoken dialogue systems take a "literary" approach to reference resolution: entities mentioned by a human interactor are unified with elements in the world state according to the same principles that guide interpretation of written text. In human-to-human interaction, however, referring is a far more collaborative process. Participants often underspecify their referents, relying on their discourse partners for feedback when more information is needed to identify a particular referent uniquely. By monitoring a user's eye movements during interaction, a spoken dialogue system can improve its performance on referring expressions that are underspecified according to the literary model. This paper describes a system currently under development that employs such a strategy.
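To make the strategy concrete, here is a minimal sketch (hypothetical; the function, names, and parameters below are illustrative assumptions, not taken from the system described): when the purely linguistic resolver returns more than one candidate referent, the candidate the user has fixated longest within a recent time window is selected, and if gaze is uninformative the system falls back to requesting clarification.

```python
from collections import Counter

def resolve_referent(candidates, fixations, window_ms=1500):
    """Choose among candidate referents left ambiguous by the
    'literary' (purely linguistic) resolver, weighting each candidate
    by how long the user fixated it in the recent gaze window.

    candidates: entity ids the linguistic resolver could not distinguish.
    fixations:  (entity_id, duration_ms) pairs from the eye tracker,
                ordered oldest to newest.
    Returns the favored entity id, or None if gaze does not decide.
    """
    if len(candidates) == 1:
        return candidates[0]  # expression was not underspecified
    weights = Counter()
    elapsed = 0
    for entity, duration in reversed(fixations):  # newest first
        elapsed += duration
        if elapsed > window_ms:
            break  # only recent fixations bear on the current expression
        if entity in candidates:
            weights[entity] += duration
    if weights:
        return weights.most_common(1)[0][0]
    return None  # still ambiguous: prompt the user for clarification

# Example: "the red block" matches two entities; gaze decides.
candidates = ["red_block_1", "red_block_2"]
gaze = [("table", 300), ("red_block_2", 450), ("red_block_2", 200)]
assert resolve_referent(candidates, gaze) == "red_block_2"
```

The fallback to a clarification request matters: under the collaborative view, an underspecified expression is not an error but an invitation for feedback, so a system should commit only when the gaze evidence is decisive.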