Abstract
Our goal is to improve the efficiency and effectiveness of natural language communication between humans and robots. Human language is frequently ambiguous, and a robot's limited sensing makes complete understanding of a statement even more difficult. To address these challenges, we describe an approach that enables a robot to engage in clarifying dialog with a human partner, just as a human might in a similar situation. Given an unconstrained command from a human operator, the robot asks one or more questions and receives natural language answers from the human. We apply an information-theoretic approach to choosing which questions the robot should ask. Specifically, we choose the type and subject of each question to maximize the reduction in Shannon entropy of the robot's mapping between language and entities in the world. Within the framework of the Generalized Grounding Graph (G3) model, we derive a method to estimate this entropy reduction, choose the optimal question to ask, and merge the information gained from the human operator's answer. We demonstrate that this approach improves the accuracy of command understanding over prior work while asking fewer questions than baseline question-selection strategies.
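The question-selection criterion described above can be illustrated with a deliberately simplified sketch. It is not the paper's G3-based estimator: it assumes a discrete belief over candidate groundings for each phrase, treats phrases as independent (so shrinking one phrase's entropy shrinks the joint entropy by the same amount), and uses a hypothetical answer model in which the human answers correctly with probability 1 − eps. For a question q it scores the expected entropy reduction H(G) − Σ_a P(a | q) H(G | a, q) and merges an answer by Bayes' rule. All names (`choose_question`, `open_question`, `yes_no_question`, the `eps` noise rate, and the pallet/truck groundings) are illustrative assumptions.

```python
import math
from typing import Dict, Optional, Tuple

# Hypothetical belief: a discrete distribution over candidate groundings
# (world objects) for one phrase, e.g. {"pallet_1": 0.5, "pallet_2": 0.5}.
Belief = Dict[str, float]
# Hypothetical answer model: answer a -> {grounding g: P(a | g)}.
AnswerModel = Dict[str, Dict[str, float]]


def entropy(belief: Belief) -> float:
    """Shannon entropy in bits; zero-probability entries contribute nothing."""
    return -sum(p * math.log2(p) for p in belief.values() if p > 0)


def posterior(prior: Belief, likelihood: Dict[str, float]) -> Belief:
    """Merge an answer by Bayes' rule: P(g | a) is proportional to P(a | g) P(g)."""
    unnorm = {g: prior[g] * likelihood[g] for g in prior}
    z = sum(unnorm.values())
    return {g: v / z for g, v in unnorm.items()}


def expected_entropy_reduction(prior: Belief, model: AnswerModel) -> float:
    """H(G) minus the expected entropy of G after hearing the answer."""
    expected_after = 0.0
    for likelihood in model.values():
        p_answer = sum(prior[g] * likelihood[g] for g in prior)
        if p_answer > 0:
            expected_after += p_answer * entropy(posterior(prior, likelihood))
    return entropy(prior) - expected_after


def open_question(candidates, eps: float = 0.05) -> AnswerModel:
    """'What do you mean by <phrase>?': the human names the right object with
    probability 1 - eps, otherwise one of the others uniformly (assumption)."""
    n = len(candidates)
    return {a: {g: (1 - eps) if a == g else eps / (n - 1) for g in candidates}
            for a in candidates}


def yes_no_question(candidates, target: str, eps: float = 0.05) -> AnswerModel:
    """'Do you mean <target>?', answered correctly with probability 1 - eps."""
    return {
        "yes": {g: (1 - eps) if g == target else eps for g in candidates},
        "no": {g: eps if g == target else (1 - eps) for g in candidates},
    }


def choose_question(beliefs: Dict[str, Belief], eps: float = 0.05
                    ) -> Optional[Tuple[float, str, str, Optional[str]]]:
    """Return (gain, kind, phrase, target) for the most informative question,
    or None if no phrase is ambiguous. Assumes phrases are independent."""
    best = None
    for phrase, prior in beliefs.items():
        candidates = list(prior)
        if len(candidates) < 2:
            continue  # a single candidate leaves nothing to disambiguate
        options = [("open", None, open_question(candidates, eps))]
        options += [("yes_no", g, yes_no_question(candidates, g, eps))
                    for g in candidates]
        for kind, target, model in options:
            gain = expected_entropy_reduction(prior, model)
            if best is None or gain > best[0]:
                best = (gain, kind, phrase, target)
    return best


# Usage with made-up beliefs: "the pallet" is maximally ambiguous.
beliefs = {
    "the pallet": {"pallet_1": 0.5, "pallet_2": 0.5},
    "the truck": {"truck_1": 0.9, "truck_2": 0.1},
}
gain, kind, phrase, target = choose_question(beliefs)
print(phrase, round(gain, 3))  # the pallet 0.714

# Suppose the human answers an open question about "the pallet" with pallet_2.
model = open_question(["pallet_1", "pallet_2"])
beliefs["the pallet"] = posterior(beliefs["the pallet"], model["pallet_2"])
print(beliefs["the pallet"])  # pallet_2 now holds roughly 0.95 of the mass
```

Scoring the expected reduction, rather than simply asking about the phrase whose belief has the highest entropy, lets one criterion compare question types (open versus yes/no) as well as subjects. With only two candidates the two question types tie in this toy model; with more candidates, an open question is typically the more informative choice under this answer model.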
Index Terms
- Clarifying commands with information-theoretic human-robot dialog