
Clarifying commands with information-theoretic human-robot dialog

Published: 18 June 2013

Abstract

Our goal is to improve the efficiency and effectiveness of natural language communication between humans and robots. Human language is frequently ambiguous, and a robot's limited sensing makes complete understanding of a statement even more difficult. To address these challenges, we describe an approach for enabling a robot to engage in clarifying dialog with a human partner, just as a human might do in a similar situation. Given an unconstrained command from a human operator, the robot asks one or more questions and receives natural language answers from the human. We apply an information-theoretic approach to choosing questions for the robot to ask. Specifically, we choose the type and subject of questions in order to maximize the reduction in Shannon entropy of the robot's mapping between language and entities in the world. Within the framework of the G3 graphical model, we derive a method to estimate this entropy reduction, choose the optimal question to ask, and merge the information gained from the human operator's answer. We demonstrate that this improves the accuracy of command understanding over prior work while asking fewer questions than baseline question-selection strategies.
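
The question-selection idea in the abstract can be illustrated with a toy sketch. This is not the G3-based estimator the paper derives. It assumes a small discrete belief over candidate groundings and a hand-written answer model, and it picks the question with the largest expected reduction in Shannon entropy. All function names, the candidate set, the question names, and the 0.9/0.1 answer probabilities are hypothetical and chosen only for illustration.

```python
import math

def entropy(p):
    """Shannon entropy in bits of a discrete distribution (list of probabilities)."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def posterior(prior, likelihood):
    """Bayes update, where likelihood[i] = P(answer | candidate i)."""
    joint = [p * l for p, l in zip(prior, likelihood)]
    z = sum(joint)
    return [j / z for j in joint]

def expected_entropy_reduction(prior, answer_model):
    """answer_model[a][i] = P(answer a | candidate i); sums to 1 over a for each i."""
    expected_after = 0.0
    for likelihood in answer_model:
        p_answer = sum(p * l for p, l in zip(prior, likelihood))
        if p_answer > 0:
            expected_after += p_answer * entropy(posterior(prior, likelihood))
    return entropy(prior) - expected_after

def choose_question(prior, questions):
    """Return the name of the question with the largest expected entropy reduction."""
    return max(questions, key=lambda q: expected_entropy_reduction(prior, questions[q]))

# Hypothetical belief: four candidate objects, equally likely referents.
prior = [0.25, 0.25, 0.25, 0.25]

# Hypothetical yes/no questions with noisy answers (assumed 90% reliable).
questions = {
    # "Is it on the left?" is true for candidates 0 and 1.
    "left_side": [[0.9, 0.9, 0.1, 0.1],   # P(yes | candidate)
                  [0.1, 0.1, 0.9, 0.9]],  # P(no  | candidate)
    # "Is it candidate 0?" is true only for candidate 0.
    "is_first":  [[0.9, 0.1, 0.1, 0.1],
                  [0.1, 0.9, 0.9, 0.9]],
}

best = choose_question(prior, questions)
# Merge a "yes" answer to the chosen question back into the belief.
updated = posterior(prior, questions[best][0])
```

Under these assumed numbers the question that splits the candidates evenly is preferred over the one that singles out a single candidate, which is the usual behaviour of an entropy-based criterion when the belief is uniform.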

