ABSTRACT
In this work, we present a novel human-robot interaction (HRI) method for detecting and engaging passive subjects in multiparty conversations using a humanoid robot. Voice activity detection and speaker localization are combined with facial recognition to detect and identify non-participating subjects. Once a non-participating individual is identified, the robot addresses the subject with a fact related to the topic of the conversation, with the goal of prompting the subject to join the conversation. Automatic speech recognition and natural language processing techniques are used to generate sentences related to the conversation topic. Preliminary experiments demonstrate that the method successfully identifies and engages passive subjects in a conversation.
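The detection step can be illustrated with a minimal sketch. It assumes that the audio-visual front end (voice activity detection, speaker localization, and facial recognition) has already produced speech segments attributed to identified participants; the segment format, the function names, and the `min_share` threshold are hypothetical and are not taken from the paper. A participant is flagged as passive when their share of the total speaking time falls below the threshold.

```python
from typing import Dict, Iterable, List, Tuple

# (speaker_id, start_seconds, end_seconds): a hypothetical output format for
# the VAD + speaker localization + facial recognition front end.
Segment = Tuple[str, float, float]


def speaking_shares(participants: List[str],
                    segments: Iterable[Segment]) -> Dict[str, float]:
    """Return each participant's fraction of the total speaking time."""
    totals = {p: 0.0 for p in participants}
    for speaker, start, end in segments:
        # Ignore unknown speakers and malformed (non-positive) segments.
        if speaker in totals and end > start:
            totals[speaker] += end - start
    overall = sum(totals.values())
    if overall == 0:
        # Nobody has spoken yet: every share is zero.
        return {p: 0.0 for p in participants}
    return {p: t / overall for p, t in totals.items()}


def passive_subjects(participants: List[str],
                     segments: Iterable[Segment],
                     min_share: float = 0.1) -> List[str]:
    """List participants whose speaking share is below min_share (assumed threshold)."""
    shares = speaking_shares(participants, segments)
    return [p for p in participants if shares[p] < min_share]


if __name__ == "__main__":
    people = ["ana", "ben", "cho"]
    observed = [("ana", 0.0, 12.0), ("ben", 12.0, 20.0),
                ("ana", 20.0, 30.0), ("cho", 30.0, 31.0)]
    print(passive_subjects(people, observed))
```

In a running system the share would more plausibly be computed over a sliding time window, so that a participant who spoke early but then fell silent is still detected; that choice is likewise an assumption rather than something the abstract specifies.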