ABSTRACT
We present an end-to-end deep learning model for robot navigation from raw visual pixel input and natural-language instructions. The proposed model is an LSTM-based sequence-to-sequence neural network architecture with attention, trained on instruction-perception data samples collected in a synthetic environment. We conduct experiments on the SAIL dataset, which we reconstruct in 3D to generate the 2D images associated with the data. Our experiments show that the performance of our model is on par with the state of the art, even though it learns navigational language end to end from raw visual data.
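To make the attention component concrete, here is a minimal sketch of a single attention step over encoder states. It is an illustration only, not the authors' model: the abstract does not say which scoring function is used, so dot-product scoring is an assumption here, the LSTM encoder and decoder are omitted, and the names `attend`, `decoder_state`, and `encoder_states` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def attend(
    decoder_state: Sequence[float],
    encoder_states: Sequence[Sequence[float]],
) -> Tuple[List[float], List[float]]:
    """One attention step (assumed dot-product scoring).

    Returns the attention weights over the encoder states and the
    resulting context vector (the weighted sum of those states).
    """
    # Score each encoder state against the current decoder state.
    scores = [
        sum(d * e for d, e in zip(decoder_state, state))
        for state in encoder_states
    ]
    # Numerically stable softmax over the scores.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [x / total for x in exps]
    # Context vector: attention-weighted sum of the encoder states.
    dim = len(encoder_states[0])
    context = [
        sum(w * state[i] for w, state in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context


# Toy example: two encoder states, decoder state aligned with the first.
weights, context = attend([10.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

In a sequence-to-sequence setting, a step like this would run once per decoded action, with the context vector fed to the decoder alongside its recurrent state.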
Index Terms
- Visually Grounded Language Learning for Robot Navigation