A Bayesian exploration-exploitation approach for optimal online sensing and planning with a visually guided mobile robot

Abstract

We address the problem of online path planning for optimal sensing with a mobile robot. The objective of the robot is to learn the most about its pose and the environment under time constraints. We model the finite-horizon planning problem as a POMDP with a utility function that depends on the belief state, and we replan as the robot moves through the environment. The POMDP is high-dimensional, continuous, non-differentiable, nonlinear and non-Gaussian, and it must be solved in real time, so most existing techniques for stochastic planning and reinforcement learning are inapplicable. To solve this extremely complex problem, we propose a Bayesian optimization method that dynamically trades off exploration (minimizing uncertainty in unknown parts of the policy space) and exploitation (capitalizing on the current best solution). We demonstrate our approach on a visually guided mobile robot. The solution proposed here also applies to closely related domains, including active vision, sequential experimental design, dynamic sensing and calibration with mobile sensors.
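To make the exploration-exploitation trade-off concrete, the sketch below implements the general recipe the abstract describes: a Gaussian-process surrogate over policy parameters and an expected-improvement acquisition that decides where to evaluate next. This is a minimal illustration, not the paper's implementation; the squared-exponential kernel, its hyperparameters, the candidate grid and the toy one-dimensional objective (standing in for an expensive belief-space utility evaluated by simulation) are all assumptions made for the example.

```python
import numpy as np
from scipy.stats import norm

def sq_exp_kernel(A, B, length_scale=0.2):
    # Squared-exponential covariance between two sets of 1-D inputs.
    d = A[:, None] - B[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(X, y, Xs, noise=1e-6):
    # GP posterior mean and standard deviation at candidate points Xs,
    # given the policy parameters X evaluated so far and their utilities y.
    K = sq_exp_kernel(X, X) + noise * np.eye(len(X))
    Ks = sq_exp_kernel(X, Xs)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mu = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = 1.0 - np.sum(v ** 2, axis=0)   # k(x, x) = 1 for this kernel
    return mu, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mu, sigma, y_best):
    # EI trades off a high posterior mean (exploitation) against a
    # high posterior standard deviation (exploration) in closed form.
    z = (mu - y_best) / sigma
    return (mu - y_best) * norm.cdf(z) + sigma * norm.pdf(z)

def utility(x):
    # Hypothetical stand-in for the expensive utility of a policy
    # parameter; in the paper this would be a simulated belief rollout.
    return np.sin(6 * x) * x + 0.1 * np.cos(14 * x)

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, 3)             # a few initial evaluations
y = utility(X)
candidates = np.linspace(0.0, 1.0, 200)  # discretized policy space

for _ in range(10):                      # sequential design loop
    mu, sigma = gp_posterior(X, y, candidates)
    x_next = candidates[np.argmax(expected_improvement(mu, sigma, y.max()))]
    X = np.append(X, x_next)
    y = np.append(y, utility(x_next))

print(f"best parameter {X[np.argmax(y)]:.3f} with utility {y.max():.3f}")
```

Each iteration spends its single expensive evaluation where predicted utility or predictive uncertainty is high, which is exactly the trade-off the planner faces at every replanning step.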


Author information

Corresponding author

Correspondence to Ruben Martinez-Cantin.

About this article

Cite this article

Martinez-Cantin, R., de Freitas, N., Brochu, E. et al. A Bayesian exploration-exploitation approach for optimal online sensing and planning with a visually guided mobile robot. Auton Robot 27, 93–103 (2009). https://doi.org/10.1007/s10514-009-9130-2


Keywords: Navigation