Planning for cars that coordinate with people: leveraging effects on human actions for planning and active information gathering over human internal state

Sadigh, Dorsa; Landolfi, Nick; Sastry, Shankar S.; Seshia, Sanjit A.; Dragan, Anca D.

doi:10.1007/s10514-018-9746-1


Published: 04 May 2018

Volume 42, pages 1405–1426 (2018)

Autonomous Robots

Dorsa Sadigh ORCID: orcid.org/0000-0002-7802-9183¹,
Nick Landolfi¹,
Shankar S. Sastry¹,
Sanjit A. Seshia¹ &
Anca D. Dragan¹


Abstract

Traditionally, autonomous cars treat human-driven vehicles like moving obstacles: they predict those vehicles' future trajectories and plan to stay out of their way. While physically safe, this results in defensive and opaque behaviors. In reality, an autonomous car’s actions affect how other cars respond, creating an opportunity for coordination. Our thesis is that we can leverage these responses to plan more efficient and communicative behaviors. We introduce a formulation of interaction with human-driven vehicles as an underactuated dynamical system, in which the robot’s actions have consequences not only on the state of the autonomous car, but also on the human’s actions and thus on the state of the human-driven car. We model these consequences by approximating the human’s actions as (noisily) optimal with respect to some utility function. The robot treats the human’s actions as observations of the parameters of the human’s underlying utility function. We first explore learning these parameters offline, and show that a robot planning in the resulting underactuated system is more efficient than one that treats the person as a moving obstacle. We also show that the robot can target specific desired effects, like getting the person to switch lanes or to proceed first through an intersection. We then explore estimating these parameters online, and enable the robot to perform active information gathering: generating actions that purposefully probe the human in order to clarify their underlying utility parameters, like driving style or attention level. We show that this significantly outperforms passive estimation and improves efficiency. Planning in our model results in coordination behaviors: the robot inches forward at an intersection to see if it can go through, or it reverses to make the other car proceed first. These behaviors result from the optimization, without relying on hand-coded signaling strategies. Our user studies support the utility of our model when interacting with real users.
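To make the online estimation idea concrete, here is a minimal Python sketch of active information gathering over a discrete set of human parameters. It assumes a hypothetical toy setting that is not taken from the paper: one-step discrete robot and human actions, a scalar parameter `theta` (0 for an inattentive driver who ignores the robot, 1 for an attentive driver who mirrors it), and a Boltzmann model of noisy optimality with a made-up rationality coefficient `beta`. The robot keeps a belief over `theta`, updates it with Bayes' rule after each observed human action, and scores candidate actions by expected entropy reduction plus a task term weighted by a hypothetical `lam`. The paper works with continuous state and action spaces; every name below is illustrative only.

```python
import numpy as np

# Hypothetical toy setting (not from the paper): discrete actions and a scalar
# human parameter theta describing how strongly the human reacts to the robot.
HUMAN_ACTIONS = np.array([-1.0, 0.0, 1.0])   # e.g. brake, hold, accelerate
ROBOT_ACTIONS = np.array([-1.0, 0.0, 1.0])
THETAS = np.array([0.0, 1.0])                # 0 = inattentive, 1 = attentive


def human_reward(u_r, u_h, theta):
    """Toy human utility: an attentive human (theta=1) mirrors the robot's action."""
    return -(u_h - theta * u_r) ** 2


def human_policy(u_r, theta, beta=5.0):
    """Noisily optimal (Boltzmann) model: P(u_h | u_r, theta) ∝ exp(beta * r_H)."""
    logits = beta * human_reward(u_r, HUMAN_ACTIONS, theta)
    p = np.exp(logits - logits.max())        # subtract max for numerical stability
    return p / p.sum()


def bayes_update(belief, u_r, u_h_index):
    """Posterior over theta after observing HUMAN_ACTIONS[u_h_index]."""
    likelihood = np.array([human_policy(u_r, th)[u_h_index] for th in THETAS])
    post = belief * likelihood
    return post / post.sum()


def entropy(p):
    """Shannon entropy in nats, ignoring zero-probability entries."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))


def expected_info_gain(belief, u_r):
    """H(b) - E_{u_h}[H(b')], with u_h predicted from the belief-weighted model."""
    gain = entropy(belief)
    for k in range(len(HUMAN_ACTIONS)):
        p_uh = sum(b * human_policy(u_r, th)[k] for b, th in zip(belief, THETAS))
        gain -= p_uh * entropy(bayes_update(belief, u_r, k))
    return gain


def choose_probing_action(belief, task_reward, lam=0.1):
    """One-step lookahead: maximize info gain plus lam * task reward (assumed objective)."""
    scores = [expected_info_gain(belief, u) + lam * task_reward(u) for u in ROBOT_ACTIONS]
    return ROBOT_ACTIONS[int(np.argmax(scores))]
```

Starting from a uniform belief `b0 = np.array([0.5, 0.5])`, the action `u_r = 0.0` carries no information because both hypotheses predict the same human response, so the sketch prefers a nonzero probing action; with a task reward favoring forward progress (`lambda u: u`) it picks `1.0`. Observing the human accelerate in response (`bayes_update(b0, 1.0, 2)`) then moves almost all belief mass to the attentive hypothesis.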


Notes

A preliminary version of our results was reported in Sadigh et al. (2016a, b). This paper extends that work by providing more detailed discussion and experiments...
One exception is Nikolaidis et al. (2016), who propose to solve the full POMDP, albeit for discrete and not continuous state and action spaces.

References

Abbeel, P., & Ng, A. Y. (2005). Exploration and apprenticeship learning in reinforcement learning. In Proceedings of the 22nd international conference on Machine learning (pp. 1–8). ACM.
Agha-Mohammadi, A.-A., Chakravorty, S., & Amato, N. M. (2014). FIRM: Sampling-based feedback motion-planning under motion uncertainty and imperfect measurements. The International Journal of Robotics Research, 33(2), 268–304.
Andrew, G., & Gao, J. (2007). Scalable training of L1-regularized log-linear models. In Proceedings of the 24th international conference on Machine learning (pp. 33–40). ACM.
Atanasov, N. A. (2015). Active information acquisition with mobile robots.
Atanasov, N., Le Ny, J., Daniilidis, K., & Pappas, G. J. (2014). Information acquisition with sensing robots: Algorithms and error bounds. In 2014 IEEE international conference on robotics and automation (ICRA) (pp. 6447–6454). IEEE.
Aumann, R. J., Maschler, M., & Stearns, R. E. (1995). Repeated games with incomplete information. Cambridge: MIT Press.
Bandyopadhyay, T., Won, K. S., Frazzoli, E., Hsu, D., Lee, W. S., & Rus, D. (2013). Intention-aware motion planning. In Algorithmic foundations of robotics X (pp. 475–491). Springer.
Bastien, F., Lamblin, P., Pascanu, R., Bergstra, J., Goodfellow, I. J., Bergeron, A., Bouchard, N., & Bengio, Y. (2012). Theano: new features and speed improvements. In Deep learning and unsupervised feature learning NIPS 2012 workshop.
Bergstra, J., Breuleux, O., Bastien, F., Lamblin, P., Pascanu, R., Desjardins, G., Turian, J., Warde-Farley, D., & Bengio, Y. (2010). Theano: a CPU and GPU math expression compiler. In Proceedings of the python for scientific computing conference (SciPy), Oral Presentation.
Bernstein, D. S., Givan, R., Immerman, N., & Zilberstein, S. (2002). The complexity of decentralized control of Markov decision processes. Mathematics of Operations Research, 27(4), 819–840.
Camacho, E. F., & Alba, C. B. (2013). Model predictive control. Berlin: Springer.
Chaudhari, P., Karaman, S., Hsu, D., & Frazzoli, E. (2013). Sampling-based algorithms for continuous-time POMDPs. In American control conference (ACC), 2013 (pp. 4604–4610). IEEE.
Dissanayake, M., Newman, P., Clark, S., Durrant-Whyte, H. F., & Csorba, M. (2001). A solution to the simultaneous localization and map building (SLAM) problem. IEEE Transactions on Robotics and Automation, 17(3), 229–241.
Falcone, P., Borrelli, F., Asgari, J., Tseng, H. E., & Hrovat, D. (2007). Predictive active steering control for autonomous vehicle systems. IEEE Transactions on Control Systems Technology, 15(3), 566–580.
Falcone, P., Borrelli, F., Tseng, H. E., Asgari, J., & Hrovat, D. (2007). Integrated braking and steering model predictive control approach in autonomous vehicles. Advances in Automotive Control, 5, 273–278.
Falcone, P., Tseng, H. E., Borrelli, F., Asgari, J., & Hrovat, D. (2008). MPC-based yaw and lateral stabilisation via active front steering and braking. Vehicle System Dynamics, 46(sup1), 611–628.
Fern, A., Natarajan, S., Judah, K., & Tadepalli, P. (2007). A decision-theoretic model of assistance. In IJCAI.
Fudenberg, D., & Tirole, J. (1991). Game theory (Vol. 393). Cambridge, Massachusetts.
Gray, A., Gao, Y., Hedrick, J. K., & Borrelli, F. (2013). Robust predictive control for semi-autonomous vehicles with an uncertain driver model. In Intelligent vehicles symposium (IV), 2013 IEEE (pp. 208–213). IEEE.
Hansen, E. A., Bernstein, D. S., & Zilberstein, S. (2004). Dynamic programming for partially observable stochastic games. AAAI, 4, 709–715.
Hedden, T., & Zhang, J. (2002). What do you think I think you think?: Strategic reasoning in matrix games. Cognition, 85(1), 1–36.
Hermes, C., Wohler, C., Schenk, K., & Kummert, F. (2009). Long-term vehicle motion prediction. In 2009 IEEE intelligent vehicles symposium (pp. 652–657).
Javdani, S., Bagnell, J. A., & Srinivasa, S. (2015). Shared autonomy via hindsight optimization. arXiv preprint arXiv:1503.07619.
Javdani, S., Klingensmith, M., Bagnell, J. A., Pollard, N. S., & Srinivasa, S. S. (2013). Efficient touch based localization through submodularity. In 2013 IEEE international conference on robotics and automation (ICRA) (pp. 1828–1835). IEEE.
Kuderer, M., Gulati, S., & Burgard, W. (2015). Learning driving styles for autonomous vehicles from demonstration. In Proceedings of the IEEE international conference on robotics & automation (ICRA), Seattle, USA, Vol. 134.
Lam, C.-P., Yang, A. Y., & Sastry, S. S. (2015). An efficient algorithm for discrete-time hidden mode stochastic hybrid systems. In Control conference (ECC), 2015 European. IEEE.
Leonard, J., How, J., Teller, S., Berger, M., Campbell, S., Fiore, G., et al. (2008). A perception-driven autonomous urban vehicle. Journal of Field Robotics, 25(10), 727–774.
Levine, S., & Koltun, V. (2012). Continuous inverse optimal control with locally optimal examples. arXiv preprint arXiv:1206.4617.
Levinson, J., Askeland, J., Becker, J., Dolson, J., Held, D., Kammel, S., Kolter, J. Z., Langer, D., Pink, O., Pratt, V., et al. (2011). Towards fully autonomous driving: Systems and algorithms. In 2011 IEEE intelligent vehicles symposium (IV) (pp. 163–168).
Luders, B., Kothari, M., & How, J. P. (2010). Chance constrained RRT for probabilistic robustness to environmental uncertainty. In AIAA guidance, navigation, and control conference (GNC), Toronto, Canada.
Ng, A. Y., & Russell, S. J. (2000). Algorithms for inverse reinforcement learning. In Proceedings of the 17th international conference on Machine learning (pp. 663–670).
Nikolaidis, S., Kuznetsov, A., Hsu, D., & Srinivasa, S. (2016). Formalizing human-robot mutual adaptation via a bounded memory based model. In Human-robot interaction.
Nikolaidis, S., Ramakrishnan, R., Gu, K., & Shah, J. (2015). Efficient model learning from joint-action demonstrations for human-robot collaborative tasks. In Proceedings of the tenth annual ACM/IEEE international conference on human-robot interaction (pp. 189–196). ACM.
Patil, S., Kahn, G., Laskey, M., Schulman, J., Goldberg, K., & Abbeel, P. (2015). Scaling up Gaussian belief space planning through covariance-free trajectory optimization and automatic differentiation. In Algorithmic foundations of robotics XI (pp. 515–533). Springer.
Prentice, S., & Roy, N. (2009). The belief roadmap: Efficient planning in belief space by factoring the covariance. The International Journal of Robotics Research, 28, 1448–1465.
Raman, V., Donzé, A., Sadigh, D., Murray, R. M., & Seshia, S. A. (2015). Reactive synthesis from signal temporal logic specifications. In Proceedings of the 18th international conference on hybrid systems: Computation and control (pp. 239–248). ACM.
Sadigh, D., & Kapoor, A. (2015). Safe control under uncertainty. arXiv preprint arXiv:1510.07313.
Sadigh, D., Sastry, S. S., Seshia, S. A., & Dragan, A. D. (2016a). Planning for autonomous cars that leverages effects on human actions. In Proceedings of the robotics: Science and systems conference (RSS).
Sadigh, D., Sastry, S. S., Seshia, S. A., & Dragan, A. (2016b). Information gathering actions over human internal state. In IEEE/RSJ international conference on intelligent robots and systems (IROS) (pp. 66–73). IEEE.
Sadigh, D., Sastry, S. S., Seshia, S. A., & Dragan, A. (2016c). Planning for autonomous cars that leverages effects on human actions. In Proceedings of the robotics: Science and systems conference (RSS).
Seiler, K. M., Kurniawati, H., & Singh, S. P. (2015). An online and approximate solver for POMDPs with continuous action space. In 2015 IEEE international conference on robotics and automation (ICRA) (pp. 2290–2297). IEEE.
Shimosaka, M., Kaneko, T., & Nishi, K. (2014). Modeling risk anticipation and defensive driving on residential roads with inverse reinforcement learning. In 2014 IEEE 17th international conference on intelligent transportation systems (ITSC) (pp. 1694–1700). IEEE.
Trautman, P. (2013). Robot navigation in dense crowds: Statistical models and experimental studies of human robot cooperation. Pasadena: California Institute of Technology.
Trautman, P., & Krause, A. (2010). Unfreezing the robot: Navigation in dense, interacting crowds. In 2010 IEEE/RSJ international conference on intelligent robots and systems (IROS) (pp. 797–803).
Trautman, P., Ma, J., Murray, R. M., & Krause, A. (2013). Robot navigation in dense human crowds: the case for cooperation. In 2013 IEEE international conference on robotics and automation (ICRA) (pp. 2153–2160). IEEE.
Urmson, C., Anhalt, J., Bagnell, D., Baker, C., Bittner, R., Clark, M., et al. (2008). Autonomous driving in urban environments: Boss and the urban challenge. Journal of Field Robotics, 25(8), 425–466.
Vanchinathan, H. P., Nikolic, I., De Bona, F., & Krause, A. (2014). Explore-exploit in top-n recommender systems via Gaussian processes. In Proceedings of the 8th ACM conference on recommender systems (pp. 225–232). ACM.
Vasudevan, R., Shia, V., Gao, Y., Cervera-Navarro, R., Bajcsy, R., & Borrelli, F. (2012). Safe semi-autonomous control with enhanced driver modeling. In American control conference (ACC) (pp. 2896–2903). IEEE.
Vitus, M. P., & Tomlin, C. J. (2013). A probabilistic approach to planning and control in autonomous urban driving. In 2013 IEEE 52nd annual conference on decision and control (CDC) (pp. 2459–2464).
Ziebart, B. D. (2010). Modeling purposeful adaptive behavior with the principle of maximum causal entropy.
Ziebart, B. D., Maas, A. L., Bagnell, J. A., & Dey, A. K. (2008). Maximum entropy inverse reinforcement learning. In AAAI (pp. 1433–1438).


Acknowledgements

This work was partially supported by Berkeley DeepDrive, NSF VeHICaL 1545126, NSF Grants CCF-1139138 and CCF-1116993, ONR N00014-09-1-0230, NSF CAREER 1652083, and an NDSEG Fellowship.

Author information

Authors and Affiliations

Department of Computer Science, Stanford University, Stanford, USA
Dorsa Sadigh, Nick Landolfi, Shankar S. Sastry, Sanjit A. Seshia & Anca D. Dragan


Corresponding author

Correspondence to Dorsa Sadigh.

Additional information


This is one of several papers published in Autonomous Robots comprising the “Special Issue on Robotics Science and Systems”.

This paper combines work from Sadigh et al. (2016b, c). It adds a general formulation of the problem as a game, discusses its limitations, and lays out the assumptions we make to reduce it to a tractable problem. On the experimental side, it adds an analysis of how the resulting behaviors adapt to different initial conditions for both offline and active estimation, an analysis of the benefits of active estimation on the robot’s actual reward, and results on actively estimating user intentions as opposed to just driving style.
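The contrast between planning in an underactuated system and treating the human as a moving obstacle can be illustrated in a few lines. The Python sketch below assumes a leader-follower simplification of the game (the human best-responds to the robot's action) and a hypothetical one-step merge with toy rewards and a made-up `CONFLICT` weight. It is not the authors' formulation, and it does not reproduce the specific tractability assumptions the paper lays out; it only shows how letting the human's action depend on the robot's action changes the robot's plan.

```python
import numpy as np

# Hypothetical toy merge: both agents pick a normalized "progress" action in [0, 1].
ACTIONS = np.linspace(0.0, 1.0, 5)  # 0.0, 0.25, 0.5, 0.75, 1.0
CONFLICT = 2.5                      # assumed penalty weight when both push forward


def human_reward(u_r, u_h):
    """Toy human utility: own progress minus a conflict term coupling both actions."""
    return u_h - CONFLICT * u_r * u_h


def robot_reward(u_r, u_h):
    """Toy robot utility with the same structure."""
    return u_r - CONFLICT * u_r * u_h


def human_best_response(u_r):
    """Human modeled as (deterministically) optimal given the robot's action."""
    return ACTIONS[int(np.argmax([human_reward(u_r, u_h) for u_h in ACTIONS]))]


def plan_underactuated():
    """Robot plans through the human's response: u_h is a function of u_r."""
    scores = [robot_reward(u_r, human_best_response(u_r)) for u_r in ACTIONS]
    u_r = ACTIONS[int(np.argmax(scores))]
    return u_r, human_best_response(u_r)


def plan_moving_obstacle(predicted_u_h):
    """Baseline: the human's action is a fixed prediction, unaffected by u_r."""
    scores = [robot_reward(u_r, predicted_u_h) for u_r in ACTIONS]
    return ACTIONS[int(np.argmax(scores))]
```

With these toy numbers, `plan_underactuated()` commits to `u_r = 1.0` because it predicts the human will yield (`u_h = 0.0`), while `plan_moving_obstacle(1.0)`, which assumes the human keeps full progress whatever the robot does, waits with `u_r = 0.0`: a small analogue of the defensive behavior described in the abstract. The grid deliberately avoids the tie point `u_r = 0.4`, where `np.argmax` would otherwise fall back to the first maximizer.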


About this article

Cite this article

Sadigh, D., Landolfi, N., Sastry, S.S. et al. Planning for cars that coordinate with people: leveraging effects on human actions for planning and active information gathering over human internal state. Auton Robot 42, 1405–1426 (2018). https://doi.org/10.1007/s10514-018-9746-1


Received: 16 February 2017
Accepted: 02 April 2018
Issue Date: October 2018
DOI: https://doi.org/10.1007/s10514-018-9746-1

