
Inverse reinforcement learning control for trajectory tracking of a multirotor UAV

  • Regular Papers
  • Robot and Applications
  • Published in: International Journal of Control, Automation and Systems

Abstract

The purpose of this paper is to reproduce the control performance of an expert pilot by imitating demonstrations flown on a multirotor UAV (unmanned aerial vehicle). First, we collect several expert demonstrations of the task to be learned and extract a representative trajectory from the dataset, a sequence of states and inputs obtained using a hidden Markov model (HMM) and dynamic time warping (DTW). The multirotor then learns to track this trajectory. Although feed-forward inputs are available at every time step, applying them directly can degrade the stability of the multirotor because the data are insufficient for generalization and prone to numerical issues; a controller is therefore needed that generates suitable input commands for the maneuver. To design such a controller, we learn a hidden reward function of quadratic form from the demonstrated flights using inverse reinforcement learning. After finding the reward function that minimizes the trajectory tracking error, we design a reinforcement learning based controller with it. Simulation and experimental results on a multirotor UAV show successful imitation.
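
To make the pipeline concrete, here is a minimal sketch of the trajectory-extraction step under stated assumptions: each demonstration is aligned to a reference demonstration with dynamic time warping, and the warped samples are averaged to form a representative trajectory. The paper additionally models the demonstrations with an HMM, which is omitted here; the function names, array shapes, and averaging rule are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def dtw_path(ref, demo):
    """Dynamic time warping between two trajectories of shape [T, d].
    Returns the (i, j) index pairs of the optimal alignment."""
    n, m = len(ref), len(demo)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = np.linalg.norm(ref[i - 1] - demo[j - 1])
            D[i, j] = c + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    # Backtrack from (n, m) to (1, 1) along the cheapest predecessor.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]

def representative_trajectory(ref, demos):
    """Warp each demonstration onto the reference time base and average."""
    warped = []
    for demo in demos:
        sums = np.zeros_like(ref)
        counts = np.zeros(len(ref))
        for i, j in dtw_path(ref, demo):
            sums[i] += demo[j]
            counts[i] += 1
        warped.append(sums / counts[:, None])
    return np.mean(warped, axis=0)
```

Similarly, a rough sketch of a tracking controller derived from a quadratic reward: the reward penalizes tracking error and input effort, and a discrete-time LQR gain is computed for linearized dynamics. In the paper the weights Q and R are recovered from the demonstrations by inverse reinforcement learning and the policy is then obtained by reinforcement learning; here the weights are fixed, hypothetical values and SciPy's Riccati solver stands in for that step.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

def quadratic_reward(x, u, x_ref, Q, R):
    """Reward = negative quadratic tracking cost (illustrative form)."""
    e = x - x_ref
    return -(e @ Q @ e + u @ R @ u)

def lqr_gain(A, B, Q, R):
    """LQR gain for linearized discrete dynamics x_{k+1} = A x_k + B u_k."""
    P = solve_discrete_are(A, B, Q, R)
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

# Toy double integrator standing in for one translational axis of the UAV.
dt = 0.02
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt ** 2], [dt]])
Q = np.diag([10.0, 1.0])   # hypothetical state weights (learned via IRL in the paper)
R = np.array([[0.1]])      # hypothetical input weight (learned via IRL in the paper)

K = lqr_gain(A, B, Q, R)
x, x_ref = np.array([1.0, 0.0]), np.zeros(2)
u = -K @ (x - x_ref)       # tracking command for the current error
print("reward:", quadratic_reward(x, u, x_ref, Q, R))
```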



Author information


Corresponding author

Correspondence to H. Jin Kim.

Additional information

Recommended by Associate Editor Xiaojie Su under the direction of Editor Jessie (Ju H.) Park. This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Ministry of Science, ICT & Future Planning (MSIP) (No. 2014034854 / 2014M1A3A3A02034854).

Seungwon Choi received the B.S. degree in aerospace engineering from the Korea Advanced Institute of Science and Technology, Daejeon, Korea, in 2012, and the M.S. degree in mechanical and aerospace engineering from Seoul National University, Seoul, Korea. He is currently working toward a Ph.D. degree in the School of Mechanical and Aerospace Engineering, Seoul National University, Seoul, Korea. His research interests include learning-based control and planning of robotic systems.

Suseong Kim received the B.S. degree in mechanical engineering from Yonsei University, Seoul, Korea, in 2010. He is currently working toward a Ph.D. degree in the School of Mechanical and Aerospace Engineering, Seoul National University, Seoul, Korea. His current research interests include vision-based guidance for mobile robots and nonlinear control of unmanned aerial vehicles.

H. Jin Kim received the B.S. degree in mechanical engineering from the Korea Advanced Institute of Science and Technology, Daejeon, Korea, in 1995, and the M.S. and Ph.D. degrees from the University of California, Berkeley, CA, USA, in 1999 and 2001, respectively. From 2002 to 2004, she was a Postdoctoral Researcher in electrical engineering and computer science at the University of California, Berkeley. In 2004, she joined the School of Mechanical and Aerospace Engineering, Seoul National University, where she is currently a Professor. Her research interests include intelligent control of robotic systems and motion planning.


About this article


Cite this article

Choi, S., Kim, S. & Jin Kim, H. Inverse reinforcement learning control for trajectory tracking of a multirotor UAV. Int. J. Control Autom. Syst. 15, 1826–1834 (2017). https://doi.org/10.1007/s12555-015-0483-3

