Inverse reinforcement learning control for trajectory tracking of a multirotor UAV

Choi, Seungwon; Kim, Suseong; Jin Kim, H.

doi:10.1007/s12555-015-0483-3


Regular Papers
Robot and Applications
Published: 10 July 2017

Volume 15, pages 1826–1834 (2017)

International Journal of Control, Automation and Systems

Seungwon Choi¹,
Suseong Kim¹ &
H. Jin Kim¹


Abstract

This paper aims to reproduce the control performance of an expert pilot by imitating demonstrations flown on a multirotor UAV (unmanned aerial vehicle). First, we collect a set of expert demonstrations of the task to be learned. From this dataset we extract a representative trajectory, consisting of a sequence of states and inputs, using a hidden Markov model (HMM) and dynamic time warping (DTW). Next, the multirotor learns to track this trajectory. Although the feed-forward input is available at each time step, applying it directly can degrade the stability of the multirotor, both because the data are insufficient for generalization and because of numerical issues. A controller is therefore needed that generates input commands suitable for the flight maneuver. To design such a controller, we learn a hidden reward function of quadratic form from the demonstrated flights using inverse reinforcement learning. After finding the optimal reward function, that is, the one that minimizes the trajectory tracking error, we design a reinforcement-learning-based controller using this reward function. Simulations and experiments with a multirotor UAV show successful imitation results.
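As a rough illustration of two ingredients named in the abstract, the sketch below computes a textbook DTW alignment cost between two demonstrations and evaluates a quadratic reward. It is not the authors' implementation: the function names `dtw_cost` and `quadratic_reward`, the absolute-difference local cost, and the specific form r = -(eᵀQe + uᵀRu) with tracking error e and input u are all assumptions made for illustration, since the abstract does not give these details.

```python
import numpy as np

def dtw_cost(a, b):
    """Textbook DTW alignment cost between two 1-D sequences.

    Assumption: local cost is the absolute difference between samples.
    """
    n, m = len(a), len(b)
    # D[i, j] = minimal cumulative cost of aligning a[:i] with b[:j]
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments
            D[i, j] = d + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])

def quadratic_reward(e, u, Q, R):
    """Hypothetical quadratic reward r = -(e^T Q e + u^T R u).

    e: tracking error vector, u: input vector, Q and R: weight matrices.
    In an IRL setting, Q and R would be the quantities to be learned.
    """
    e = np.asarray(e, dtype=float)
    u = np.asarray(u, dtype=float)
    return float(-(e @ Q @ e + u @ R @ u))

# Two demonstrations of the same motion flown at different speeds
demo_fast = [0.0, 1.0, 2.0]
demo_slow = [0.0, 1.0, 1.0, 2.0]
alignment_cost = dtw_cost(demo_fast, demo_slow)

# Reward for a unit position error and an input of magnitude 2
reward = quadratic_reward([1.0, 0.0], [2.0], np.eye(2), np.array([[0.5]]))
```

Here the two demonstrations differ only in timing, so DTW aligns them at zero cost, which is the property that makes it useful for comparing flights of unequal duration before a representative trajectory is extracted.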



Author information

Authors and Affiliations

Department of Mechanical and Aerospace Engineering, Seoul National University, Seoul, Korea
Seungwon Choi, Suseong Kim & H. Jin Kim


Corresponding author

Correspondence to H. Jin Kim.

Additional information

Recommended by Associate Editor Xiaojie Su under the direction of Editor Jessie (Ju H.) Park. This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Ministry of Science, ICT & Future Planning (MSIP) (No. 2014034854 / 2014M1A3A3A02034854).

Seungwon Choi received the B.S. degree in aerospace engineering from the Korea Advanced Institute of Science and Technology, Daejeon, Korea, in 2012, and the M.S. degree in mechanical and aerospace engineering from Seoul National University, Seoul, Korea. He is currently working toward a Ph.D. degree in the School of Mechanical and Aerospace Engineering, Seoul National University, Seoul, Korea. His research interests include learning-based control and planning of robotic systems.

Suseong Kim received the B.S. degree in mechanical engineering from Yonsei University, Seoul, Korea, in 2010. He is currently working toward a Ph.D. degree in the School of Mechanical and Aerospace Engineering, Seoul National University, Seoul, Korea. His current research interests include vision-based guidance for mobile robots and nonlinear control of unmanned aerial vehicles.

H. Jin Kim received the B.S. degree in mechanical engineering from the Korea Advanced Institute of Science and Technology, Daejeon, Korea, in 1995, and the M.S. and Ph.D. degrees from the University of California, Berkeley, CA, USA, in 1999 and 2001, respectively. From 2002 to 2004, she was a Postdoctoral Researcher in electrical engineering and computer science at the University of California, Berkeley. In 2004, she joined the School of Mechanical and Aerospace Engineering, Seoul National University, where she is currently a Professor. Her research interests include intelligent control of robotic systems and motion planning.


About this article

Cite this article

Choi, S., Kim, S. & Jin Kim, H. Inverse reinforcement learning control for trajectory tracking of a multirotor UAV. Int. J. Control Autom. Syst. 15, 1826–1834 (2017). https://doi.org/10.1007/s12555-015-0483-3


Received: 28 December 2015
Revised: 04 September 2016
Accepted: 11 October 2016
Published: 10 July 2017
Issue Date: August 2017
DOI: https://doi.org/10.1007/s12555-015-0483-3
