Abstract
The main purpose of this paper is to learn an expert's control performance by imitating demonstrations of a multirotor UAV (unmanned aerial vehicle) flown by an expert pilot. First, we collect several expert demonstrations of the task to be learned. From this dataset we extract a representative trajectory, consisting of a sequence of states and inputs, using a hidden Markov model (HMM) and dynamic time warping (DTW). Next, the multirotor learns to track this trajectory in order to imitate the expert. Although the trajectory provides a feed-forward input at each time step, applying this input directly can degrade the stability of the multirotor because of insufficient data for generalization and numerical issues. A controller is therefore needed that generates input commands suited to the flight maneuver. To design such a controller, we use inverse reinforcement learning to learn a hidden reward function of quadratic form from the demonstrated flights. After finding the reward function that minimizes the trajectory tracking error, we design a reinforcement-learning-based controller using this reward function. Simulations and experiments on a multirotor UAV show successful imitation results.
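The idea behind the quadratic-reward inverse reinforcement learning step can be illustrated on a toy problem. The sketch below is not the authors' implementation; it assumes a hypothetical single-axis double-integrator model (`A`, `B`, `DT`), a fixed input weight `R`, a diagonal state weight `Q = diag(q)`, and a synthetic demonstration produced by an "expert" controller with hidden weights `q_true`. Each candidate `q` is turned into a linear feedback controller by iterating the discrete-time Riccati equation with the known model (a model-based stand-in for the paper's reinforcement-learning controller design). The resulting closed-loop trajectory is compared with the demonstration, and a coarse log-spaced grid search (a stand-in for whatever optimizer the paper uses) keeps the weights with the smallest error. For brevity it regulates to the origin instead of tracking a time-varying reference.

```python
import numpy as np

DT = 0.1  # sample time in seconds (assumed)
# Hypothetical single-axis model: double integrator, state x = [position, velocity]
A = np.array([[1.0, DT], [0.0, 1.0]])
B = np.array([[0.5 * DT**2], [DT]])
R = np.array([[1.0]])  # input weight fixed to 1 to remove the reward's scale ambiguity


def lqr_gain(q, max_iter=5000):
    """Discrete-time LQR gain K (control law u = -K x) for reward weights Q = diag(q)."""
    Q = np.diag(q)
    P = Q.copy()
    for _ in range(max_iter):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P_next = Q + A.T @ P @ (A - B @ K)  # Riccati recursion
        converged = np.allclose(P_next, P, rtol=1e-10, atol=1e-12)
        P = P_next
        if converged:
            break
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)


def rollout(K, x0, steps=60):
    """Closed-loop trajectory x_{k+1} = (A - B K) x_k, shape (steps + 1, 2)."""
    xs = [x0]
    for _ in range(steps):
        xs.append((A - B @ K) @ xs[-1])
    return np.array(xs)


x0 = np.array([1.0, 0.0])
q_true = (4.0, 2.0)  # hidden "expert" weights (toy values)
demo = rollout(lqr_gain(q_true), x0)  # stands in for the demonstrated flight


def imitation_error(q):
    """Sum of squared deviations of the weight-q controller's trajectory from the demo."""
    return float(np.sum((rollout(lqr_gain(q), x0) - demo) ** 2))


# Outer IRL loop: exhaustive search over a log-spaced grid of diagonal weights
grid = np.logspace(-0.5, 1.5, 21)
best_q, best_err = None, np.inf
for q1 in grid:
    for q2 in grid:
        err = imitation_error((q1, q2))
        if err < best_err:
            best_q, best_err = (q1, q2), err

print("recovered weights:", np.round(best_q, 2), "error:", best_err)
```

With the exact model the recovered weights should lie close to `q_true`; real flight data would add noise and model mismatch. Fixing `R` matters because scaling `Q` and `R` by the same positive constant leaves the optimal gain unchanged, so only the ratio can be learned, and searching in log space covers several orders of magnitude evenly.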
Additional information
Recommended by Associate Editor Xiaojie Su under the direction of Editor Jessie (Ju H.) Park. This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Ministry of Science, ICT & Future Planning (MSIP) (No. 2014034854 / 2014M1A3A3A02034854).
Seungwon Choi received the B.S. degree in aerospace engineering from the Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea, in 2012, and the M.S. degree in mechanical and aerospace engineering from Seoul National University, Seoul, Korea. He is currently working toward a Ph.D. degree in the School of Mechanical and Aerospace Engineering, Seoul National University, Seoul, Korea. His research interests include learning-based control and planning of robotic systems.
Suseong Kim received the B.S. degree in mechanical engineering from Yonsei University, Seoul, Korea, in 2010. He is currently working toward a Ph.D. degree in the School of Mechanical and Aerospace Engineering, Seoul National University, Seoul, Korea. His current research interests include vision-based guidance for mobile robots and nonlinear control of unmanned aerial vehicles.
H. Jin Kim received the B.S. degree in mechanical engineering from the Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea, in 1995, and the M.S. and Ph.D. degrees from the University of California, Berkeley, CA, USA, in 1999 and 2001, respectively. From 2002 to 2004, she was a Postdoctoral Researcher in electrical engineering and computer science at the University of California, Berkeley. In 2004, she joined the School of Mechanical and Aerospace Engineering, Seoul National University, where she is currently a Professor. Her research interests include intelligent control of robotic systems and motion planning.
About this article
Cite this article
Choi, S., Kim, S. & Jin Kim, H. Inverse reinforcement learning control for trajectory tracking of a multirotor UAV. Int. J. Control Autom. Syst. 15, 1826–1834 (2017). https://doi.org/10.1007/s12555-015-0483-3
DOI: https://doi.org/10.1007/s12555-015-0483-3