Reinforcement learning in feedback control

Hafner, Roland; Riedmiller, Martin

doi:10.1007/s10994-011-5235-x

Challenges and benchmarks from technical process control

Published: 27 February 2011

Volume 84, pages 137–169 (2011)

Machine Learning

Abstract

Technical process control is an application area of high practical impact. Because classical controller design is generally demanding, it is an attractive domain for learning approaches, in particular reinforcement learning (RL) methods. RL provides concepts for learning controllers that, by exploiting information gathered from interaction with the process, can acquire high-quality control behaviour from scratch.

This article presents four typical benchmark problems that highlight important and challenging aspects of technical process control: nonlinear dynamics; varying set-points; long-term dynamic effects; influence of external variables; and the primacy of precision. We propose performance measures for controller quality that apply to both classical control design and learning controllers, measuring precision, speed, and stability of the controller. A second set of key figures describes performance from the perspective of a learning approach, providing information about the efficiency of the method with respect to the learning effort needed. For all four benchmark problems, extensive and detailed information is provided with which to carry out the evaluations outlined in this article.

A close evaluation of our own RL scheme, NFQCA (Neural Fitted Q Iteration with Continuous Actions), in accordance with the proposed evaluation scheme on all four benchmarks provides performance figures on both control quality and learning behaviour.
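
The abstract names precision, speed, and stability as the dimensions of controller quality but does not give the definitions. As a rough illustration only, the sketch below computes three textbook quantities from a recorded step response: a steady-state error (precision), a settling time (speed), and an overshoot (a simple proxy for how well damped the response is). The function names, the `tail` window, the tolerance band, and the sample trajectory are all assumptions made for this example and are not taken from the article.

```python
from typing import Optional, Sequence


def steady_state_error(y: Sequence[float], setpoint: float, tail: int = 3) -> float:
    """Mean absolute deviation from the set-point over the last `tail` samples."""
    window = list(y)[-tail:]
    return sum(abs(v - setpoint) for v in window) / len(window)


def settling_time(y: Sequence[float], setpoint: float,
                  tolerance: float, dt: float) -> Optional[float]:
    """Time after which the output stays inside the tolerance band.

    Returns None if the final sample is still outside the band.
    """
    last_outside = -1
    for k, v in enumerate(y):
        if abs(v - setpoint) > tolerance:
            last_outside = k
    if last_outside == len(y) - 1:
        return None
    return (last_outside + 1) * dt


def overshoot(y: Sequence[float], setpoint: float, y0: float) -> float:
    """Peak excursion beyond the set-point as a fraction of the step size."""
    step = setpoint - y0
    if step == 0:
        return 0.0
    sign = 1.0 if step > 0 else -1.0
    peak = max(sign * (v - setpoint) for v in y)
    return max(0.0, peak / abs(step))


# Hypothetical step response sampled every 0.1 s (illustrative data only).
trajectory = [0.0, 0.6, 1.2, 1.04, 0.98, 1.0, 1.0, 1.0]

print(steady_state_error(trajectory, setpoint=1.0))
print(settling_time(trajectory, setpoint=1.0, tolerance=0.05, dt=0.1))
print(overshoot(trajectory, setpoint=1.0, y0=0.0))
```

Because these quantities depend only on the recorded trajectory and the set-point, the same code could score a classically designed controller and a learned one, which is the property the abstract asks of its measures.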

Author information

Authors and Affiliations

Machine Learning Lab, Albert-Ludwigs University Freiburg, Freiburg im Breisgau, Germany
Roland Hafner & Martin Riedmiller

Corresponding author

Correspondence to Roland Hafner.

Additional information

Editors: S. Whiteson and M. Littman.

About this article

Cite this article

Hafner, R., Riedmiller, M. Reinforcement learning in feedback control. Mach Learn 84, 137–169 (2011). https://doi.org/10.1007/s10994-011-5235-x

Received: 26 February 2010
Revised: 03 January 2011
Accepted: 08 January 2011
Published: 27 February 2011
Issue Date: July 2011
DOI: https://doi.org/10.1007/s10994-011-5235-x
