
Learning Complex Behaviors via Sequential Composition and Passivity-Based Control

Handling Uncertainty and Networked Structure in Robot Control

Part of the book series: Studies in Systems, Decision and Control ((SSDC,volume 42))

Abstract

The model-free paradigm of reinforcement learning (RL) is a theoretical strength, but in practice the stringent assumptions required for optimal solutions (full exploration of the state space) and experimental issues, such as slow learning rates, make model-free RL difficult to apply. This paper addresses practical implementations of RL by interfacing elements of systems and control with robotics. In our approach, space is handled by sequential composition (a technique commonly used in robotics) and time is handled by passivity-based control methods (a standard nonlinear control approach), which speed up learning and provide a stopping criterion. Sequential composition in effect partitions the state space and allows controllers with different domains of attraction (DoA) and goal sets to be composed, so that learning takes place in subsets of the state space. Passivity-based control (PBC) is a model-based control approach in which the total energy is computable. This total energy can be used as a candidate Lyapunov function to evaluate the stability of a controller and to estimate its DoA. This enables learning in finite time: during learning, the candidate Lyapunov function is monitored online to approximate the DoA of the learned controller. Once this DoA covers the relevant states, from the point of view of sequential composition, the learning process is stopped. The result is a collection of learned controllers that cover a desired range of the state space and can be composed in sequence to achieve various goals. Optimality is lost in favour of practicality. Other implications include safety during learning and incremental learning.
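The stopping criterion described in the abstract can be sketched in code. The sketch below is illustrative only and uses a simple scalar example system; the function names (`doa_estimate`, `learn_until_doa_covers`) and the sampling-based sublevel-set check are our own assumptions for exposition, not the authors' implementation.

```python
import numpy as np

def doa_estimate(V, Vdot, samples, c_grid):
    """Approximate the DoA as the largest sublevel set {x : V(x) <= c}
    on which the candidate Lyapunov function decreases (Vdot < 0)."""
    best_c = 0.0
    for c in c_grid:
        inside = [x for x in samples if V(x) <= c]
        if inside and all(Vdot(x) < 0 for x in inside):
            best_c = max(best_c, c)
    return best_c

def learn_until_doa_covers(learn_step, V, Vdot, goal_set, samples, c_grid,
                           max_iters=100):
    """Run learning updates while monitoring the candidate Lyapunov
    function online; stop as soon as the estimated DoA covers the goal
    set relevant for the sequential composition."""
    c = 0.0
    for it in range(max_iters):
        learn_step()  # one RL update (e.g. an actor-critic step)
        c = doa_estimate(V, Vdot, samples, c_grid)
        if all(V(x) <= c for x in goal_set):
            return it, c  # finite-time stop: DoA covers the goal set
    return max_iters, c

# Toy example: scalar system x' = -x + x^3 with V(x) = x^2 / 2.
# The origin is locally stable with DoA |x| < 1, i.e. V < 0.5.
V = lambda x: 0.5 * x**2
Vdot = lambda x: x * (-x + x**3)
samples = np.linspace(-2.0, 2.0, 400)
c_grid = np.linspace(0.05, 2.0, 40)
c = doa_estimate(V, Vdot, samples, c_grid)
```

The grid search over sublevel sets is only a stand-in: in the PBC setting of the chapter, the candidate V is the closed-loop total energy, and a more rigorous DoA estimate (e.g. via sum-of-squares programming) would replace the sampling check.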


Notes

  1. Our definition of practicality: the algorithm runs in finite time with finite memory, and results in guaranteed stability/convergence, assuming a known model of the system.


Author information


Corresponding author

Correspondence to Gabriel A. D. Lopes.


Copyright information

© 2015 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Lopes, G.A.D., Najafi, E., Nageshrao, S.P., Babuška, R. (2015). Learning Complex Behaviors via Sequential Composition and Passivity-Based Control. In: Busoniu, L., Tamás, L. (eds) Handling Uncertainty and Networked Structure in Robot Control. Studies in Systems, Decision and Control, vol 42. Springer, Cham. https://doi.org/10.1007/978-3-319-26327-4_3

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-26327-4_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-26325-0

  • Online ISBN: 978-3-319-26327-4

  • eBook Packages: Engineering, Engineering (R0)
