
Learning Complex Behaviors via Sequential Composition and Passivity-Based Control

Handling Uncertainty and Networked Structure in Robot Control

Part of the book series: Studies in Systems, Decision and Control ((SSDC,volume 42))

Abstract

The model-free paradigm of reinforcement learning (RL) is a theoretical strength, but in practice the stringent assumptions required for optimal solutions (full exploration of the state space) and experimental issues, such as slow learning rates, make model-free RL difficult to apply. This paper addresses practical implementations of RL by interfacing elements of systems and control with robotics. In our approach, space is handled by sequential composition (a technique commonly used in robotics) and time is handled by passivity-based control methods (a standard nonlinear control approach), which speed up learning and provide a stopping criterion. Sequential composition in effect partitions the state space and allows controllers with different domains of attraction (DoA) and goal sets to be composed, so that learning takes place in subsets of the state space. Passivity-based control (PBC) is a model-based control approach in which the total energy is computable. This total energy can be used as a candidate Lyapunov function to evaluate the stability of a controller and to estimate its DoA. This enables learning in finite time: during learning, the candidate Lyapunov function is monitored online to approximate the DoA of the learned controller. Once this DoA covers the relevant states, from the point of view of sequential composition, the learning process is stopped. The result is a collection of learned controllers that cover a desired range of the state space and can be composed in sequence to achieve various goals. Optimality is lost in favour of practicality. Other implications include safety during learning and incremental learning.
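The stopping criterion described in the abstract can be sketched in code. The sketch below is illustrative only and uses a simple scalar example system; the function names (`doa_estimate`, `learn_until_doa_covers`) and the sampling-based sublevel-set check are our own assumptions for exposition, not the authors' implementation.

```python
import numpy as np

def doa_estimate(V, Vdot, samples, c_grid):
    """Approximate the DoA as the largest sublevel set {x : V(x) <= c}
    on which the candidate Lyapunov function decreases (Vdot < 0)."""
    best_c = 0.0
    for c in c_grid:
        inside = [x for x in samples if V(x) <= c]
        if inside and all(Vdot(x) < 0 for x in inside):
            best_c = max(best_c, c)
    return best_c

def learn_until_doa_covers(learn_step, V, Vdot, goal_set, samples, c_grid,
                           max_iters=100):
    """Run learning updates while monitoring the candidate Lyapunov
    function online; stop as soon as the estimated DoA covers the goal
    set relevant for the sequential composition."""
    c = 0.0
    for it in range(max_iters):
        learn_step()  # one RL update (e.g. an actor-critic step)
        c = doa_estimate(V, Vdot, samples, c_grid)
        if all(V(x) <= c for x in goal_set):
            return it, c  # finite-time stop: DoA covers the goal set
    return max_iters, c

# Toy example: scalar system x' = -x + x^3 with V(x) = x^2 / 2.
# The origin is locally stable with DoA |x| < 1, i.e. V < 0.5.
V = lambda x: 0.5 * x**2
Vdot = lambda x: x * (-x + x**3)
samples = np.linspace(-2.0, 2.0, 400)
c_grid = np.linspace(0.05, 2.0, 40)
c = doa_estimate(V, Vdot, samples, c_grid)
```

The grid search over sublevel sets is only a stand-in: in the PBC setting of the chapter, the candidate V is the closed-loop total energy, and a more rigorous DoA estimate (e.g. via sum-of-squares programming) would replace the sampling check.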


Notes

  1. Our definition of practicality: the algorithm runs in finite time with finite memory, and results in guaranteed stability/convergence, assuming a known model of the system.


Author information


Corresponding author

Correspondence to Gabriel A. D. Lopes.


Copyright information

© 2015 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Lopes, G.A.D., Najafi, E., Nageshrao, S.P., Babuška, R. (2015). Learning Complex Behaviors via Sequential Composition and Passivity-Based Control. In: Busoniu, L., Tamás, L. (eds) Handling Uncertainty and Networked Structure in Robot Control. Studies in Systems, Decision and Control, vol 42. Springer, Cham. https://doi.org/10.1007/978-3-319-26327-4_3

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-26327-4_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-26325-0

  • Online ISBN: 978-3-319-26327-4

  • eBook Packages: Engineering, Engineering (R0)
