Covariant Policy Search
journal contribution
posted on 2003-01-01, 00:00 authored by J. Andrew Bagnell, Jeff Schneider
We investigate the problem of non-covariant behavior
of policy gradient reinforcement learning algorithms.
The policy gradient approach is amenable
to analysis by information geometric methods. This
leads us to propose a natural metric on controller
parameterization that results from considering the
manifold of probability distributions over paths induced
by a stochastic controller. Investigation
of this approach leads to a covariant gradient ascent
rule. Interesting properties of this rule are
discussed, including its relation to actor-critic
style reinforcement learning algorithms. The algorithms
discussed here are computationally quite
efficient and, on some interesting problems, lead
to dramatic performance improvements over non-covariant
rules.
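
To make the notion of covariance concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a hypothetical one-step problem with three actions and a softmax policy, so that a "path" is a single action and the metric reduces to the Fisher information of the action distribution. The names `policy`, `grad_and_fisher`, `theta_step`, the reward vector, and the matrix `A` are all illustrative assumptions. The sketch takes one ascent step in two coordinate systems related by theta = A @ phi and compares the induced change in theta.

```python
import numpy as np

def policy(theta):
    """Softmax over K actions; the last logit is pinned to 0 so the
    Fisher matrix of the K-1 free logits is invertible."""
    logits = np.append(theta, 0.0)
    z = np.exp(logits - logits.max())
    return z / z.sum()

def grad_and_fisher(theta, rewards):
    """Exact gradient of expected reward and exact Fisher matrix
    with respect to the free logits (no sampling)."""
    p = policy(theta)
    expected_reward = p @ rewards
    # dJ/dtheta_i = p_i * (r_i - J) for each free logit i
    g = (p * (rewards - expected_reward))[:-1]
    q = p[:-1]
    # F_ij = E[dlog p/dtheta_i * dlog p/dtheta_j] = diag(q) - q q^T
    fisher = np.diag(q) - np.outer(q, q)
    return g, fisher

def theta_step(theta, rewards, A, lr, natural):
    """Take one ascent step in coordinates phi (theta = A @ phi) and
    return the change it induces in theta."""
    g, fisher = grad_and_fisher(theta, rewards)
    g_phi = A.T @ g               # chain rule for the gradient
    fisher_phi = A.T @ fisher @ A # the metric transforms as a tensor
    if natural:
        d_phi = np.linalg.solve(fisher_phi, g_phi)
    else:
        d_phi = g_phi
    return lr * (A @ d_phi)

# Illustrative values (assumptions, not taken from the paper).
theta0 = np.zeros(2)
rewards = np.array([1.0, 0.0, 0.5])
identity = np.eye(2)
A = np.array([[2.0, 0.0], [1.0, 0.5]])  # an arbitrary invertible reparameterization

plain_identity = theta_step(theta0, rewards, identity, 0.1, natural=False)
plain_reparam = theta_step(theta0, rewards, A, 0.1, natural=False)
natural_identity = theta_step(theta0, rewards, identity, 0.1, natural=True)
natural_reparam = theta_step(theta0, rewards, A, 0.1, natural=True)

print("plain gradient steps agree:  ", np.allclose(plain_identity, plain_reparam))
print("natural gradient steps agree:", np.allclose(natural_identity, natural_reparam))
```

In this sketch the plain gradient step depends on the choice of coordinates, while the step preconditioned by the inverse Fisher matrix does not. The invariance is exact here only because the reparameterization is linear; for a nonlinear change of coordinates it would hold to first order in the step size.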