
Adaptive Sparse Grids in Reinforcement Learning

Chapter in: Extraction of Quantifiable Information from Complex Systems

Part of the book series: Lecture Notes in Computational Science and Engineering (LNCSE, volume 102)

Abstract

We propose a model-based online reinforcement learning approach for continuous domains with deterministic transitions that uses a spatially adaptive sparse grid in the planning stage. The model is learned by Gaussian process regression, which allows for a low sample complexity. The adaptive sparse grid is introduced to represent the value function in the planning stage for higher-dimensional state spaces. This work gives numerical evidence that adaptive sparse grids are applicable to reinforcement learning.
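The following Python sketch illustrates how the two ingredients named in the abstract fit together on a toy one-dimensional problem. It is not the authors' implementation: the Gaussian process is reduced to its posterior mean, and a regular grid with linear interpolation stands in for the spatially adaptive sparse grid. All parameter values, names, and the toy dynamics are illustrative assumptions.

```python
# Minimal sketch: GP regression for a deterministic transition model,
# followed by value iteration on a grid in the planning stage.
# Illustrative only; a regular grid replaces the adaptive sparse grid.
import numpy as np

def rbf_kernel(X1, X2, length_scale=0.3, signal_var=1.0):
    """Squared-exponential covariance between two input sets."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return signal_var * np.exp(-0.5 * d2 / length_scale ** 2)

class GPModel:
    """GP regression for the deterministic transition s' = f(s, a)."""
    def __init__(self, noise=1e-6):
        self.noise = noise

    def fit(self, X, y):
        self.X = X
        K = rbf_kernel(X, X) + self.noise * np.eye(len(X))
        self.alpha = np.linalg.solve(K, y)          # K^{-1} y
        return self

    def predict(self, Xq):
        return rbf_kernel(Xq, self.X) @ self.alpha  # posterior mean

def true_dynamics(s, a):
    """Toy deterministic 1-D dynamics on [0, 1] with actions a in {-1, +1}."""
    return np.clip(s + 0.1 * a, 0.0, 1.0)

# Model learning from few observed transitions (low sample complexity).
rng = np.random.default_rng(0)
actions = np.array([-1.0, 1.0])
S = rng.uniform(0.0, 1.0, size=40)
A = rng.choice(actions, size=40)
model = GPModel().fit(np.column_stack([S, A]), true_dynamics(S, A))

# Planning stage: value iteration on a state grid using the learned model.
grid = np.linspace(0.0, 1.0, 65)    # stand-in for the adaptive sparse grid
V = np.zeros_like(grid)
gamma = 0.95

def reward(s):
    return -np.abs(s - 0.8)         # reward peaks at s = 0.8

for _ in range(200):
    Q = []
    for a in actions:
        s_next = model.predict(np.column_stack([grid, np.full_like(grid, a)]))
        Q.append(reward(grid) + gamma * np.interp(s_next, grid, V))
    V = np.maximum(*Q)              # Bellman backup, maximising over actions
```

In the chapter itself the regular grid is replaced by a spatially adaptive sparse grid that is refined where the value function requires it; the sketch only fixes the roles of the two components.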


Notes

  1. We call a discrete space \(V_{\underline{k}}\) smaller than a space \(V_{\underline{l}}\) if \(\forall t: k_{t} \leq l_{t}\) and \(\exists t: k_{t} < l_{t}\). In the same way a grid \(\varOmega_{\underline{k}}\) is smaller than a grid \(\varOmega_{\underline{l}}\).
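As an aside, the componentwise ordering defined in this note is easy to state in code; the helper below is purely illustrative and not taken from the chapter.

```python
# "Smaller than" on level multi-indices: k_t <= l_t for all t and
# k_t < l_t for at least one t.  Illustrative only, not from the chapter.
def is_smaller(k, l):
    return all(kt <= lt for kt, lt in zip(k, l)) and any(kt < lt for kt, lt in zip(k, l))

assert is_smaller((1, 2), (1, 3))      # smaller in one component, equal elsewhere
assert not is_smaller((1, 2), (1, 2))  # equal multi-indices are not smaller
assert not is_smaller((2, 1), (1, 3))  # incomparable indices are not smaller either
```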


Author information

Correspondence to Jochen Garcke.


Copyright information

© 2014 Springer International Publishing Switzerland

Cite this chapter

Garcke, J., Klompmaker, I. (2014). Adaptive Sparse Grids in Reinforcement Learning. In: Dahlke, S., et al. Extraction of Quantifiable Information from Complex Systems. Lecture Notes in Computational Science and Engineering, vol 102. Springer, Cham. https://doi.org/10.1007/978-3-319-08159-5_9
