Abstract
Solving many different optimal control tasks efficiently within the same underlying environment requires decomposing the environment into its computationally elemental fragments. We suggest how to find such fragmentations by applying unsupervised, mixture-model learning methods to data derived from the optimal value functions of multiple tasks, and show that the resulting fragmentations accord with observable structure in the environments. Further, we present evidence that such fragments can be of use in a practical reinforcement learning context, by facilitating online, actor-critic learning of multiple-goal MDPs.
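The core idea can be sketched in a few lines: compute optimal value functions for several goals in one environment, describe each state by its vector of values across tasks, and cluster those vectors so that structurally coherent regions (e.g. rooms) emerge as fragments. The sketch below is illustrative only, and is not the paper's exact method: the layout is a made-up two-room grid world, and a simple k-means step stands in for the mixture-model (EM) learning the paper actually uses.

```python
import numpy as np

def build_maze():
    """Two 5x5 rooms joined by a single doorway (True = wall)."""
    maze = np.zeros((5, 11), dtype=bool)
    maze[:, 5] = True   # dividing wall
    maze[2, 5] = False  # doorway
    return maze

def value_iteration(maze, goal, gamma=0.9, tol=1e-6):
    """Optimal value function for one goal: reward -1 per step, 0 at the goal."""
    H, W = maze.shape
    V = np.zeros((H, W))
    moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    while True:
        V_new = V.copy()
        for r in range(H):
            for c in range(W):
                if maze[r, c] or (r, c) == goal:
                    continue
                best = -np.inf
                for dr, dc in moves:
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < H and 0 <= nc < W and not maze[nr, nc]:
                        best = max(best, -1.0 + gamma * V[nr, nc])
                    else:  # bumping into a wall leaves the state unchanged
                        best = max(best, -1.0 + gamma * V[r, c])
                V_new[r, c] = best
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new

def kmeans(X, k=2, iters=50, seed=0):
    """Plain k-means; a stand-in here for mixture-model EM clustering."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

maze = build_maze()
free = [(r, c) for r in range(maze.shape[0])
               for c in range(maze.shape[1]) if not maze[r, c]]
goals = [(0, 0), (4, 0), (0, 10), (4, 10)]  # one task per corner
Vs = [value_iteration(maze, g) for g in goals]

# Each state is represented by its optimal values across all tasks;
# clustering these vectors tends to recover the two rooms as fragments.
X = np.array([[V[s] for V in Vs] for s in free])
labels = kmeans(X, k=2)
```

Under this representation, states within one room have similar value profiles across the four tasks (their costs-to-go change together as the goal moves), while states in different rooms differ sharply for goals on the far side of the doorway, so the clusters tend to align with the rooms.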
Foster, D., Dayan, P. Structure in the Space of Value Functions. Machine Learning 49, 325–346 (2002). https://doi.org/10.1023/A:1017944732463