Abstract
Solving many different optimal control tasks efficiently within the same underlying environment requires decomposing the environment into its computationally elemental fragments. We suggest how to find such fragmentations by applying unsupervised, mixture-model learning methods to data derived from the optimal value functions of multiple tasks, and show that the resulting fragmentations accord with observable structure in the environments. Further, we present evidence that such fragments can be of use in a practical reinforcement learning context, by facilitating online, actor-critic learning of multiple-goal MDPs.
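The core idea can be sketched in a few lines: compute optimal value functions for several goals in one environment, describe each state by its vector of values across tasks, and cluster those vectors so that structurally coherent regions (e.g. rooms) emerge as fragments. The sketch below is illustrative only, and is not the paper's exact method: the layout is a made-up two-room grid world, and a simple k-means step stands in for the mixture-model (EM) learning the paper actually uses.

```python
import numpy as np

def build_maze():
    """Two 5x5 rooms joined by a single doorway (True = wall)."""
    maze = np.zeros((5, 11), dtype=bool)
    maze[:, 5] = True   # dividing wall
    maze[2, 5] = False  # doorway
    return maze

def value_iteration(maze, goal, gamma=0.9, tol=1e-6):
    """Optimal value function for one goal: reward -1 per step, 0 at the goal."""
    H, W = maze.shape
    V = np.zeros((H, W))
    moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    while True:
        V_new = V.copy()
        for r in range(H):
            for c in range(W):
                if maze[r, c] or (r, c) == goal:
                    continue
                best = -np.inf
                for dr, dc in moves:
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < H and 0 <= nc < W and not maze[nr, nc]:
                        best = max(best, -1.0 + gamma * V[nr, nc])
                    else:  # bumping into a wall leaves the state unchanged
                        best = max(best, -1.0 + gamma * V[r, c])
                V_new[r, c] = best
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new

def kmeans(X, k=2, iters=50, seed=0):
    """Plain k-means; a stand-in here for mixture-model EM clustering."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

maze = build_maze()
free = [(r, c) for r in range(maze.shape[0])
               for c in range(maze.shape[1]) if not maze[r, c]]
goals = [(0, 0), (4, 0), (0, 10), (4, 10)]  # one task per corner
Vs = [value_iteration(maze, g) for g in goals]

# Each state is represented by its optimal values across all tasks;
# clustering these vectors tends to recover the two rooms as fragments.
X = np.array([[V[s] for V in Vs] for s in free])
labels = kmeans(X, k=2)
```

Under this representation, states within one room have similar value profiles across the four tasks (their costs-to-go change together as the goal moves), while states in different rooms differ sharply for goals on the far side of the doorway, so the clusters tend to align with the rooms.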
Foster, D., Dayan, P. Structure in the Space of Value Functions. Machine Learning 49, 325–346 (2002). https://doi.org/10.1023/A:1017944732463