ABSTRACT
We show that linear value-function approximation is equivalent to a form of linear model approximation. We then derive a relationship between the model-approximation error and the Bellman error, and show how this relationship can guide feature selection for model improvement and/or value-function improvement. We also show how these results give insight into the behavior of existing feature-selection algorithms.
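The equivalence claimed above can be illustrated concretely. The short NumPy sketch below is our own construction, not code from the paper: it assumes the "form of linear model approximation" in question is the least-squares projection of the transition and reward models onto the span of the feature matrix Phi, and it checks numerically that (i) exactly solving that approximate linear model recovers the linear fixed-point (LSTD) solution, and (ii) the Bellman error of that solution decomposes into a reward-approximation error plus a discounted transition-approximation error. The names P_phi, r_phi, delta_R, and delta_Phi are hypothetical labels introduced here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, gamma = 20, 5, 0.9              # states, features, discount factor

# Random MDP: row-stochastic transition matrix P and reward vector R.
P = rng.random((n, n))
P /= P.sum(axis=1, keepdims=True)
R = rng.random(n)
Phi = rng.random((n, k))              # feature matrix (full column rank w.h.p.)

# (1) Linear fixed point (LSTD): solve Phi^T (Phi - gamma P Phi) w = Phi^T R.
w_lstd = np.linalg.solve(Phi.T @ (Phi - gamma * P @ Phi), Phi.T @ R)

# (2) Least-squares linear model in feature space: project P Phi and R onto
#     span(Phi), then solve the resulting k-dimensional model exactly.
G = np.linalg.inv(Phi.T @ Phi)
P_phi = G @ Phi.T @ P @ Phi           # k x k approximate transition model
r_phi = G @ Phi.T @ R                 # approximate reward model
w_model = np.linalg.solve(np.eye(k) - gamma * P_phi, r_phi)

print(np.allclose(w_lstd, w_model))   # True: same solution either way

# Bellman error of the fixed point decomposes into reward error plus
# discounted per-feature transition error.
bellman_err = R + gamma * P @ Phi @ w_lstd - Phi @ w_lstd
delta_R = R - Phi @ r_phi             # reward-approximation error
delta_Phi = P @ Phi - Phi @ P_phi     # per-feature transition error
print(np.allclose(bellman_err, delta_R + gamma * delta_Phi @ w_lstd))  # True
```

Both checks print True under these assumptions; the second identity is what lets errors in the approximate model stand in for the Bellman error when reasoning about feature selection.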