Abstract
A finite semi-Markov decision process is studied with the objective of maximizing the expected average reward. The semi-Markov kernel of the process depends on an unknown parameter taking values in a subset [a, b] of ℝ^S. A controller, modelled as a learning automaton, sequentially updates the probabilities of generating decisions on the basis of the observed decisions, states, and jump times. Convergence results are stated in the form of theorems, and some examples are given.
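The abstract's controller can be illustrated with a minimal sketch. The class below implements a linear reward-inaction (L_R-I) update, a standard learning-automaton scheme; the paper's actual update rule, reward normalization by jump times, and state dependence are not specified in this preview, so everything here is an assumption chosen only to show the general mechanism of sequentially updating decision probabilities.

```python
import random

class LearningAutomaton:
    """Sketch of a learning automaton over a finite decision set.

    Uses a linear reward-inaction (L_R-I) rule as a stand-in: after a
    favourable response, the probability of the chosen decision is
    reinforced and the others are renormalized; the paper's own update
    scheme may differ.
    """

    def __init__(self, n_actions, step=0.1, rng=None):
        # Start from the uniform distribution over decisions.
        self.p = [1.0 / n_actions] * n_actions
        self.step = step                     # learning rate (assumed)
        self.rng = rng or random.Random(0)

    def choose(self):
        """Sample a decision according to the current probabilities."""
        u, acc = self.rng.random(), 0.0
        for a, pa in enumerate(self.p):
            acc += pa
            if u <= acc:
                return a
        return len(self.p) - 1

    def update(self, action, reward):
        """Reinforce `action` in proportion to a reward scaled to [0, 1].

        In a semi-Markov setting the reward fed in here would typically
        be a per-unit-time quantity, e.g. accrued reward divided by the
        observed sojourn (jump) time.
        """
        for a in range(len(self.p)):
            if a == action:
                self.p[a] += self.step * reward * (1.0 - self.p[a])
            else:
                self.p[a] -= self.step * reward * self.p[a]
```

As a usage illustration, repeatedly choosing between two decisions whose (hypothetical) average rewards differ drives the probability mass toward the better decision, which is the convergence behaviour the theorems in the chapter concern.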
© 1983 Springer-Verlag New York Inc.
About this paper
Cite this paper
El-Fattah, Y.M. (1983). Learning Automaton for Finite Semi-Markov Decision Processes. In: Herkenrath, U., Kalin, D., Vogel, W. (eds) Mathematical Learning Models — Theory and Algorithms. Lecture Notes in Statistics, vol 20. Springer, New York, NY. https://doi.org/10.1007/978-1-4612-5612-0_4
DOI: https://doi.org/10.1007/978-1-4612-5612-0_4
Publisher Name: Springer, New York, NY
Print ISBN: 978-0-387-90913-4
Online ISBN: 978-1-4612-5612-0