Abstract
We are concerned with Markov decision processes with countable state space and discrete-time parameter. The main structural restriction on the model is the following: under the action of any stationary policy, the state space is a communicating class. In this context, we prove the equivalence of ten stability/ergodicity conditions on the transition law of the model, which imply the existence of average optimal stationary policies for an arbitrary continuous and bounded reward function; these conditions include the Lyapunov function condition (LFC) introduced by A. Hordijk. As a consequence of our results, the LFC is proved to be equivalent to the following: under the action of any stationary policy, the corresponding Markov chain has a unique invariant distribution which depends continuously on the stationary policy being used. A weak form of the latter condition was used by one of the authors to establish the existence of optimal stationary policies via an approach based on renewal theory.
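To fix ideas, conditions of the kind discussed in the abstract are typically drift inequalities in the spirit of Foster's criterion [Foster, 1953], which the LFC generalizes. The following sketch is illustrative only; the notation (a distinguished state \(z\), a nonnegative function \(\nu\), and transition probabilities \(p(\cdot \mid x, a)\)) is ours, not necessarily the paper's:

```latex
% A Foster-type drift inequality: there exist a state z and a function
% \nu \ge 0 such that, for every stationary policy f and every state x \ne z,
\[
  1 + \sum_{y \neq z} p(y \mid x, f(x))\, \nu(y) \;\le\; \nu(x).
\]
% Such an inequality bounds the expected hitting time of z from x by \nu(x),
% which in turn yields positive recurrence under every stationary policy.
```

Roughly, the sum of the drift inequality along a trajectory shows that the expected time to reach \(z\) from \(x\) is at most \(\nu(x)\), uniformly over stationary policies; this is the mechanism by which Lyapunov-type conditions deliver the stable behavior needed for average-reward optimality.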
References
Ash, R. B. (1972). Real Analysis and Probability. Academic Press, New York.
Cavazos-Cadena, R. (1992). Existence of optimal stationary policies in average reward Markov decision processes with a recurrent state. Appl. Math. Optim., this issue, 171–194.
Dugundji, J. (1977). Topology. Allyn and Bacon, Boston.
Foster, F. G. (1953). On the stochastic processes associated with certain queueing processes. Ann. Math. Statist. 24, 355–360.
Hernández-Lerma, O. (1989). Adaptive Markov Control Processes. Springer-Verlag, New York.
Hordijk, A. (1977). Dynamic Programming and Potential Theory. Mathematical Centre Tracts 51, Mathematisch Centrum, Amsterdam, The Netherlands.
Hinderer, K. (1970). Foundations of Non-Stationary Dynamic Programming with Discrete Time Parameter. Springer-Verlag, New York.
Kolonko, M. (1982). The average-optimal control of a renewal model in presence of an unknown parameter. Math. Operationsforsch. Statist. Ser. Optim. 18, 567–591.
Loève, M. (1977). Probability Theory I. Springer-Verlag, New York.
Mandl, P. (1979). On the adaptive control of countable Markov chains, in Probability Theory (Z. Ciesielski, ed.). Banach Centre, PWN, Warsaw, Vol. 5, pp. 159–173.
Ross, S. M. (1970). Applied Probability Models with Optimization Applications. Holden-Day, San Francisco.
Royden, H. L. (1968). Real Analysis. Macmillan, New York.
Thomas, L. C. (1980). Connectedness conditions for denumerable state Markov decision processes, in Recent Developments in Markov Decision Processes (R. Hartley, L. C. Thomas, and D. J. White, eds.). Academic Press, New York, pp. 181–204.
Additional information
Communicated by D. Ocone
This research was supported in part by the Third World Academy of Sciences (TWAS) under Grant TWAS RG MP 898-152.
Cavazos-Cadena, R., Hernández-Lerma, O. Equivalence of Lyapunov stability criteria in a class of Markov decision processes. Appl Math Optim 26, 113–137 (1992). https://doi.org/10.1007/BF01189027