Abstract
We are concerned with Markov decision processes with countable state space and discrete-time parameter. The main structural restriction on the model is the following: under the action of any stationary policy, the state space is a communicating class. In this context, we prove the equivalence of ten stability/ergodicity conditions on the transition law of the model, which imply the existence of average optimal stationary policies for an arbitrary continuous and bounded reward function; these conditions include the Lyapunov function condition (LFC) introduced by A. Hordijk. As a consequence of our results, the LFC is proved to be equivalent to the following: under the action of any stationary policy, the corresponding Markov chain has a unique invariant distribution which depends continuously on the stationary policy being used. A weak form of the latter condition was used by one of the authors to establish the existence of optimal stationary policies via an approach based on renewal theory.
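To fix ideas, conditions of the kind discussed in the abstract are typically drift inequalities in the spirit of Foster's criterion [Foster, 1953], which the LFC generalizes. The following sketch is illustrative only; the notation (a distinguished state \(z\), a nonnegative function \(\nu\), and transition probabilities \(p(\cdot \mid x, a)\)) is ours, not necessarily the paper's:

```latex
% A Foster-type drift inequality: there exist a state z and a function
% \nu \ge 0 such that, for every stationary policy f and every state x \ne z,
\[
  1 + \sum_{y \neq z} p(y \mid x, f(x))\, \nu(y) \;\le\; \nu(x).
\]
% Such an inequality bounds the expected hitting time of z from x by \nu(x),
% which in turn yields positive recurrence under every stationary policy.
```

Roughly, the sum of the drift inequality along a trajectory shows that the expected time to reach \(z\) from \(x\) is at most \(\nu(x)\), uniformly over stationary policies; this is the mechanism by which Lyapunov-type conditions deliver the stable behavior needed for average-reward optimality.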
References
Ash, R. B. (1972). Real Analysis and Probability. Academic Press, New York.
Cavazos-Cadena, R. (1992). Existence of optimal stationary policies in average reward Markov decision processes with a recurrent state. Appl. Math. Optim., this issue, 171–194.
Dugundji, J. (1977). Topology. Allyn and Bacon, Boston.
Foster, F. G. (1953). On the stochastic processes associated with certain queueing processes. Ann. Math. Statist. 24, 355–360.
Hernández-Lerma, O. (1989). Adaptive Markov Control Processes. Springer-Verlag, New York.
Hordijk, A. (1977). Dynamic Programming and Potential Theory. Mathematical Centre Tracts 51, Mathematisch Centrum, Amsterdam, The Netherlands.
Hinderer, K. (1970). Foundations of Non-Stationary Dynamic Programming with Discrete Time Parameter. Springer-Verlag, New York.
Kolonko, M. (1982). The average-optimal control of a renewal model in presence of an unknown parameter. Math. Operationsforsch. Statist. Ser. Optim. 18, 567–591.
Loève, M. (1977). Probability Theory I. Springer-Verlag, New York.
Mandl, P. (1979). On the adaptive control of countable Markov chains, in Probability Theory (Z. Ciesielski, ed.). Banach Centre, PWN, Warsaw, Vol. 5, pp. 159–173.
Ross, S. M. (1970). Applied Probability Models with Optimization Applications. Holden-Day, San Francisco.
Royden, H. L. (1968). Real Analysis. Macmillan, New York.
Thomas, L. C. (1980). Connectedness conditions for denumerable state Markov decision processes, in Recent Developments in Markov Decision Processes (R. Hartley, L. C. Thomas, and D. J. White, eds.). Academic Press, New York, pp. 181–204.
Additional information
Communicated by D. Ocone
This research was supported in part by the Third World Academy of Sciences (TWAS) under Grant TWAS RG MP 898-152.
Cavazos-Cadena, R., Hernández-Lerma, O. Equivalence of Lyapunov stability criteria in a class of Markov decision processes. Appl Math Optim 26, 113–137 (1992). https://doi.org/10.1007/BF01189027