
Equivalence of Lyapunov stability criteria in a class of Markov decision processes

Published in: Applied Mathematics and Optimization

Abstract

We are concerned with Markov decision processes with countable state space and discrete-time parameter. The main structural restriction on the model is the following: under the action of any stationary policy, the state space is a communicating class. In this context, we prove the equivalence of ten stability/ergodicity conditions on the transition law of the model, which imply the existence of average optimal stationary policies for an arbitrary continuous and bounded reward function; these conditions include the Lyapunov function condition (LFC) introduced by A. Hordijk [6]. As a consequence of our results, the LFC is proved to be equivalent to the following: under the action of any stationary policy, the corresponding Markov chain has a unique invariant distribution which depends continuously on the stationary policy being used. A weak form of the latter condition was used by one of the authors [2] to establish the existence of optimal stationary policies via an approach based on renewal theory.
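
To fix ideas, the following is a hedged sketch (notation ours, not taken verbatim from the paper) of the kind of drift inequality the LFC embodies. Write S for the state space, A(x) for the set of admissible actions at x, and p_{xy}(a) for the transition law, and fix a distinguished reference state z. A Lyapunov function condition in the spirit of Hordijk [6] then asks for a function \ell : S \to [1, \infty) satisfying

\[
  1 + \sum_{y \neq z} p_{xy}(a)\, \ell(y) \;\le\; \ell(x)
  \qquad \text{for all } x \neq z \text{ and } a \in A(x),
\]

so that \ell(x) dominates the expected time to reach z from x, uniformly over stationary policies. The equivalence established in the paper identifies such drift requirements with the condition stated above: each stationary policy f induces a unique invariant distribution \mu_f (the probability solution of \mu_f P_f = \mu_f, where P_f is the transition matrix under f) which varies continuously with f.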


References

  1. Ash, R. B. (1972). Real Analysis and Probability. Academic Press, New York.

  2. Cavazos-Cadena, R. (1992). Existence of optimal stationary policies in average reward Markov decision processes with a recurrent state. Appl. Math. Optim., this issue, 171–194.

  3. Dugundji, J. (1977). Topology. Allyn and Bacon, Boston.

  4. Foster, F. G. (1953). On the stochastic processes associated with certain queueing processes. Ann. Math. Statist. 24, 355–360.

  5. Hernández-Lerma, O. (1989). Adaptive Markov Control Processes. Springer-Verlag, New York.

  6. Hordijk, A. (1977). Dynamic Programming and Potential Theory. Mathematical Centre Tracts 51, Mathematisch Centrum, Amsterdam, The Netherlands.

  7. Hinderer, K. (1970). Foundations of Non-Stationary Dynamic Programming with Discrete Time Parameter. Springer-Verlag, New York.

  8. Kolonko, M. (1982). The average-optimal control of a renewal model in presence of an unknown parameter. Math. Operationsforsch. Statist. Ser. Optim. 18, 567–591.

  9. Loève, M. (1977). Probability Theory I. Springer-Verlag, New York.

  10. Mandl, P. (1979). On the adaptive control of countable Markov chains, in Probability Theory (Z. Ciesielski, ed.). Banach Center Publications, Vol. 5, PWN, Warsaw, pp. 159–173.

  11. Ross, S. M. (1970). Applied Probability Models with Optimization Applications. Holden-Day, San Francisco.

  12. Royden, H. L. (1968). Real Analysis. Macmillan, New York.

  13. Thomas, L. C. (1980). Connectedness conditions for denumerable state Markov decision processes, in Recent Developments in Markov Decision Processes (R. Hartley, L. C. Thomas, and D. J. White, eds.). Academic Press, New York, pp. 181–204.

Additional information

Communicated by D. Ocone

This research was supported in part by the Third World Academy of Sciences (TWAS) under Grant TWAS RG MP 898-152.

Cite this article

Cavazos-Cadena, R., Hernández-Lerma, O. Equivalence of Lyapunov stability criteria in a class of Markov decision processes. Appl Math Optim 26, 113–137 (1992). https://doi.org/10.1007/BF01189027
