Abstract
This paper studies the expected total cost (ETC) criterion for discrete-time Markov control processes on Borel spaces with possibly unbounded cost-per-stage functions. It presents optimality results, including conditions for a control policy to be ETC-optimal and for the ETC value function to be a solution of the dynamic programming equation. Conditions are also given for the ETC value function to be the limit of the α-discounted cost value function as α ↑ 1, and for the Markov control process to be "stable" in the sense of Lagrange and almost surely. In addition, transient control models are fully analyzed. The paper thus provides a fairly complete, updated, survey-like presentation of the ETC criterion for Markov control processes on Borel spaces.
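The vanishing-discount relation mentioned above, in which the ETC value function arises as the limit of α-discounted value functions as α ↑ 1, can be illustrated numerically on a toy transient model. The sketch below is not from the paper: the two-state model, the cost vector `c`, and the absorption probabilities `p` are assumptions chosen so that the ETC value is finite and computable in closed form, namely min over actions of c(a)/p(a), the expected cost accumulated until absorption.

```python
import numpy as np

# Hypothetical two-state transient control model (illustration only):
# state 0 is controlled; state 1 is absorbing and cost-free.
# Action a incurs cost c[a] and moves to the absorbing state
# with probability p[a], otherwise the process stays in state 0.
c = np.array([1.0, 3.0])   # cost per stage for actions 0 and 1
p = np.array([0.3, 0.8])   # absorption probability for actions 0 and 1

def discounted_value(alpha, iters=10_000):
    """alpha-discounted optimal value at state 0, by value iteration
    on the dynamic programming equation v = min_a [c(a) + alpha*(1-p(a))*v]."""
    v = 0.0
    for _ in range(iters):
        v = float(np.min(c + alpha * (1.0 - p) * v))
    return v

# ETC value at state 0: expected total cost until absorption.
etc_value = float(np.min(c / p))

for alpha in (0.9, 0.99, 0.999):
    print(f"alpha = {alpha}: V_alpha(0) = {discounted_value(alpha):.4f}")
print(f"ETC value V(0) = {etc_value:.4f}")
```

Since the stage costs are nonnegative, V_α(0) increases toward the ETC value as α ↑ 1, which the printed values exhibit.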
Hernández-Lerma, O., Carrasco, G. & Pérez-Hernández, R. Markov Control Processes with the Expected Total Cost Criterion: Optimality, Stability, and Transient Models. Acta Applicandae Mathematicae 59, 229–269 (1999). https://doi.org/10.1023/A:1006368714127