Symblicit algorithms for mean-payoff and shortest path in monotonic Markov decision processes

Bohy, Aaron; Bruyère, Véronique; Raskin, Jean-François; Bertrand, Nathalie

doi:10.1007/s00236-016-0255-4

Original Article
Published: 01 February 2016

Volume 54, pages 545–587, (2017)
Acta Informatica

Aaron Bohy¹, Véronique Bruyère¹, Jean-François Raskin² & Nathalie Bertrand³

Abstract

When treating Markov decision processes (MDPs) with large state spaces, using explicit representations quickly becomes infeasible. Recently, Wimmer et al. proposed a so-called symblicit algorithm for the synthesis of optimal strategies in MDPs, in the quantitative setting of expected mean-payoff. This algorithm, based on the strategy iteration algorithm of Howard and Veinott, efficiently combines symbolic and explicit data structures, and uses binary decision diagrams as its symbolic representation. The aim of this paper is to show that the new data structure of pseudo-antichains (an extension of antichains) provides another interesting alternative, especially for the class of monotonic MDPs. We design efficient pseudo-antichain-based symblicit algorithms (with open-source implementations) for two quantitative settings: the expected mean-payoff and the stochastic shortest path. For two practical applications coming from automated planning and \(\mathsf {LTL}\) synthesis, we report promising experimental results w.r.t. both the run time and the memory consumption. We also show that a variant of pseudo-antichains makes it possible to handle the infinite state spaces underlying the qualitative verification of probabilistic lossy channel systems.
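
To illustrate the kind of data structure that pseudo-antichains extend, the following is a minimal sketch of a plain antichain: a downward-closed set is stored by its maximal elements only. It assumes, purely for illustration, that states are integer vectors ordered componentwise; the abstract does not fix the state space or the order, and the names `leq`, `maximal` and `Antichain` are hypothetical. It does not implement pseudo-antichains themselves.

```python
from itertools import product
from typing import FrozenSet, Iterable, Tuple

Vec = Tuple[int, ...]


def leq(x: Vec, y: Vec) -> bool:
    """Componentwise partial order on integer vectors (an illustrative assumption)."""
    return all(a <= b for a, b in zip(x, y))


def maximal(elems: Iterable[Vec]) -> FrozenSet[Vec]:
    """Keep only the maximal elements, which form an antichain."""
    pool = set(elems)
    return frozenset(x for x in pool if not any(x != y and leq(x, y) for y in pool))


class Antichain:
    """Downward-closed set of vectors, stored by its maximal elements only."""

    def __init__(self, elems: Iterable[Vec] = ()):
        self.elems = maximal(elems)

    def __contains__(self, x: Vec) -> bool:
        # x belongs to the closed set iff some stored maximal element dominates it.
        return any(leq(x, m) for m in self.elems)

    def union(self, other: "Antichain") -> "Antichain":
        return Antichain(self.elems | other.elems)

    def intersection(self, other: "Antichain") -> "Antichain":
        # The meet of two vectors is their componentwise minimum.
        meets = (tuple(map(min, a, b)) for a, b in product(self.elems, other.elems))
        return Antichain(meets)


a = Antichain([(3, 1), (1, 3), (1, 1)])  # (1, 1) is dominated and dropped
b = Antichain([(2, 2)])
print(sorted(a.elems))                   # [(1, 3), (3, 1)]
print((2, 1) in a, (2, 2) in a)          # True False
print(sorted(a.intersection(b).elems))   # [(1, 2), (2, 1)]
```

The point of the representation is that membership, union and intersection only ever touch the maximal elements, so a closed set with many states can remain compact when it has few maximal elements.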

Notes

1. An alternative objective might be to maximize the value function, in which case \(\lambda ^*\) is optimal if \(\mathbb {E}^{~\cdot }_{\lambda ^*}(s) = \sup _{\lambda \in \varLambda } \mathbb {E}^{~\cdot }_{\lambda }(s)\) for all \(s \in S\).
2. If the expected truncated sum has to be maximized, the cost function is restricted to the strictly negative real numbers and \(\arg \min \) is replaced by \(\arg \max \) in line 4.
3. If the expected mean-payoff has to be maximized, one has to replace \(\arg \min \) by \(\arg \max \) in lines 4 and 7.
4. We use calligraphic style for symbols denoting a symbolic representation.
5. A data structure closely related to our pseudo-antichains has been proposed in [2] in the particular context of probabilistic lossy channel systems. A comparison is given in Sect. 7.3.
6. “PA-representation” means pseudo-antichain-based representation.
7. This can be easily tested by Proposition 2.
8. Remark 2 also holds for Assumption 2.
9. For all \(s \in G\) and \(\sigma \in \varSigma _s\), \(\sum _{s'\in G}\mathbf{P }(s, \sigma , s') = 1\).
10. The “iff” holds since probabilities p are pairwise distinct.
11. The improvement of a strategy for the EMP, with the gain g or the bias b values (see Algorithm 2), is similar and is thus not detailed.
12. As Algorithm Split only works on \(S_\sigma \), it is not a problem if \(\lambda \) is not defined on \(S \backslash S_\sigma \).
13. A comparison with an MTBDD-based symblicit algorithm is given in the second application, for the EMP problem.
14. In [9], the authors study a different problem, namely maximizing the probability of reaching the goal within a given number of steps.
15. On our benchmarks, the value iteration algorithm of \(\mathsf{PRISM}\) performs better than the strategy iteration one w.r.t. the run time and memory consumption. However, it still consumes more memory than the pseudo-antichain-based algorithm, and runs out of memory on several examples.
16. Note that in [11, 12], the weight function w is more general since it also associates values to \({\textsf {Lit}} (I)\). However, for this application, we restrict w to \({\textsf {Lit}} (O)\).
17. More precisely, it reduces to the EMP problem where the objective is to maximize the expected mean-payoff (see footnotes 1 and 3).
18. For all the MDPs considered in Tables 1 and 2, this ratio is 1.
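
The notes on replacing \(\arg \min \) by \(\arg \max \) refer to the improvement step of strategy iteration. The algorithms themselves are not reproduced on this page, so the following is only a minimal explicit-state sketch of Howard-style strategy iteration for the expected cost to reach a goal, not the symblicit algorithm of the paper. The toy MDP, the names `MDP`, `GOAL`, `solve`, `evaluate`, `q_value` and `strategy_iteration`, and the requirement that the initial strategy reaches the goal with probability 1 are all assumptions made for illustration.

```python
from fractions import Fraction as F

# Hypothetical toy MDP: states 0 and 1, goal state 2.
# MDP[state][action] = (cost, {successor: probability})
MDP = {
    0: {"a": (F(1), {1: F(1)}),
        "b": (F(4), {2: F(1)})},
    1: {"a": (F(1), {2: F(1, 2), 0: F(1, 2)}),
        "b": (F(2), {2: F(1)})},
}
GOAL = 2


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination over exact fractions."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [x / M[col][col] for x in M[col]]
        for r in range(n):
            factor = M[r][col]
            if r != col and factor != 0:
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


def evaluate(strategy):
    """Value of a fixed strategy: solve (I - P) v = c on the non-goal states."""
    states = sorted(MDP)
    idx = {s: i for i, s in enumerate(states)}
    n = len(states)
    A = [[F(int(i == j)) for j in range(n)] for i in range(n)]
    b = []
    for s in states:
        cost, dist = MDP[s][strategy[s]]
        b.append(cost)
        for t, p in dist.items():
            if t != GOAL:  # the goal has value 0 and drops out of the system
                A[idx[s]][idx[t]] -= p
    return dict(zip(states, solve(A, b)))


def q_value(s, action, value):
    """One-step lookahead cost of playing `action` in s, then following `value`."""
    cost, dist = MDP[s][action]
    return cost + sum(p * value.get(t, F(0)) for t, p in dist.items())


def strategy_iteration(strategy):
    """Alternate evaluation and arg-min improvement until the strategy is stable."""
    while True:
        value = evaluate(strategy)
        improved = dict(strategy)
        for s in MDP:
            best = min(MDP[s], key=lambda a: q_value(s, a, value))
            # Switch only on strict improvement, so ties cannot cause cycling.
            if q_value(s, best, value) < value[s]:
                improved[s] = best
        if improved == strategy:
            return strategy, value
        strategy = improved


strategy, value = strategy_iteration({0: "b", 1: "b"})
print(strategy, {s: str(v) for s, v in value.items()})
# {0: 'a', 1: 'b'} {0: '3', 1: '2'}
```

For a maximization objective, the same loop would use `max` and a strict `>` comparison in the improvement step, which is the substitution the notes describe.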

References

[1] Abdulla, P.A., Jonsson, B.: Verifying programs with unreliable channels. Inf. Comput. 127(2), 91–101 (1996)
[2] Baier, C., Bertrand, N., Schnoebelen, P.: Symbolic verification of communicating systems with probabilistic message losses: liveness and fairness. In: Najm, E., Pradat-Peyre, J., Donzeau-Gouge, V. (eds.) FORTE, volume 4229 of Lecture Notes in Computer Science, pp. 212–227. Springer (2006)
[3] Baier, C., Bertrand, N., Schnoebelen, P.: Verifying nondeterministic probabilistic channel systems against \(\omega \)-regular linear-time properties. ACM Trans. Comput. Log. 9(1), Article No. 5 (2007)
[4] Baier, C., Katoen, J.-P.: Principles of Model Checking. MIT Press, Cambridge (2008)
[5] Baier, C., Katoen, J.-P., Hermanns, H., Wolf, V.: Comparative branching-time semantics for Markov chains. Inf. Comput. 200(2), 149–214 (2005)
[6] Bertrand, N., Schnoebelen, P.: Computable fixpoints in well-structured symbolic model checking. Form. Methods Syst. Des. 43(2), 233–267 (2013)
[7] Bertsekas, D.P., Tsitsiklis, J.: Neuro-dynamic programming. Athena Scientific, Belmont (1996)
[8] Bertsekas, D.P., Tsitsiklis, J.N.: An analysis of stochastic shortest path problems. Math. Oper. Res. 16(3), 580–595 (1991)
[9] Blum, A.L., Langford, J.C.: Probabilistic planning in the graphplan framework. In: Biundo, S., Fox, M. (eds.) Recent Advances in AI Planning, pp. 319–332. Springer (2000)
[10] Bohy, A., Bruyère, V., Filiot, E., Jin, N., Raskin, J.-F.: Acacia+, a tool for LTL synthesis. In: Madhusudan, P., Seshia, S.A. (eds.) CAV, volume 7358 of Lecture Notes in Computer Science, pp. 652–657. Springer (2012)
[11] Bohy, A., Bruyère, V., Filiot, E., Raskin, J.-F.: Synthesis from LTL specifications with mean-payoff objectives. CoRR, abs/1210.3539 (2012)
[12] Bohy, A., Bruyère, V., Filiot, E., Raskin, J.-F.: Synthesis from LTL specifications with mean-payoff objectives. In: Piterman, N., Smolka, S.A. (eds.) TACAS, volume 7795 of Lecture Notes in Computer Science, pp. 169–184. Springer (2013)
[13] Bohy, A., Bruyère, V., Raskin, J.: Symblicit algorithms for optimal strategy synthesis in monotonic Markov decision processes. In: Chatterjee, K., Ehlers, R., Jha, S. (eds.) SYNT, volume 157 of EPTCS, pp. 51–67 (2014)
[14] Bryant, R.E.: Graph-based algorithms for Boolean function manipulation. IEEE Trans. Comput. 35(8), 677–691 (1986)
[15] Buchholz, P.: Exact and ordinary lumpability in finite Markov chains. J. Appl. Probab. 31(1), 59–75 (1994)
[16] Burch, J.R., Clarke, E.M., McMillan, K.L., Dill, D.L., Hwang, L.J.: Symbolic model checking: \(10^{20}\) states and beyond. Inf. Comput. 98(2), 142–170 (1992)
[17] Chatterjee, K., Henzinger, T.A., Jobstmann, B., Singh, R.: In: Abdulla, P.A., Leino, K.R.M. (eds.) TACAS, volume 6605 of Lecture Notes in Computer Science, pp. 267–271. Springer (2011)
[18] Clarke, E.M., Emerson, E.A.: Design and synthesis of synchronization skeletons using branching-time temporal logic. In: Kozen, D. (ed.) Logic of Programs, volume 131 of Lecture Notes in Computer Science, pp. 52–71. Springer (1981)
[19] de Alfaro, L.: Computing minimum and maximum reachability times in probabilistic systems. In: Baeten, J.C.M., Mauw, S. (eds.) CONCUR, volume 1664 of Lecture Notes in Computer Science, pp. 66–81. Springer (1999)
[20] Derisavi, S., Hermanns, H., Sanders, W.H.: Optimal state-space lumping in Markov chains. Inf. Process. Lett. 87(6), 309–315 (2003)
[21] Doyen, L., Raskin, J.-F.: Improved algorithms for the automata-based approach to model-checking. In: Grumberg, O., Huth, M. (eds.) TACAS, volume 4424 of Lecture Notes in Computer Science, pp. 451–465. Springer (2007)
[22] Fikes, R.E., Nilsson, N.J.: STRIPS: a new approach to the application of theorem proving to problem solving. Artif. Intell. 2(3), 189–208 (1972)
[23] Filar, J., Vrieze, K.: Competitive Markov Decision Processes. Springer, Berlin (1997)
[24] Filiot, E., Jin, N., Raskin, J.-F.: Antichains and compositional algorithms for LTL synthesis. Form. Methods Syst. Des. 39(3), 261–296 (2011)
[25] Finkel, A.: Decidability of the termination problem for completely specified protocols. Distrib. Comput. 7(3), 129–135 (1994)
[26] Finkel, A., Schnoebelen, P.: Well-structured transition systems everywhere! Theor. Comput. Sci. 256(1–2), 63–92 (2001)
[27] Fujita, M., McGeer, P.C., Yang, J.C.-Y.: Multi-terminal binary decision diagrams: an efficient data structure for matrix representation. Form. Methods Syst. Des. 10(2/3), 149–169 (1997)
[28] Hansson, H., Jonsson, B.: A logic for reasoning about time and reliability. Form. Asp. Comput. 6(5), 512–535 (1994)
[29] Hartmanns, A.: Modest: a unified language for quantitative models. In: FDL, IEEE, pp. 44–51 (2012)
[30] Higman, G.: Ordering by divisibility in abstract algebras. Proc. Lond. Math. Soc. 3(2), 326–336 (1952)
[31] Howard, R.A.: Dynamic Programming and Markov Processes. Wiley, New Jersey (1960)
[32] Jansen, D.N., Katoen, J.-P., Oldenkamp, M., Stoelinga, M., Zapreev, I.S.: How fast and fat is your probabilistic model checker? an experimental performance comparison. In: Yorav, K. (ed.) Haifa Verification Conference, volume 4899 of Lecture Notes in Computer Science, pp. 69–85. Springer (2007)
[33] Katoen, J.-P., Zapreev, I.S., Hahn, E.M., Hermanns, H., Jansen, D.N.: The ins and outs of the probabilistic model checker MRMC. Perform. Eval. 68(2), 90–104 (2011)
[34] Kemeny, J.G., Snell, J.L.: Finite Markov Chains. Van Nostrand Company, Inc, New York (1960)
[35] Kwiatkowska, M.Z., Norman, G., Parker, D.: PRISM 4.0: verification of probabilistic real-time systems. In: Gopalakrishnan, G., Qadeer, S. (eds.) CAV, volume 6806 of Lecture Notes in Computer Science, pp. 585–591. Springer (2011)
[36] Larsen, K.G., Skou, A.: Bisimulation through probabilistic testing. Inf. Comput. 94(1), 1–28 (1991)
[37] Majercik, S.M., Littman, M.L.: Maxplan: a new approach to probabilistic planning. In: Simmons, R.G., Veloso, M.M., Smith, S.F. (eds.) AIPS, pp. 86–93. AAAI, Palo Alto (1998)
[38] Pachl, J.K.: Protocol description and analysis based on a state transition model with channel expressions. In: Rudin, H., West, C.H. (eds.) PSTV, Proceedings of the IFIP WG6.1, pp. 207–219. North-Holland (1987)
[39] Paige, R., Tarjan, R.E.: Three partition refinement algorithms. SIAM J. Comput. 16(6), 973–989 (1987)
[40] Parker, D.: Personal communication, 2013-11-20
[41] Puterman, M.L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley, New Jersey (1994)
[42] Russell, S.J., Norvig, P.: Artificial Intelligence: A Modern Approach. Prentice Hall, Englewood Cliffs (1995)
[43] Veinott, A.F.: On finding optimal policies in discrete dynamic programming with no discounting. Ann. Math. Stat. 37(5), 1284–1294 (1966)
[44] Von Essen, C.: Personal communication, 2013-11-20
[45] Von Essen, C., Jobstmann, B.: Synthesizing efficient controllers. In: Kuncak, V., Rybalchenko, A. (eds.) VMCAI, volume 7148 of Lecture Notes in Computer Science, pp. 428–444. Springer (2012)
[46] Wimmer, R., Braitling, B., Becker, B., Hahn, E.M., Crouzen, P., Hermanns, H., Dhama, A., Theel, O.E.: Symblicit calculation of long-run averages for concurrent probabilistic systems. In: QEST, IEEE Computer Society, pp. 27–36 (2010)
[47] Wulf, M.D., Doyen, L., Henzinger, T.A., Raskin, J.-F.: Antichains: a new algorithm for checking universality of finite automata. In: Ball, T., Jones, R.B. (eds.) CAV, volume 4144 of Lecture Notes in Computer Science, pp. 17–30. Springer (2006)

Acknowledgments

We would like to thank Mickael Randour for his fruitful discussions, Marta Kwiatkowska, David Parker and Christian Von Essen for their help regarding the tool \(\mathsf {PRISM}\), and Holger Hermanns and Ernst Moritz Hahn for sharing with us their prototypical implementation. This work has been partly supported by ERC Starting Grant (279499: inVEST), ARC project (Number AUWB-2010-10/15-UMONS-3), European project Cassting (FP7-ICT-601148), and an F.R.S.-FNRS grant “Mission Scientifique”.

Author information

Authors and Affiliations

Université de Mons, 20 Place du Parc, 7000, Mons, Belgium
Aaron Bohy & Véronique Bruyère
Université Libre de Bruxelles, Campus de la Plaine, CP212, 1050, Brussels, Belgium
Jean-François Raskin
INRIA Rennes Bretagne-Atlantique, Campus Universitaire de Beaulieu, 35042, Rennes Cedex, France
Nathalie Bertrand

Corresponding author

Correspondence to Véronique Bruyère.

About this article

Cite this article

Bohy, A., Bruyère, V., Raskin, JF. et al. Symblicit algorithms for mean-payoff and shortest path in monotonic Markov decision processes. Acta Informatica 54, 545–587 (2017). https://doi.org/10.1007/s00236-016-0255-4

Received: 15 April 2015
Accepted: 09 January 2016
Published: 01 February 2016
Issue Date: September 2017
DOI: https://doi.org/10.1007/s00236-016-0255-4
