Skip to main content

Gradient-Descent for Randomized Controllers Under Partial Observability

  • Conference paper
  • First Online:
Verification, Model Checking, and Abstract Interpretation (VMCAI 2022)

Abstract

Randomization is a powerful technique to create robust controllers, in particular in partially observable settings. The degrees of randomization have a significant impact on the system performance, yet they are intricate to get right. The use of synthesis algorithms for parametric Markov chains (pMCs) is a promising direction to support the design process of such controllers. This paper shows how to define and evaluate gradients of pMCs. Furthermore, it investigates varieties of gradient descent techniques from the machine learning community to synthesize the probabilities in a pMC. The resulting method scales to significantly larger pMCs than before and empirically outperforms the state-of-the-art, often by at least one order of magnitude.

Supported by DFG RTG 2236 “UnRAVeL” and ERC AdG 787914 FRAPPANT.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 79.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 99.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Data Availability Statement

The tools used and data generated in our experimental evaluation are archived at DOI 10.5281/5568910 [27].

Notes

  1. 1.

    ETR = Existential Theory of the Reals. ETR-complete decision problems are as hard as finding the roots of a multivariate polynomials.

  2. 2.

    Technically, we use graph-preserving to ensure continuously differentiability of . For acyclic pMCs, these functions are continuously differentiable without assuming graph-preservation [35].

  3. 3.

    In our implementation, we support exact rationals or floating point arithmetic.

  4. 4.

    It is only based on the first derivative and not on higher ones.

  5. 5.

    When considering a minimization problem, \({bar}\) is subtracted from f.

  6. 6.

    The feasibility problem remains a combinatorially hard problem, but the presence of easy parameters typically (but not always) indicates that the gradient remains (positive/negative) over the complete space.

References

  1. Aberdeen, D.A.: Policy-gradient algorithms for partially observable Markov decision processes. Ph.D. thesis, The Australian National University (2003)

    Google Scholar 

  2. Alur, R., et al.: Syntax-guided synthesis. In: Dependable Software Systems Engineering, NATO Science for Peace and Security Series D: Information and Communication Security, vol. 40, pp. 1–25. IOS Press (2015)

    Google Scholar 

  3. Andriushchenko, R., Češka, M., Junges, S., Katoen, J.-P.: Inductive synthesis for probabilistic programs reaches new horizons. In: Groote, J.F., Larsen, K.G. (eds.) TACAS 2021. LNCS, vol. 12651, pp. 191–209. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-72016-2_11

    Chapter  MATH  Google Scholar 

  4. Baier, C., Größer, M., Bertrand, N.: Probabilistic \(\omega \)-automata. J. ACM 59(1), 1:1-1:52 (2012)

    Article  Google Scholar 

  5. Baier, C., Hensel, C., Hutschenreiter, L., Junges, S., Katoen, J.P., Klein, J.: Parametric Markov chains: PCTL complexity and fraction-free Gaussian elimination. Inf. Comput. 272, 104504 (2020)

    Article  MathSciNet  Google Scholar 

  6. Baier, C., Katoen, J.P.: Principles of Model Checking. MIT Press, Cambridge (2008)

    MATH  Google Scholar 

  7. Bartocci, E., Grosu, R., Katsaros, P., Ramakrishnan, C.R., Smolka, S.A.: Model repair for probabilistic systems. In: Abdulla, P.A., Leino, K.R.M. (eds.) TACAS 2011. LNCS, vol. 6605, pp. 326–340. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-19835-9_30

    Chapter  MATH  Google Scholar 

  8. Bork, A., Junges, S., Katoen, J.-P., Quatmann, T.: Verification of indefinite-horizon POMDPs. In: Hung, D.V., Sokolsky, O. (eds.) ATVA 2020. LNCS, vol. 12302, pp. 288–304. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-59152-6_16

    Chapter  Google Scholar 

  9. Carr, S., Jansen, N., Topcu, U.: Verifiable RNN-based policies for POMDPs under temporal logic constraints. In: IJCAI, pp. 4121–4127. ijcai.org (2020)

    Google Scholar 

  10. Chen, T., Feng, Y., Rosenblum, D.S., Su, G.: Perturbation analysis in verification of discrete-time Markov chains. In: Baldan, P., Gorla, D. (eds.) CONCUR 2014. LNCS, vol. 8704, pp. 218–233. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-44584-6_16

    Chapter  Google Scholar 

  11. Chen, T., Hahn, E.M., Han, T., Kwiatkowska, M.Z., Qu, H., Zhang, L.: Model repair for Markov decision processes. In: TASE. IEEE (2013)

    Google Scholar 

  12. Cubuktepe, M., et al.: Sequential convex programming for the efficient verification of parametric MDPs. In: Legay, A., Margaria, T. (eds.) TACAS 2017. LNCS, vol. 10206, pp. 133–150. Springer, Heidelberg (2017). https://doi.org/10.1007/978-3-662-54580-5_8

    Chapter  Google Scholar 

  13. Cubuktepe, M., Jansen, N., Junges, S., Katoen, J.-P., Topcu, U.: Synthesis in pMDPs: a tale of 1001 parameters. In: Lahiri, S.K., Wang, C. (eds.) ATVA 2018. LNCS, vol. 11138, pp. 160–176. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01090-4_10

    Chapter  Google Scholar 

  14. Daws, C.: Symbolic and parametric model checking of discrete-time Markov chains. In: Liu, Z., Araki, K. (eds.) ICTAC 2004. LNCS, vol. 3407, pp. 280–294. Springer, Heidelberg (2005). https://doi.org/10.1007/978-3-540-31862-0_21

    Chapter  MATH  Google Scholar 

  15. Dehnert, C., et al.: PROPhESY: a PRObabilistic ParamEter SYnthesis tool. In: Kroening, D., Păsăreanu, C.S. (eds.) CAV 2015. LNCS, vol. 9206, pp. 214–231. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-21690-4_13

    Chapter  Google Scholar 

  16. Droste, M., Kuich, W., Vogler, H.: Handbook of Weighted Automata. Springer, Heidelberg (2009)

    Book  Google Scholar 

  17. Fang, X., Calinescu, R., Gerasimou, S., Alhwikem, F.: Fast parametric model checking through model fragmentation. In: ICSE, pp. 835–846. IEEE (2021)

    Google Scholar 

  18. Filieri, A., Ghezzi, C., Tamburrelli, G.: Run-time efficient probabilistic model checking. In: ICSE. ACM (2011)

    Google Scholar 

  19. Fremont, D.J., Seshia, S.A.: Reactive control improvisation. In: Chockler, H., Weissenbacher, G. (eds.) CAV 2018. LNCS, vol. 10981, pp. 307–326. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-96145-3_17

    Chapter  Google Scholar 

  20. Gainer, P., Hahn, E.M., Schewe, S.: Accelerated model checking of parametric Markov chains. In: Lahiri, S.K., Wang, C. (eds.) ATVA 2018. LNCS, vol. 11138, pp. 300–316. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01090-4_18

    Chapter  Google Scholar 

  21. Giro, S., D’Argenio, P.R.: Quantitative model checking revisited: neither decidable nor approximable. In: Raskin, J.-F., Thiagarajan, P.S. (eds.) FORMATS 2007. LNCS, vol. 4763, pp. 179–194. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-75454-1_14

    Chapter  Google Scholar 

  22. Hahn, E.M., Han, T., Zhang, L.: Synthesis for PCTL in parametric Markov decision processes. In: Bobaru, M., Havelund, K., Holzmann, G.J., Joshi, R. (eds.) NFM 2011. LNCS, vol. 6617, pp. 146–161. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-20398-5_12

    Chapter  Google Scholar 

  23. Hahn, E.M., Hermanns, H., Zhang, L.: Probabilistic reachability for parametric Markov models. In: Păsăreanu, C.S. (ed.) SPIN 2009. LNCS, vol. 5578, pp. 88–106. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-02652-2_10

    Chapter  Google Scholar 

  24. Han, J., Moraga, C.: The influence of the sigmoid function parameters on the speed of backpropagation learning. In: Mira, J., Sandoval, F. (eds.) IWANN 1995. LNCS, vol. 930, pp. 195–201. Springer, Heidelberg (1995). https://doi.org/10.1007/3-540-59497-3_175

    Chapter  Google Scholar 

  25. Hartmanns, A., Klauck, M., Parker, D., Quatmann, T., Ruijters, E.: The quantitative verification benchmark set. In: Vojnar, T., Zhang, L. (eds.) TACAS 2019. LNCS, vol. 11427, pp. 344–350. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-17462-0_20

    Chapter  Google Scholar 

  26. Heck, L., Spel, J., Junges, S., Moerman, J., Katoen, J.P.: Gradient-descent for randomized controllers under partial observability. CoRR abs/2111.04407 (2021, extended version)

    Google Scholar 

  27. Heck, L., Spel, J., Junges, S., Moerman, J., Katoen, J.P.: Gradient-descent for randomized controllers under partial observability (artifact). Zenodo (2021). https://doi.org/10.4121/14910426

  28. Hensel, C., Junges, S., Katoen, J.P., Quatmann, T., Volk, M.: The probabilistic model checker storm. CoRR abs/2002.07080 (2020)

    Google Scholar 

  29. Horák, K., Bosanský, B., Chatterjee, K.: Goal-HSVI: heuristic search value iteration for goal POMDPs. In: IJCAI, pp. 4764–4770. ijcai.org (2018)

    Google Scholar 

  30. Israeli, A., Jalfon, M.: Token management schemes and random walks yield self-stabilizing mutual exclusion. In: PODC, pp. 119–131. ACM (1990)

    Google Scholar 

  31. Jansen, N., et al.: Accelerating parametric probabilistic verification. In: Norman, G., Sanders, W. (eds.) QEST 2014. LNCS, vol. 8657, pp. 404–420. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10696-0_31

    Chapter  Google Scholar 

  32. Junges, S.: Parameter synthesis in Markov models. Ph.D. thesis, RWTH Aachen University, Germany (2020)

    Google Scholar 

  33. Junges, S., Ábrahám, E., Hensel, C., Jansen, N., Katoen, J.P., Quatmann, T., Volk, M.: Parameter synthesis for Markov models. CoRR abs/1903.07993 (2019)

    Google Scholar 

  34. Junges, S., et al.: Finite-state controllers of POMDPs using parameter synthesis. In: UAI. AUAI Press (2018)

    Google Scholar 

  35. Junges, S., Katoen, J.P., Pérez, G.A., Winkler, T.: The complexity of reachability in parametric Markov decision processes. J. Comput. Syst. Sci. 119, 183–210 (2021)

    Article  MathSciNet  Google Scholar 

  36. Kaelbling, L.P., Littman, M.L., Cassandra, A.R.: Planning and acting in partially observable stochastic domains. Artif. Intell. 101(1–2), 99–134 (1998)

    Article  MathSciNet  Google Scholar 

  37. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: ICLR (Poster) (2015)

    Google Scholar 

  38. Kwiatkowska, M., Norman, G., Parker, D.: PRISM 4.0: verification of probabilistic real-time systems. In: Gopalakrishnan, G., Qadeer, S. (eds.) CAV 2011. LNCS, vol. 6806, pp. 585–591. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-22110-1_47

    Chapter  Google Scholar 

  39. Lanotte, R., Maggiolo-Schettini, A., Troina, A.: Parametric probabilistic transition systems for system design and analysis. Formal Aspects Comput. 19(1), 93–109 (2007)

    Article  Google Scholar 

  40. Liu, L., et al.: On the variance of the adaptive learning rate and beyond. In: ICLR. OpenReview.net (2020)

    Google Scholar 

  41. Lovejoy, W.S.: Computationally feasible bounds for partially observed Markov decision processes. Oper. Res. 39(1), 162–175 (1991)

    Article  MathSciNet  Google Scholar 

  42. Madani, O., Hanks, S., Condon, A.: On the undecidability of probabilistic planning and related stochastic optimization problems. Artif. Intell. 147(1–2), 5–34 (2003)

    Article  MathSciNet  Google Scholar 

  43. Meuleau, N., Kim, K., Kaelbling, L.P., Cassandra, A.R.: Solving POMDPs by searching the space of finite policies. In: UAI, pp. 417–426. Morgan Kaufmann (1999)

    Google Scholar 

  44. Meuleau, N., Peshkin, L., Kim, K., Kaelbling, L.P.: Learning finite-state controllers for partially observable environments. In: UAI, pp. 427–436. Morgan Kaufmann (1999)

    Google Scholar 

  45. Mnih, V., et al.: Playing Atari with deep reinforcement learning. CoRR abs/1312.5602 (2013)

    Google Scholar 

  46. Moulay, E., Léchappé, V., Plestan, F.: Properties of the sign gradient descent algorithms. Inf. Sci. 492, 29–39 (2019)

    Article  MathSciNet  Google Scholar 

  47. Nesterov, Y.E.: A method for solving the convex programming problem with convergence rate \(O(1/k^{2})\). In: Dokl. akad. nauk Sssr, vol. 269, pp. 543–547 (1983)

    Google Scholar 

  48. Norman, G., Parker, D., Zou, X.: Verification and control of partially observable probabilistic systems. Real-Time Syst. 53(3), 354–402 (2017). https://doi.org/10.1007/s11241-017-9269-4

    Article  MATH  Google Scholar 

  49. Pineau, J., Gordon, G.J., Thrun, S.: Point-based value iteration: an anytime algorithm for POMDPs. In: IJCAI, pp. 1025–1032. Morgan Kaufmann (2003)

    Google Scholar 

  50. Quatmann, T., Dehnert, C., Jansen, N., Junges, S., Katoen, J.-P.: Parameter synthesis for Markov models: faster than ever. In: Artho, C., Legay, A., Peled, D. (eds.) ATVA 2016. LNCS, vol. 9938, pp. 50–67. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46520-3_4

    Chapter  MATH  Google Scholar 

  51. Ruder, S.: An overview of gradient descent optimization algorithms. arXiv preprint arXiv:1609.04747 (2016)

  52. Rumelhart, D.E.: Parallel Distributed Processing. MIT Press, Cambridge (1989)

    Google Scholar 

  53. Salmani, B., Katoen, J.-P.: Bayesian inference by symbolic model checking. In: Gribaudo, M., Jansen, D.N., Remke, A. (eds.) QEST 2020. LNCS, vol. 12289, pp. 115–133. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-59854-9_9

    Chapter  Google Scholar 

  54. Scutari, M.: Bayesian network repository (2021). https://www.bnlearn.com/bnrepository/

  55. Silver, D., Veness, J.: Monte-Carlo planning in large POMDPs. In: NIPS, pp. 2164–2172. Curran Associates, Inc. (2010)

    Google Scholar 

  56. Smith, A.E., Coit, D.W., Baeck, T., Fogel, D., Michalewicz, Z.: Penalty functions. Handb. Evol. Comput. 97(1), C5 (1997)

    Google Scholar 

  57. Spaan, M.T.J., Vlassis, N.A.: Perseus: randomized point-based value iteration for POMDPs. J. Artif. Intell. Res. 24, 195–220 (2005)

    Article  Google Scholar 

  58. Spel, J., Junges, S., Katoen, J.-P.: Are parametric Markov chains monotonic? In: Chen, Y.-F., Cheng, C.-H., Esparza, J. (eds.) ATVA 2019. LNCS, vol. 11781, pp. 479–496. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-31784-3_28

    Chapter  Google Scholar 

  59. Sutskever, I., Martens, J., Dahl, G.E., Hinton, G.E.: On the importance of initialization and momentum in deep learning. In: ICML (3). JMLR Workshop and Conference Proceedings, vol. 28, pp. 1139–1147. JMLR.org (2013)

    Google Scholar 

  60. Thrun, S., Burgard, W., Fox, D.: Probabilistic Robotics. MIT Press, Cambridge (2005)

    MATH  Google Scholar 

  61. Tieleman, T., Hinton, G.: Lecture 6.5–RMSProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning (2012)

    Google Scholar 

  62. Vanderbei, R.J.: Linear programming - foundations and extensions, Kluwer International Series in Operations Research and Management Service, vol. 4. Kluwer (1998)

    Google Scholar 

  63. Walraven, E., Spaan, M.T.J.: Accelerated vector pruning for optimal POMDP solvers. In: AAAI, pp. 3672–3678. AAAI Press (2017)

    Google Scholar 

  64. Winterer, L., et al.: Strategy synthesis for POMDPs in robot planning via game-based abstractions. IEEE Trans. Autom. Control 66(3), 1040–1054 (2021)

    Article  MathSciNet  Google Scholar 

  65. Winterer, L., Wimmer, R., Jansen, N., Becker, B.: Strengthening deterministic policies for POMDPs. In: Lee, R., Jha, S., Mavridou, A., Giannakopoulou, D. (eds.) NFM 2020. LNCS, vol. 12229, pp. 115–132. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-55754-6_7

    Chapter  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Jip Spel .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Heck, L., Spel, J., Junges, S., Moerman, J., Katoen, JP. (2022). Gradient-Descent for Randomized Controllers Under Partial Observability. In: Finkbeiner, B., Wies, T. (eds) Verification, Model Checking, and Abstract Interpretation. VMCAI 2022. Lecture Notes in Computer Science(), vol 13182. Springer, Cham. https://doi.org/10.1007/978-3-030-94583-1_7

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-94583-1_7

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-94582-4

  • Online ISBN: 978-3-030-94583-1

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics