
Automatic verification of competitive stochastic systems


Abstract

We present automatic verification techniques for the modelling and analysis of probabilistic systems that incorporate competitive behaviour. These systems are modelled as turn-based stochastic multi-player games, in which the players can either collaborate or compete in order to achieve a particular goal. We define a temporal logic called rPATL for expressing quantitative properties of stochastic multi-player games. This logic allows us to reason about the collective ability of a set of players to achieve a goal relating to the probability of an event’s occurrence or the expected amount of cost/reward accumulated. We give an algorithm for verifying properties expressed in this logic and implement the techniques in a probabilistic model checker, as an extension of the PRISM tool. We demonstrate the applicability and efficiency of our methods by deploying them to analyse and detect potential weaknesses in a variety of large case studies, including algorithms for energy management in Microgrids and collective decision making for autonomous systems.


Notes

  1. Rational values can be handled by re-scaling all rewards by the lowest common multiple of the denominators of rewards appearing in the game. Note that re-scaling does not increase the size of the model, so the stated complexity results are not affected.

  2. The tool is currently available from: http://www.prismmodelchecker.org/games/.

  3. Models and properties are at: http://www.prismmodelchecker.org/files/fmsd-smg/.

References

  1. Aizatulin M, Schnoor H, Wilke T (2009) Computationally sound analysis of a probabilistic contract signing protocol. In: Proc of the 14th European symposium on research in computer security (ESORICS’09). LNCS, vol 5789. Springer, Berlin, pp 571–586


  2. Alur R, Henzinger T, Kupferman O (2002) Alternating-time temporal logic. J ACM 49(5):672–713


  3. Alur R, Henzinger T, Mang F, Qadeer S, Rajamani S, Tasiran S (1998) MOCHA: modularity in model checking. In: Proc of the 10th international conference on computer aided verification (CAV’98). LNCS, vol 1427. Springer, Berlin, pp 521–525


  4. Andova S, Hermanns H, Katoen J-P (2003) Discrete-time rewards model-checked. In: Proc of the formal modeling and analysis of timed systems (FORMATS’03). LNCS, vol 2791. Springer, Berlin, pp 88–104


  5. Baier C, Brázdil T, Größer M, Kucera A (2007) Stochastic game logic. In: Proc of the 4th international conference on quantitative evaluation of systems (QEST’07). IEEE Press, New York, pp 227–236


  6. Baier C, Katoen J-P (2008) Principles of model checking. MIT Press, Cambridge


  7. Ballarini P, Fisher M, Wooldridge M (2006) Uncertain agent verification through probabilistic model-checking. In: Proc of the 3rd international workshop on safety and security in multi-agent systems (SASEMAS’06)


  8. Bianco A, de Alfaro L (1995) Model checking of probabilistic and nondeterministic systems. In: Thiagarajan P (ed) Proc of the 15th conference on foundations of software technology and theoretical computer science (FSTTCS’95). LNCS, vol 1026. Springer, Berlin, pp 499–513


  9. Brihaye T, Markey N, Ghannem M, Rieg L (2008) Good friends are hard to find! In: Demri S, Jensen C (eds) Proc of the 15th international symposium on temporal representation and reasoning (TIME’08). IEEE Press, New York, pp 32–40


  10. Bulling N, Jamroga W (2009) What agents can probably enforce. Fundam Inform 93(1–3):81–96


  11. Cerný P, Chatterjee K, Henzinger T, Radhakrishna A, Singh R (2011) Quantitative synthesis for concurrent programs. In: Gopalakrishnan G, Qadeer S (eds) Proc of the 23rd international conference on computer aided verification (CAV’11). LNCS, vol 6806. Springer, Berlin, pp 243–259


  12. Chatterjee K (2007) Stochastic ω-regular games. PhD thesis, University of California at Berkeley

  13. Chatterjee K, Henzinger T (2008) Value iteration. In: 25 years of model checking. LNCS, vol 5000. Springer, Berlin, pp 107–138


  14. Chatterjee K, Henzinger T, Jobstmann B, Radhakrishna A (2010) Gist: a solver for probabilistic games. In: Proc of the 22nd international conference on computer aided verification (CAV’10). LNCS, vol 6174. Springer, Berlin, pp 665–669


  15. Chatterjee K, Jurdzinski M, Henzinger T (2004) Quantitative stochastic parity games. In: Munro JI (ed) Proc of the 15th annual ACM-SIAM symposium on discrete algorithms (SODA’04). SIAM, Philadelphia, pp 121–130


  16. Chen T, Forejt V, Kwiatkowska M, Parker D, Simaitis A (2012) Automatic verification of competitive stochastic systems. In: Flanagan C, König B (eds) Proc of the 18th international conference on tools and algorithms for the construction and analysis of systems (TACAS’12). LNCS, vol 7214. Springer, Berlin, pp 315–330


  17. Chen T, Kwiatkowska M, Parker D, Simaitis A (2011) Verifying team formation protocols with probabilistic model checking. In: Proc of the 12th international workshop on computational logic in multi-agent systems (CLIMA XII 2011). LNCS, vol 6814. Springer, Berlin, pp 190–207


  18. Chen T, Lu J (2007) Probabilistic alternating-time temporal logic and model checking algorithm. In: Proc of the 4th international conference on fuzzy systems and knowledge discovery (FSKD’07). IEEE Press, New York, pp 35–39


  19. Condon A (1993) On algorithms for simple stochastic games. In: Advances in computational complexity theory. DIMACS series in discrete mathematics and theoretical computer science, vol 13, pp 51–73


  20. Courcoubetis C, Yannakakis M (1995) The complexity of probabilistic verification. J ACM 42(4):857–907


  21. de Alfaro L (1999) Computing minimum and maximum reachability times in probabilistic systems. In: Baeten J, Mauw S (eds) Proc of the 10th international conference on concurrency theory (CONCUR’99). LNCS, vol 1664. Springer, Berlin, pp 66–81


  22. de Alfaro L, Henzinger T (2000) Concurrent omega-regular games. In: Proc of the 15th annual IEEE symposium on logic in computer science. IEEE Comput Soc, Los Alamitos, pp 141–154


  23. Filar J, Vrieze K (1997) Competitive Markov decision processes. Springer, Berlin


  24. Forejt V, Kwiatkowska M, Norman G, Parker D (2011) Automated verification techniques for probabilistic systems. In: Bernardo M, Issarny V (eds) Formal methods for eternal networked software systems (SFM’11). LNCS, vol 6659. Springer, Berlin, pp 53–113


  25. Hansson H, Jonsson B (1994) A logic for reasoning about time and reliability. Form Asp Comput 6(5):512–535


  26. Hildmann H, Saffre F (2011) Influence of variable supply and load flexibility on demand-side management. In: Proc of the 8th international conference on the European energy market (EEM’11), pp 63–68


  27. Kremer S, Raskin J-F (2003) A game-based verification of non-repudiation and fair exchange protocols. J Comput Secur 11(3):399–430


  28. Kwiatkowska M, Norman G, Parker D (2011) PRISM 4.0: verification of probabilistic real-time systems. In: Gopalakrishnan G, Qadeer S (eds) Proc of the 23rd international conference on computer aided verification (CAV’11). LNCS, vol 6806. Springer, Berlin, pp 585–591


  29. Laroussinie F, Markey N, Oreiby G (2007) On the expressiveness and complexity of ATL. In: Seidl H (ed) Proc of the 10th international conference on foundations of software science and computational structures (FOSSACS’07). LNCS, vol 4423. Springer, Berlin, pp 243–257


  30. Lomuscio A, Qu H, Raimondi F (2009) MCMAS: a model checker for the verification of multi-agent systems. In: Proc of the 21st international conference on computer aided verification (CAV’09). LNCS, vol 5643. Springer, Berlin, pp 682–688


  31. Martin D (1998) The determinacy of Blackwell games. J Symb Log 63(4):1565–1581


  32. McIver A, Morgan C (2007) Results on the quantitative mu-calculus qMu. ACM Trans Comput Log 8(1). doi:10.1145/1182613.1182616

  33. Saffre F, Simaitis A (2012) Host selection through collective decision. ACM Trans Auton Adapt Syst 7(1). doi:10.1145/2168260.2168264

  34. Schnoor H (2010) Strategic planning for probabilistic games with incomplete information. In: Proc of the 9th international conference on autonomous agents and multiagent systems (AAMAS’10), pp 1057–1064


  35. Ummels M (2010) Stochastic multiplayer games: theory and algorithms. PhD thesis, RWTH Aachen University

  36. van der Hoek W, Wooldridge M (2003) Model checking cooperation, knowledge, and time—a case study. Res Econ 57(3):235–265


  37. Zhang C, Pang J (2010) On probabilistic alternating simulations. In: Calude C, Sassone V (eds) Proc of the 6th IFIP conference on theoretical computer science (TCS’10). IFIP, vol 323. Springer, Berlin, pp 71–85


  38. Zhang C, Pang J (2012) An algorithm for probabilistic alternating simulation. In: Bieliková M, Friedrich G, Gottlob G, Katzenbeisser S, Turán G (eds) Proc of the 38th conference on current trends in theory and practice of computer science (SOFSEM’12). LNCS, vol 7147. Springer, Berlin, pp 431–442



Acknowledgements

The authors are partially supported by ERC Advanced Grant VERIWARE, the Institute for the Future of Computing at the Oxford Martin School and EPSRC grant EP/F001096/1. Vojtěch Forejt is supported by a Royal Society Newton Fellowship. We also thank the anonymous referees for various helpful comments.

Author information

Correspondence to Aistis Simaitis.

Appendices

Appendix: Proofs

This appendix contains proofs for the results stated in the text. We begin by stating some known results that we will require later.

Theorem 2

[6, 12]

The following statements hold:

  1. Memoryless deterministic strategies suffice for achieving minimum/maximum values in a state for extended reachability, Büchi, and coBüchi objectives in stochastic two-player zero-sum games.

  2. Finding minimum/maximum values in a state for Markov decision processes (MDPs) for extended reachability, Büchi, and coBüchi objectives can be done in polynomial time.

Appendix A: Proofs for optimality of expected rewards

A.1 Finite-memory strategies for ⋆=0

We first show that finite-memory strategies are required for optimality of expected rewards of type ⋆=0, i.e. for optimal values of \(\mathbb{E}^{\max,\min}_{\mathcal {G}_{C},s}[{\mathit{rew}(r,0,T)}]\). Later, in the proof of Lemma 3 we show that the finite memory is indeed sufficient. Let us consider the following example:

[Figure a: the example game. From s 0, action a leads to the target s 1 with probability 1; action b leads back to s 0 with probability 0.9 and to the sink s 2 with probability 0.1.]

The target set is T={s 1} and the reward structure r assigns 1 to s 0 and 0 to the other states. We analyse the optimal value of rew(r,0,T) in s 0. Let σ be a memoryless strategy that in s 0 picks a with probability x and b with probability 1−x. The reward obtained is then:

$$x\cdot\sum_{k=1}^{\infty} k\cdot\bigl(0.9\cdot(1-x)\bigr)^{k-1} = \frac{x}{(0.1+0.9\cdot x)^{2}} $$

which, for any x, is at most \(\frac{25}{9}\) (the maximum is attained at \(x=\frac{1}{9}\)).

Now consider the strategy σ′ that is deterministic, and picks b on the first 8 visits to s 0 and then a on the 9th visit. The value under this strategy is:

$$9\cdot0.9^8 \approx3.8 > \frac{25}{9} $$
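To make the gap concrete, here is a small Python sketch (illustrative only, not part of the PRISM-games implementation) that reproduces both computations for the game reconstructed above:

```python
# Illustrative sketch of the example: from s0, action 'a' reaches the target
# s1 with probability 1, while action 'b' returns to s0 with probability 0.9
# and falls into a zero-reward sink otherwise; r(s0) = 1.

def memoryless_value(x):
    # Picking 'a' with probability x on every visit to s0 gives
    # sum_{k>=1} k * x * (0.9*(1-x))**(k-1) = x / (0.1 + 0.9*x)**2.
    return x / (0.1 + 0.9 * x) ** 2

def finite_memory_value(n):
    # Deterministic: play 'b' on the first n-1 visits to s0, then 'a'.
    # The target is reached with probability 0.9**(n-1), carrying reward n.
    return n * 0.9 ** (n - 1)

print(max(memoryless_value(i / 1000) for i in range(1, 1001)))  # ~2.778 = 25/9
print(finite_memory_value(9))                                   # ~3.874
```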

Remark

An optimal (memoryless deterministic) strategy in s 0 for both ⋆=∞ and ⋆=c is to take the action b and thus achieve values ∞ and 10, respectively.

A.2 Memoryless strategies for ⋆∈{∞,c}

Secondly, we prove that memoryless (deterministic) strategies suffice for optimality of the expected reward \(\mathbb{E}^{\max,\min}_{\mathcal{G}_{C},s}[{\mathit{rew}(r,\star,T)}]\) for types ⋆∈{∞,c}.

If the expected value is infinite, then memoryless deterministic strategies suffice by Theorem 2, because this case reduces to the problem of reaching, with positive probability, a state where the expected value is infinite. The states s∈T get value 0 by definition. Otherwise, the values \(\mathbb{E}^{\max,\min}_{\mathcal {G}_{C},s}[{\mathit{rew}(r,\star,T)}]\) satisfy:

$$ \mathbb{E}^{\max,\min}_{\mathcal{G}_C,s}\bigl[{\mathit{rew}(r, \star,T)}\bigr] =r(s) + \mathrm{opt}^s_{a\in A(s)}\sum _{s'\in S}\Delta(s,a) \bigl(s'\bigr)\cdot \mathbb{E}^{\max,\min}_{\mathcal{G}_C,s'}\bigl[{\mathit{rew}(r,\star,T)}\bigr] $$
(3)

Let A opt(s) be the set of actions that realise the optimum in s, where opt is max or min for players 1 and 2, respectively; similarly, opt s is max if s∈S 1 and min if s∈S 2.

We first analyse the case ⋆=∞. Any strategy \(\sigma^{\infty}_{1}\in\varSigma_{1}\) that in each state s picks an action from A opt(s) is optimal. For player 2, a strategy \(\sigma^{\infty}_{2}\in\varSigma_{2}\) is optimal if it picks an action from A opt(s) in each state s such that T is reached almost surely under any counter-strategy for player 1.

Next, assume ⋆=c and let \(T_{0}=\{s\mid\mathbb{E}^{\max,\min}_{\mathcal {G}_{C},s}[{\mathit{rew}(r,c,T)}] = 0\}\). To optimise rew(r,c,T), we fix \(\sigma^{c}_{1}\in\varSigma_{1}\) that uses an action from A opt(s) in each state s and ensures that T 0 is reached almost surely. For player 2, any strategy \(\sigma^{c}_{2}\in\varSigma_{2}\) that picks an action from A opt(s) in each state s is optimal.
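As an illustration of how the sets A opt(s) can be read off once the values satisfying (3) are known, consider the following sketch; the data structures (values, rewards, delta, owner) are assumptions made for the example, not the paper's implementation:

```python
# Sketch: extract A_opt(s) from values satisfying equation (3). owner[s] is
# 1 or 2, and delta[(s, a)] maps successor states to probabilities.

def a_opt(s, actions, values, rewards, delta, owner, eps=1e-9):
    def q(a):
        # one-step lookahead: r(s) + sum_{s'} Delta(s, a)(s') * value(s')
        return rewards[s] + sum(p * values[t] for t, p in delta[(s, a)].items())

    opt = max if owner[s] == 1 else min
    best = opt(q(a) for a in actions[s])
    return {a for a in actions[s] if abs(q(a) - best) <= eps}
```

As noted above, membership of A opt(s) alone is not enough: the almost-sure reachability side conditions (of T for σ ∞ 2, of T 0 for σ c 1) must additionally be enforced when fixing the strategies.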

Proof of correctness of definitions of strategies

Given a state s and a strategy σ 1 for player 1, we denote:

$$\mathit{err}^{\sigma_1}(s) = \frac{\min_{\sigma_2\in\varSigma_2}\mathbb{E}^{\sigma_1,\sigma _2}_{\mathcal{G}_C,s}[{\mathit{rew}(r,\star,T)}]}{ \mathbb{E}^{\max,\min}_{\mathcal{G}_C,s}\bigl[{\mathit{rew}(r,\star,T)}\bigr]} $$

where we assume \(\mathit{err}^{\sigma_{1}}(s)=1\) if the denominator is 0. Observe that we have \(\mathit{err}^{\sigma_{1}}(s)\cdot\mathbb {E}^{\max,\min }_{\mathcal{G}_{C},s}[{\mathit{rew}(r,\star,T)}] = \min_{\sigma_{2}\in \varSigma _{2}}\mathbb{E}^{\sigma_{1},\sigma_{2}}_{\mathcal{G}_{C},s}[{\mathit {rew}(r,\star ,T)}]\).

Let ⋆=c. We prove that the maximiser’s strategy \(\sigma=\sigma^{c}_{1}\) defined above is optimal. Assume, for a contradiction, that it is not, i.e. err σ(s)<1 for some s. For all s, we have:

(4)

and, for all s∈S 2, there must be an action a such that:

(5)

Fix s for which err σ(s) is minimal (so, by assumption, err σ(s)<1). Thanks to (3), (4) and (5), err σ must also be minimal for all successors of s. However, this implies that T 0 is not reached with probability 1 because, in every s′∈T 0, we have err σ(s′)=1.

The other cases (\(\sigma_{2}^{c}\), \(\sigma_{1}^{\infty}\) and \(\sigma_{2}^{\infty}\)) can be proved analogously.

Appendix B: Proofs of correctness for Sect. 4.3

In this section, we prove the correctness of the methods given in Sect. 4.3 for computing rew(r,⋆,T) for the cases ⋆∈{c,∞,0}.

B.1 Proof of correctness for ⋆=c

Let us first consider the states with infinite value. Recall that we denote by inf(a rew ) the set of paths that visit a state with positive reward infinitely often (and thus accumulate infinite reward). If, for a state s, there is σ 1∈Σ 1 such that the probability \(\mathrm{Pr}_{\mathcal{G}_{C},s}^{\sigma_{1}, \sigma_{2}}(\mathit {inf}(a_{\mathit{rew}}))\) is positive for all σ 2∈Σ 2, then the strategy σ 1 itself yields infinite reward. In the other direction, suppose that for every σ 1∈Σ 1 there is some σ 2∈Σ 2 such that \(\mathrm{Pr}_{\mathcal{G}_{C},s}^{\sigma_{1}, \sigma_{2}}(\mathit {inf}(a_{\mathit{rew}}))\) is equal to zero. It is straightforward to extend the results of [22] and prove that, for every σ 1, a strategy σ 2 exists which in addition ensures that the expected number of visits to a state satisfying a rew is finite and bounded from above. The rest follows easily because the rewards assigned to states are also bounded from above.

Let us now consider finite values. Because of the assumption that no reward is accumulated after visiting a target state, we can change the random variable and use ∑ j∈ℕ r(st λ (j)) instead of rew(r,c,T). It can be shown by induction that the expected value w.r.t. this variable can be obtained as lim i→∞ f s (i) where:

$$ f_{s}(i) = \begin{cases} 0 &\text{if }i=0\\ r(s) + \mathrm{opt}^s_{a\in A(s)}\sum_{s'\in S}\Delta (s,a)(s')\cdot f_{s'}(i-1)&\text{otherwise} \end{cases} $$
(6)

We can then apply the Kleene fixpoint theorem and prove that lim i→∞ f s (i) is equal to the least fixpoint of (2).
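For concreteness, a minimal value-iteration sketch for (6), iterating from below towards the least fixpoint, might look as follows (the data structures are assumed, as in the earlier sketch; this is not the PRISM-games implementation):

```python
# Value iteration for equation (6), case *=c: iterate from f_s(0) = 0 upward.
# Under the standing assumption, r is 0 on target states and targets are
# absorbing, so no explicit target case is needed.

def expected_reward_c(states, actions, delta, r, owner, iters=100_000):
    f = {s: 0.0 for s in states}
    for _ in range(iters):  # in practice, stop once successive iterates agree
        f = {
            s: r[s] + (max if owner[s] == 1 else min)(
                sum(p * f[t] for t, p in delta[(s, a)].items())
                for a in actions[s]
            )
            for s in states
        }
    return f  # approximates lim_{i -> oo} f_s(i)
```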

B.2 Proof of correctness for ⋆=∞

First, observe that if a state s is assigned infinite value in the initial step, then we indeed have \(\mathbb{E}^{\max,\min}_{\mathcal {G}_{C},s}[{\mathit{rew}(r,\infty,T)}]=\infty\) by definition. We prove the correctness for the other values. Let u:S→ℚ be a function that assigns to each s a value such that \(u(s) \ge\mathbb{E}^{\max,\min}_{\mathcal {G}_{C},s}[{\mathit{rew}(r,\infty,T)}]\). Recall that we compute values of (2) by value iteration, i.e. we compute:

$$ f(s) (i) = \begin{cases} 0 &\text{if }s\in T\\ u(s) &\text{if }i=0\\ r(s) + \mathrm{opt}_{a\in A(s)}^s\sum_{s'\in S}\Delta(s,a)(s')\cdot f(s')(i-1)&\text{otherwise} \end{cases} $$

for sufficiently large i, and we show that \(\lim_{i\rightarrow \infty}f(s)(i) = \mathbb{E}^{\max,\min}_{\mathcal{G}_{C},s}[{\mathit {rew}(r,\infty ,T)}]\).

Let us consider auxiliary functions \({\mathit{rew}_{u}^{i}}\) which assign numbers to paths as follows:

$$ {\mathit{rew}_{u}^{i}}(\lambda) = \begin{cases} \sum_{j< k} r(\mathit{st}_{\lambda}(j)) & \exists k\le i : \mathit{st}_{\lambda}(k)\in T\wedge \forall j<k : \mathit{st}_{\lambda}(j)\notin T,\\[6pt] \sum_{j< i} r(\mathit{st}_{\lambda}(j)) + u(\mathit {st}_{\lambda}(i)) & \text{otherwise.} \end{cases} $$

Intuitively, the function \({\mathit{rew}_{u}^{i}}\) alters the definition of rew(r,∞,T) by assigning rewards given by r for the first i steps, and then assigning the reward given by u, if the target has not been reached yet. One can easily prove by induction that the value of f(s)(i) is equal to \(\mathbb{E}^{\max,\min}_{\mathcal{G}_{C},s}[{\mathit{rew}_{u}^{i}}]\).

We need to show that \(\lim_{i\rightarrow\infty} f(s)(i) \ge\mathbb {E}^{\max,\min}_{\mathcal{G}_{C},s}[{\mathit{rew}(r,\infty,T)}]\). This can be done by inductively showing that \(f(s)(i) \ge\mathbb{E}^{\max,\min}_{\mathcal {G}_{C},s}[{\mathit{rew}(r,\infty,T)}]\) for every i. The base case i=0 follows from the definition of f and u, and the inductive steps follow by monotonicity of the function f.

Furthermore, we show that \(\lim_{i\rightarrow\infty} f(s)(i) \le \mathbb{E}^{\max,\min}_{\mathcal{G}_{C},s}[{\mathit{rew}(r,\infty,T)}]\). Let σ min∈Σ 2 be a memoryless strategy satisfying \(\max_{\sigma\in\varSigma_{1}}\mathbb{E}^{\sigma,\sigma_{\min }}_{\mathcal{G}_{C},s}[{\mathit{rew}(r,\infty,T)}] =\) \(\mathbb {E}^{\max,\min }_{\mathcal{G}_{C},s}[{\mathit{rew}(r,\infty,T)}]\), i.e. σ min is the optimal minimising strategy for player 2. Let \(\tau(i) = \min_{\sigma\in\varSigma_{1}} \mathrm{Pr}_{s}^{\sigma, \sigma _{\min}} (\{\lambda\in{\varOmega_{{\mathcal{G}_{C},s}}} \mid \exists j\le i : \mathit{st}_{\lambda}(j)\in T\})\) be the minimal probability with which we end in T within i steps when playing according to σ min. We have lim i→∞ τ(i)=1, because otherwise player 1 would have a strategy preventing the target from being reached almost surely, and the reward obtained would be infinite. Thus, we have \(\max_{\sigma\in\varSigma_{1}} \mathbb{E}^{\sigma,\sigma_{\min }}_{s}[{\mathit{rew}_{u}^{i}}] \le\mathbb{E}^{\max,\min}_{\mathcal {G}_{C},s}[{\mathit{rew}(r,\infty,T)}] + (1-\tau(i))\cdot K\) where K=max s∈S u(s). As we let i go to ∞, the second summand vanishes, and so \(\lim_{i\rightarrow\infty}f(s)(i)=\lim_{i\rightarrow\infty}\mathbb{E}^{\max,\min}_{\mathcal{G}_{C},s}[{\mathit {rew}_{u}^{i}} ]\le \mathbb{E}^{\max,\min}_{\mathcal{G}_{C},s}[{\mathit{rew}(r,\infty,T)}]\).
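The computation just analysed differs from the ⋆=c iteration only in its initialisation: it starts from the upper bound u, with target states pinned to 0, and converges from above. A correspondingly adapted sketch, again with assumed data structures:

```python
# Value iteration for *=infinity: start from an upper bound u, targets at 0.

def expected_reward_inf(states, actions, delta, r, owner, target, u,
                        iters=100_000):
    f = {s: 0.0 if s in target else u[s] for s in states}
    for _ in range(iters):
        f = {
            s: 0.0 if s in target else r[s] + (max if owner[s] == 1 else min)(
                sum(p * f[t] for t, p in delta[(s, a)].items())
                for a in actions[s]
            )
            for s in states
        }
    return f
```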

B.3 Proof of correctness for ⋆=0

Lemma 1

\(\sup_{\sigma_{1}\in\varSigma_{1}}\inf_{\sigma_{2}\in \varSigma_{2}}\mathbb{E}^{\sigma_{1}, \sigma_{2}}_{s}[{\mathit{rew}(r,0,T )}]=\infty\) iff there is σ 1∈Σ 1 such that for all \(\sigma_{2}{\in }\varSigma_{2}\ \mathrm{Pr}_{\mathcal{G}_{C},s}^{\sigma_{1}, \sigma _{2}}(\mathit {inf}^{t}(a_{\mathit{rew}}))>0\).

Proof

In the direction ⇐, let q∈ℝ be any number. Player 1’s strategy σ to ensure that the expected reward achieved is at least q works as follows. Suppose σ 1 is such that \(\mathrm{Pr}_{\mathcal {G}_{C},s}^{\sigma_{1}, \sigma _{2}}(\mathit{inf}^{t}(a_{\mathit{rew}}))>p\) for all σ 2; by [22], we can safely assume that p>0. The strategy σ mimics the strategy σ 1∈Σ 1 as long as the history λ satisfies \({\mathrm{r}(\lambda)} < \frac {q}{p\cdot x^{|S|}}\), where x is the minimal probability that occurs in the game. When r(λ) exceeds this bound and the formula P >0[F t] is satisfied in the last state of λ, the strategy σ changes its behaviour and maximises the probability of reaching T. Because memoryless deterministic strategies are sufficient for both players for reachability queries, σ can ensure that T is reached with probability at least x |S| from λ. The rest is a simple computation: with probability at least p the accumulated reward exceeds \(\frac{q}{p\cdot x^{|S|}}\) at the moment of the switch, T is then reached with probability at least x |S|, and hence the expected reward is at least \(p\cdot x^{|S|}\cdot\frac{q}{p\cdot x^{|S|}} = q\).

Let us analyse the direction ⇒. Similarly to the ⋆=c case, we can show that, if for every σ 1∈Σ 1 there is σ 2∈Σ 2 such that \(\mathrm{Pr}_{\mathcal{G}_{C},s}^{\sigma_{1}, \sigma_{2}}(\mathit {inf}(a^{t}_{\mathit{rew}}))\) is equal to zero, then there is σ 2 which ensures that the expected number of visits to a state satisfying \(a^{t}_{\mathit{rew}}\) is finite. The rest follows as in ⋆=c; we only need to further consider that, if a state satisfies a rew but not P >0[F t] (i.e. it gets nonzero reward but is not labelled with \(a^{t}_{\mathit{rew}}\)), then the reward achievable by player 1 in such a state is 0. □

Given a state s, we denote by A(s,T) the set of actions that can be taken by a strategy achieving the maximum probability of reaching T. We first show that, if player 1 wants to maximise the expected reward w.r.t. rew(r,0,T) using only actions from A(s,T) in each state, he can do so using a memoryless deterministic strategy.

Lemma 2

Let \(\varSigma_{1}^{T}\subseteq\varSigma_{1}\) contain all strategies that use only the actions from A(s,T) and \(\forall\sigma_{1} \in\varSigma_{1}^{T}{:}\min_{\sigma_{2}\in \varSigma_{2}} \mathrm{Pr}_{\mathcal{G}_{C},s}^{\sigma_{1}, \sigma _{2}}(\mathsf{F} t)=\mathrm{Pr}_{\mathcal{G} _{C},s}^{\max, \min}(\mathsf{F} t)\). There is a memoryless deterministic strategy \(\sigma_{1}^{*}\in\varSigma_{1}^{T}\) satisfying:

$$\min_{\sigma_2\in\varSigma_2} \mathbb{E}^{\sigma_1^*,\sigma _2}_{\mathcal{G}_C,s}\bigl[{ \mathit{rew}(r,0,T)}\bigr] = \max_{\sigma_1\in\varSigma_1^T} \min_{\sigma_2\in\varSigma_2} \mathbb{E}^{\sigma_1,\sigma_2}_{\mathcal{G}_C,s}\bigl[{\mathit {rew}(r,0,T)}\bigr] . $$

Proof

Assume the game is restricted so that the only actions available in s are A(s,T) for all s. We first create a new reward structure r′ defined by \(r'(s)=r(s) \cdot\mathrm{Pr}^{\max,\min}_{\mathcal{G}_{C},s}(\mathsf{F} t)\). We show that, for all \(\sigma_{1}\in\varSigma_{1}^{T}\) and σ 2Σ 2 with \(\mathrm{Pr}_{\mathcal{G}_{C},s}^{\sigma_{1}, \sigma_{2}}(\mathsf{F} t)=\mathrm{Pr}_{\mathcal{G} _{C},s}^{\max, \min}(\mathsf{F} t)\), we have that:

$$\mathbb{E}^{\sigma_1,\sigma_2}_{\mathcal{G}_C,s}\bigl[{\mathit{rew} \bigl(r',c,T\bigr)}\bigr] = \mathbb{E}^{\sigma_1,\sigma_2}_{\mathcal{G}_C,s} \bigl[{\mathit {rew}(r,0,T)}\bigr]\,, $$

from which the lemma follows directly, as memoryless deterministic strategies suffice for achieving the optimal value of rew(r′,c,T) (see the proof in Appendix A.2).

Let \({\varOmega_{{\mathcal{G}_{C},s}}}(T) \stackrel{{\mathrm{def}}}{=} \{ \lambda\in{\varOmega_{{\mathcal{G}_{C},s}}} \mid\exists i : \mathit{st}_{\lambda}(i)\in T\}\), and \(t(\lambda)=\min\{i\in\mathbb{N} \mid \mathit{st}_{\lambda}(i)\in T\}\). For any strategy profile σ 1,σ 2 such that \(\mathrm{Pr}_{\mathcal{G}_{C},s}^{\sigma_{1}, \sigma_{2}}(\mathsf{F} t)=\mathrm{Pr}_{\mathcal{G} _{C},s}^{\max, \min}(\mathsf{F} t)\),

This completes the proof. □

Below, given a path h, we use \(\mathbb{E}^{\sigma_{1},\sigma_{2}}_{\mathcal{G}_{C},s}[{\mathit {rew}(r,0,T )}\mid h]\) to denote the conditional expectation of rew(r,0,T) on infinite paths initiated in h, i.e.:

$$\mathbb{E}^{\sigma_1,\sigma_2}_{\mathcal{G}_C,s}\bigl[{\mathit {rew}(r,0,T)}\mid h \bigr] =\frac{\int_{\{\lambda\mid\lambda\text{ starts with } h\} }{\mathrm{r}(\lambda)}\,d\mathrm{Pr}_{\mathcal{G}_C,s}^{\sigma _1,\sigma_2}}{\mathrm{Pr}_{\mathcal{G}_C,s}^{\sigma_1,\sigma_2}( \{\lambda\mid \lambda\text{ starts with } h\})} $$

Lemma 3

For each state s∈S, there exists a finite-memory strategy σ ∗ for player 1 which maximises the expected reward rew(r,0,T) from the state s. In particular, there exists some bound B such that, on histories h with r(h)≥B, σ ∗ plays memorylessly.

Proof

Fix two strategies σ 1∈Σ 1 and σ 2∈Σ 2. For each state s∈S and path h=s 0 a 0 s 1…s n ending in s′ we have that:

where rew(r,0,T)+r(h) is a random variable assigning rew(r,0,T)(λ)+r(h) to a path λ reaching T, and 0 otherwise; and where \(\sigma_{i}^{h}(h') = \sigma_{i}(s_{0}a_{0}s_{1}\ldots s_{n-1}a_{n-1}{\cdot}h')\).

Given a state s, we use \(\mathit{PR}^{\underline{\max}}_{s}\) to denote the maximal reachability probability to reach T under the strategies for which at s, actions in A(s,T) are disallowed for a single step, i.e.:

$$\mathit{PR}^{\underline{\max}}_{s} = \max_{a\in A(s)\setminus A(s,T)} \sum _{s'\in S} \Delta(s,a) \bigl(s'\bigr)\cdot \mathrm{Pr}_{\mathcal {G}_C,s'}^{\max,\min }(\mathsf{F} t) $$

Intuitively, \(\mathit{PR}^{\underline{\max}}_{s}\) denotes the “second” maximal reachability probability. Below, we assume that A(s,T)≠A(s). Define:

$$B_s= \frac{\mathbb{E}^{\max,\min}_{\mathcal{G}_C,s}[{\mathit {rew}(r,c,T )}]}{\mathrm{Pr}^{\max,\min}_{\mathcal{G}_C,s}(\mathsf {F} t)- \mathit{PR}^{\underline{\max}}_{s}} $$

Let B=max s∈S B s . We show that, on paths h ending in s and satisfying r(h)>B, no optimal strategy of player 1 can use actions from A(s)∖A(s,T); together with Lemma 2, this yields the statement of the lemma.

Let h be a path ending in s n ∈S 1 and satisfying r(h)>B. Assume σ 1(h) deterministically chooses an action from A(s)∖A(s,T) (for randomised choices the argument follows analogously). By the above we have, for any σ 2∈Σ 2:

which contradicts that σ 1 is optimal.

Clearly, the strategy optimising rew(r,0,T) has finite memory, with B an upper bound on the memory needed. □

By the equalities from the proof of Lemma 2 and by Lemma 3, the procedure described in step 2 of the algorithm on page 13 is correct. The procedure from step 3 of the algorithm is correct because, for all paths h, we have that:

$$\mathbb{E}^{\sigma_1,\sigma_2}_{\mathcal{G}_C,s}\bigl[{\mathit {rew}(r,0,T)}\mid h \bigr] =\max_{a\in A(s)}\sum_{s'\in S}\Delta(s,a) \bigl(s'\bigr)\cdot\mathbb {E}^{\sigma_1,\sigma_2}_{\mathcal{G}_C,s}\bigl[{ \mathit{rew}(r,0,T)}\mid h{\cdot }a{\cdot}s'\bigr] . $$

Appendix C: Proof of Theorem 1

Theorem 1(a) Let φ be a rPATL formula with no \(\langle\!\langle C \rangle\!\rangle \,\mathsf{R}^{r}_{\bowtie x} [\mathsf{F}^{0}\phi]\) operator and where the bound k of the temporal operator U ≤k is given in unary. The problem of deciding whether the formula is satisfied in s is in NP ∩ coNP.

Proof

By equivalences such as the one in equation (1) on page 8, we can assume that all probabilistic and reward operators only contain bounds ≥ or >, so in the proof we assume ⋈∈{>,≥}.

Let φ 1,φ 2,…,φ n be the sequence of all state formulae occurring in φ. Also, if φ i ’s outermost operator is temporal, let C i denote the outermost coalition in φ i , and \(\varSigma^{C}_{j}\) denote the set of all memoryless deterministic strategies for player j in the coalition game \(\mathcal{G}_{C}\).

We show that the problem is in NP ∩ coNP by describing a polynomial-size certificate c that allows us to check that a formula is (not) satisfied. The certificate c is a function that assigns an element of \(\varSigma^{C_{i}}_{1}\cup\varSigma^{C_{i}}_{2}\) to each tuple (i,s) where s∈S and φ i is a formula whose outermost operator is temporal:

  • If φ i ≡〈〈C〉〉 P ⋈q [ψ] and s⊨φ i , then: c(i,s)=σ 1 for \(\sigma_{1}\in\varSigma^{C}_{1}\) such that \(\min_{\sigma_{2}\in\varSigma_{2}}\mathrm{Pr}_{\mathcal{G}_{C},s}^{\sigma _{1},\sigma _{2}}(\psi) \bowtie q \) holds.

  • If φ i ≡〈〈C〉〉 P ⋈q [ψ] and \(s\not\models\varphi_{i}\), then: c(i,s)=σ 2 for \(\sigma_{2}\in\varSigma^{C}_{2}\) such that \(\max_{\sigma_{1}\in\varSigma_{1}}\mathrm{Pr}_{\mathcal{G}_{C},s}^{\sigma _{1},\sigma _{2}}(\psi) \bowtie q \) does not hold.

  • If \(\varphi_{i}\equiv\langle\!\langle C \rangle\!\rangle \,\mathsf{R}^{r}_{\bowtie x} [\mathsf{F}^{\star}\phi]\) and s⊨φ i , then: c(i,s)=σ 1 for \(\sigma_{1}\in\varSigma^{C}_{1}\) such that \(\min_{\sigma_{2}\in\varSigma_{2}}\mathbb{E}_{\mathcal{G}_{C},s}^{\sigma _{1},\sigma_{2}}[{\mathit{rew}(r,\star,{\mathit{Sat}}(\phi))}]\bowtie x \) holds.

  • If \(\varphi_{i}\equiv\langle\!\langle C \rangle\!\rangle \,\mathsf{R}^{r}_{\bowtie x} [\mathsf{F}^{\star}\phi]\) and \(s\not \models\varphi_{i}\), then: c(i,s)=σ 2 for \(\sigma_{2}\in\varSigma^{C}_{2}\) such that \(\max_{\sigma_{1}\in\varSigma_{1}}\mathbb{E}_{\mathcal{G}_{C},s}^{\sigma _{1},\sigma_{2}}[{\mathit{rew}(r,\star,{\mathit{Sat}}(\phi))}]\bowtie x \) does not hold.

The existence of the strategies assigned by c follows from Theorem 2 and from Appendix A.2.

To check the certificate in polynomial time, we compute Sat(φ′) for all state subformulae φ′ of φ, traversing the parse tree of φ bottom-up. Suppose that we are analysing a formula φ′ and that we have computed Sat(φ″) for all state subformulae φ″ of φ′. If φ′ is an atomic proposition or its outermost operator is a boolean connective, we construct Sat(φ′) in the obvious way. Otherwise:

$${\mathit{Sat}}\bigl(\varphi'\bigr)=\bigl\{s \mid c(i,s)\text{ is a strategy for the first player in the coalition game}\bigr\}. $$

We verify that our choice of Sat(φ′) is correct as follows. For all s∈Sat(φ′), we construct an MDP from the appropriate coalition game by fixing the decisions of the first player according to c(i,s), and in polynomial time we check that the minimal probability (or reward) in the resulting MDP exceeds the bound given by the outermost operator of φ′ (see Theorem 2). If s∉Sat(φ′), then we fix the decisions of the second player according to c(i,s) and proceed analogously, computing the maximal probabilities. □
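Schematically, a single certificate check for s∈Sat(φ′) with a probabilistic operator could look as follows; value iteration here stands in for the polynomial-time methods behind Theorem 2, and all names are illustrative assumptions:

```python
# Check one certificate entry: fixing player 1's memoryless deterministic
# strategy sigma1 = c(i, s) turns the game into an MDP; compute minimum
# reachability probabilities and compare against the bound q.

def check_entry(s, sigma1, states, actions, delta, owner, target, q,
                iters=100_000):
    def available(t):
        # player-1 states follow the certificate; the MDP minimises over the
        # remaining (player-2) choices
        return [sigma1[t]] if owner[t] == 1 else actions[t]

    p = {t: 1.0 if t in target else 0.0 for t in states}
    for _ in range(iters):
        p = {
            t: 1.0 if t in target else min(
                sum(pr * p[t2] for t2, pr in delta[(t, a)].items())
                for a in available(t)
            )
            for t in states
        }
    return p[s] >= q  # e.g. for an operator <<C>> P_{>=q}[F t]
```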

Theorem 1(b) Model checking an arbitrary rPATL formula is in NEXP ∩ coNEXP.

Proof

The proof is similar to that for Theorem 1(a) above. We only need to extend the certificate from the proof to provide a witnessing strategy for formulae of the form \(\langle\!\langle C \rangle\!\rangle \,\mathsf{R}^{r}_{\bowtie x} [\mathsf{F}^{0}\phi]\). This is straightforward since, in the proof of Lemma 3, we showed that players need only strategies of exponential size.

In Lemma 3 we have shown that, for the optimal strategy, it suffices to play a deterministic memoryless strategy after a certain reward bound B has been reached and, before that, the strategy needs to remember only the reward accumulated along the history. The rewards are integers; therefore, the strategy in a state may need a different action for each value of reward below B, and one action for reward greater than or equal to B. So, the overall size of the memory will be \(\mathcal{O}(|S|\times B)\). A deterministic strategy suffices in this case; observe that one could ‘embed’ the memory into the game by constructing a new game where the set of states is S×{0,…,B+r max−1}∪{s f }. The transition relation is preserved for states (s,k) where k<B, and states (s,k) where k≥B have a transition to s f only. The reward structure assigns reward R (s,k) to states (s,k) where k≥B, which can be computed using step 2 of the algorithm for ⋆=0 in Sect. 4, and 0 to all other states. Then, the deterministic memoryless strategy that maximises rew(r,c,{s f }) in this new game will also be an optimal strategy in the original game (but requiring memory of size B). The value of B can be at most exponential in the size of \(\mathcal {G}\), i.e. from Lemma 3 it follows that B for a state s is bounded by

$$ B_s= \frac{\mathbb{E}^{\max,\min}_{\mathcal{G}_C,s}[{\mathit {rew}(r,c,T )}]}{\mathrm{Pr}^{\max,\min}_{\mathcal{G}_C,s}(\mathsf {F} t)- \mathit{PR}^{\underline{\max}}_{s}}. $$
(7)

We claim that all \(\mathbb{E}^{\max,\min}_{\mathcal {G}_{C},s}[{\mathit{rew}(r,c,T)}]\), \(\mathrm{Pr}^{\max,\min}_{\mathcal{G}_{C},s}(\mathsf{F} t)\) and \(\mathit{PR}^{\underline{\max}}_{s}\) can be represented as fractions of integers whose binary representation is polynomial in the size of the input, from which the bound on the size of B s follows. For \(\mathbb{E}^{\max,\min}_{\mathcal{G}_{C},s}[{\mathit {rew}(r,c,T)}]\) (or \(\mathrm{Pr}^{\max,\min}_{\mathcal{G}_{C},s}(\mathsf{F} t)\), \(\mathit{PR}^{\underline{\max}}_{s}\)), fixing the optimal strategies for both players we can construct a linear program whose size is polynomial in the size of input, and whose solution is equal to \(\mathbb{E}^{\max,\min}_{\mathcal{G}_{C},s}[{\mathit{rew}(r,c,T)}]\) (or \(\mathrm{Pr}^{\max,\min}_{\mathcal{G}_{C},s}(\mathsf{F} t)\), \(\mathit {PR}^{\underline{\max}}_{s}\), respectively). Because the solution of the linear program can be represented as a fraction of two integers of polynomial binary representations, we get the claim. Therefore, B s is at most exponential in the size of \(\mathcal{G}\). □
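One plausible reading of the memory-embedding construction above, as a sketch (the counter update and the auxiliary action name are assumptions, not the paper's formalisation):

```python
# Hypothetical sketch of the product S x {0,...,B+rmax-1} + {s_f}. The
# counter k records the reward accumulated so far; states with k >= B
# receive the one-off reward R[(s, k)] (step 2 of the *=0 algorithm,
# assumed precomputed) and divert to the sink via an invented 'stop' action.

def embed_memory(states, actions, delta, r, R, B, rmax):
    sink = "s_f"
    prod = [(s, k) for s in states for k in range(B + rmax)]
    new_delta, new_r = {}, {sink: 0.0}
    for s, k in prod:
        if k >= B:
            new_r[(s, k)] = R[(s, k)]
            new_delta[((s, k), "stop")] = {sink: 1.0}
        else:
            new_r[(s, k)] = 0.0
            for a in actions[s]:
                # k + r[s] <= (B-1) + rmax stays inside the counter range
                new_delta[((s, k), a)] = {
                    (t, k + r[s]): p for t, p in delta[(s, a)].items()
                }
    return prod + [sink], new_delta, new_r
```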

Appendix D: Proof of correctness for Sect. 4.4

Price-bounded coalitions

In the proof of Theorem 1 (when all coalitions were fixed), we exploited the fact that there is an exponential-size certificate c, which is a function that assigns an element of \(\varSigma^{C_{i}}_{1}\cup \varSigma^{C_{i}}_{2}\) to each tuple (i,s) where s∈S and φ i is a formula whose outermost operator is temporal.

We extend this approach by changing the certificate c so that it assigns an element of \(\varSigma^{C}_{1}\cup\varSigma^{C}_{2}\) to each tuple (i,s,C), where s∈S, φ i is a formula whose outermost operator is temporal, C⊆Π, and ⋈∈{>,≥}:

  • If φ i ≡〈〈C〉〉 P ⋈q [ψ] or φ i ≡〈〈?〉〉 ≤y P ⋈q [ψ], and s⊨〈〈C〉〉 P ⋈q [ψ], then: c(i,s,C)=σ 1 for \(\sigma_{1}\in\varSigma^{C}_{1}\) such that \(\min_{\sigma_{2}\in\varSigma_{2}}\mathrm{Pr}_{\mathcal{G}_{C},s}^{\sigma _{1},\sigma _{2}}(\psi) \bowtie q \) holds.

  • If φ i ≡〈〈C〉〉 P ⋈q [ψ] or φ i ≡〈〈?〉〉 ≤y P ⋈q [ψ], and \(s\not\models\langle\!\langle C \rangle\!\rangle \,\mathsf {P}_{\bowtie q} [\psi]\), then: c(i,s,C)=σ 2 for \(\sigma_{2}\in\varSigma^{C}_{2}\) such that \(\max_{\sigma_{1}\in\varSigma_{1}}\mathrm{Pr}_{\mathcal{G}_{C},s}^{\sigma _{1},\sigma _{2}}(\psi) \bowtie q \) does not hold.

  • If \(\varphi_{i}\equiv\langle\!\langle C \rangle\!\rangle \,\mathsf{R}^{r}_{\bowtie x} [\mathsf{F}^{\star}\phi]\) or \(\varphi_{i}\equiv\langle\!\langle ? \rangle\!\rangle_{\le y} \,\mathsf {R}^{r}_{\bowtie x} [\mathsf{F}^{\star}\phi]\), and \(s\models \langle\!\langle C \rangle\!\rangle \,\mathsf{R}^{r}_{\bowtie x} [\mathsf{F}^{\star}\phi]\), then: c(i,s,C)=σ 1 for \(\sigma_{1}\in\varSigma^{C}_{1}\) such that \(\min_{\sigma_{2}\in\varSigma_{2}}\mathbb{E}_{\mathcal{G}_{C},s}^{\sigma _{1},\sigma_{2}}[{\mathit{rew}(r,\star,{\mathit{Sat}}(\phi))}]\bowtie x \) holds.

  • If \(\varphi_{i}\equiv\langle\!\langle C \rangle\!\rangle \,\mathsf{R}^{r}_{\bowtie x} [\mathsf{F}^{\star}\phi]\) or \(\varphi_{i}\equiv\langle\!\langle ? \rangle\!\rangle_{\le y} \,\mathsf {R}^{r}_{\bowtie x} [\mathsf{F}^{\star}\phi]\), and \(s\not\models \langle\!\langle C \rangle\!\rangle \,\mathsf{R}^{r}_{\bowtie x} [\mathsf{F}^{\star}\phi]\), then: c(i,s,C)=σ 2 for \(\sigma_{2}{\in} \varSigma^{C}_{2}\) such that \(\max_{\sigma_{1}\in\varSigma_{1}}\mathbb{E}_{\mathcal{G}_{C},s}^{\sigma _{1},\sigma_{2}}[{\mathit{rew}(r,\star,{\mathit{Sat}}(\phi))}]\bowtie x \) does not hold.

  • If φ i ≡〈〈C′〉〉 P ⋈q [ψ] or \(\varphi_{i}\equiv\langle\!\langle C' \rangle\!\rangle \,\mathsf {R}^{r}_{\bowtie x} [\mathsf{F}^{\star }\phi]\), and C′≠C, then c(i,s,C) returns an arbitrary memoryless deterministic strategy from \(\varSigma^{C'}_{1}\cup\varSigma^{C'}_{2}\).

As before, the existence of strategies assigned by c follows from Theorem 2 and from Appendix A.2: for all formulae but \(\langle\!\langle C \rangle\!\rangle \,\mathsf {R}^{r}_{\bowtie x} [\mathsf{F}^{0}\phi ]\) and \(\langle\!\langle ? \rangle\!\rangle \,\mathsf{R}^{r}_{\bowtie x} [\mathsf{F}^{0}\phi]\), memoryless deterministic strategies exist; and, for the aforementioned formulae, exponential memory deterministic strategies suffice.

To check the certificate in polynomial time (in the worst-case size of c, which is exponential in the size of the model), we compute Sat(φ′) for all state subformulae φ′ of φ, traversing the parse tree of φ bottom-up. Suppose that we are analysing a formula φ′ and that we have computed Sat(φ″) for all state subformulae φ″ of φ′. If φ′ is an atomic proposition or its outermost operator is a boolean connective, we construct Sat(φ′) in the obvious way. Otherwise, if the outermost coalition in φ′ is fixed to C:

$${\mathit{Sat}}\bigl(\varphi'\bigr)=\bigl\{s \mid c(i,s,C) \in \varSigma_1^C\bigr\} $$

and, if the outermost coalition in φ′ is not specified, but a coalition of price ≤y is required:

$${\mathit{Sat}}\bigl(\varphi'\bigr)=\biggl\{s \mid\exists C \subseteq\varPi: \sum_{\gamma\in C} p(\gamma) \le y \text{ and } c(i,s,C)\in\varSigma_1^C\biggr\}. $$

We verify that our choice of Sat(φ′) is correct as follows. For all s∈Sat(φ′), we construct an MDP from the appropriate coalition game by fixing the decisions of the first player according to c(i,s,C) and in polynomial time we check that the minimal probability (or reward) in the resulting MDP exceeds the bound given by the outermost operator of φ′ (see Theorem 2). If s∉Sat(φ′), and the outermost coalition of φ′ is C, then we fix the decisions of the second player according to c(i,s,C) and proceed analogously, computing the maximal probabilities. If s∉Sat(φ′), and the outermost coalition of φ′ is not specified, but required to be of price at most y, we need to construct MDPs from the coalition games \(\mathcal{G}_{C}\) for all C where ∑ γ∈C p(γ)≤y by fixing the decisions of the second player and computing the maximal probabilities. There are only polynomially many (in the size of c) possible choices of C (the number of different coalitions is exponential in the size of \(\mathcal{G}\), but the certificate is exponential in \(\mathcal{G}\) too), and each choice can be checked in polynomial time.
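The enumeration of price-bounded coalitions used in this last step admits a direct sketch, with p the price function on players (names assumed for illustration):

```python
from itertools import combinations

# Enumerate coalitions C with total price at most y: exponential in the
# number of players, but polynomial in the size of the (already exponential)
# certificate c, as argued above.

def cheap_coalitions(players, p, y):
    players = sorted(players)
    for k in range(len(players) + 1):
        for C in combinations(players, k):
            if sum(p[g] for g in C) <= y:
                yield set(C)
```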


Cite this article

Chen, T., Forejt, V., Kwiatkowska, M. et al. Automatic verification of competitive stochastic systems. Form Methods Syst Des 43, 61–92 (2013). https://doi.org/10.1007/s10703-013-0183-7
