
Automatic verification of competitive stochastic systems


Abstract

We present automatic verification techniques for the modelling and analysis of probabilistic systems that incorporate competitive behaviour. These systems are modelled as turn-based stochastic multi-player games, in which the players can either collaborate or compete in order to achieve a particular goal. We define a temporal logic called rPATL for expressing quantitative properties of stochastic multi-player games. This logic allows us to reason about the collective ability of a set of players to achieve a goal relating to the probability of an event’s occurrence or the expected amount of cost/reward accumulated. We give an algorithm for verifying properties expressed in this logic and implement the techniques in a probabilistic model checker, as an extension of the PRISM tool. We demonstrate the applicability and efficiency of our methods by deploying them to analyse and detect potential weaknesses in a variety of large case studies, including algorithms for energy management in Microgrids and collective decision making for autonomous systems.


Notes

  1. Rational values can be handled by re-scaling all rewards by the lowest common multiple of the denominators of rewards appearing in the game. Note that re-scaling does not increase the size of the model, so the stated complexity results are not affected.

  2. The tool is currently available from: http://www.prismmodelchecker.org/games/.

  3. Models and properties are at: http://www.prismmodelchecker.org/files/fmsd-smg/.

References

  1. Aizatulin M, Schnoor H, Wilke T (2009) Computationally sound analysis of a probabilistic contract signing protocol. In: Proc of the 14th European symposium on research in computer security (ESORICS’09). LNCS, vol 5789. Springer, Berlin, pp 571–586


  2. Alur R, Henzinger T, Kupferman O (2002) Alternating-time temporal logic. J ACM 49(5):672–713


  3. Alur R, Henzinger T, Mang F, Qadeer S, Rajamani S, Tasiran S (1998) MOCHA: modularity in model checking. In: Proc of the 10th international conference on computer aided verification (CAV’98). LNCS, vol 1427. Springer, Berlin, pp 521–525


  4. Andova S, Hermanns H, Katoen J-P (2003) Discrete-time rewards model-checked. In: Proc of the formal modeling and analysis of timed systems (FORMATS’03). LNCS, vol 2791. Springer, Berlin, pp 88–104


  5. Baier C, Brázdil T, Größer M, Kucera A (2007) Stochastic game logic. In: Proc of the 4th international conference on quantitative evaluation of systems (QEST’07). IEEE Press, New York, pp 227–236


  6. Baier C, Katoen J-P (2008) Principles of model checking. MIT Press, Cambridge


  7. Ballarini P, Fisher M, Wooldridge M (2006) Uncertain agent verification through probabilistic model-checking. In: Proc of the 3rd international workshop on safety and security in multi-agent systems (SASEMAS’06)


  8. Bianco A, de Alfaro L (1995) Model checking of probabilistic and nondeterministic systems. In: Thiagarajan P (ed) Proc of the 15th conference on foundations of software technology and theoretical computer science (FSTTCS’95). LNCS, vol 1026. Springer, Berlin, pp 499–513


  9. Brihaye T, Markey N, Ghannem M, Rieg L (2008) Good friends are hard to find! In: Demri S, Jensen C (eds) Proc of the 15th international symposium on temporal representation and reasoning (TIME’08). IEEE Press, New York, pp 32–40


  10. Bulling N, Jamroga W (2009) What agents can probably enforce. Fundam Inform 93(1–3):81–96


  11. Cerný P, Chatterjee K, Henzinger T, Radhakrishna A, Singh R (2011) Quantitative synthesis for concurrent programs. In: Gopalakrishnan G, Qadeer S (eds) Proc of the 23rd international conference on computer aided verification (CAV’11). LNCS, vol 6806. Springer, Berlin, pp 243–259


  12. Chatterjee K (2007) Stochastic ω-regular games. PhD thesis, University of California at Berkeley

  13. Chatterjee K, Henzinger T (2008) Value iteration. In: 25 years of model checking. LNCS, vol 5000. Springer, Berlin, pp 107–138


  14. Chatterjee K, Henzinger T, Jobstmann B, Radhakrishna A (2010) Gist: a solver for probabilistic games. In: Proc of the 22nd international conference on computer aided verification (CAV’10). LNCS, vol 6174. Springer, Berlin, pp 665–669


  15. Chatterjee K, Jurdzinski M, Henzinger T (2004) Quantitative stochastic parity games. In: Munro JI (ed) Proc of the 15th annual ACM-SIAM symposium on discrete algorithms (SODA’04). SIAM, Philadelphia, pp 121–130


  16. Chen T, Forejt V, Kwiatkowska M, Parker D, Simaitis A (2012) Automatic verification of competitive stochastic systems. In: Flanagan C, König B (eds) Proc of the 18th international conference on tools and algorithms for the construction and analysis of systems (TACAS’12). LNCS, vol 7214. Springer, Berlin, pp 315–330


  17. Chen T, Kwiatkowska M, Parker D, Simaitis A (2011) Verifying team formation protocols with probabilistic model checking. In: Proc of the 12th international workshop on computational logic in multi-agent systems (CLIMA XII 2011). LNCS, vol 6814. Springer, Berlin, pp 190–207


  18. Chen T, Lu J (2007) Probabilistic alternating-time temporal logic and model checking algorithm. In: Proc of the 4th international conference on fuzzy systems and knowledge discovery (FSKD’07). IEEE Press, New York, pp 35–39


  19. Condon A (1993) On algorithms for simple stochastic games. In: Advances in computational complexity theory. DIMACS series in discrete mathematics and theoretical computer science, vol 13, pp 51–73


  20. Courcoubetis C, Yannakakis M (1995) The complexity of probabilistic verification. J ACM 42(4):857–907


  21. de Alfaro L (1999) Computing minimum and maximum reachability times in probabilistic systems. In: Baeten J, Mauw S (eds) Proc of the 10th international conference on concurrency theory (CONCUR’99). LNCS, vol 1664. Springer, Berlin, pp 66–81


  22. de Alfaro L, Henzinger T (2000) Concurrent omega-regular games. In: Proc of the 15th annual IEEE symposium on logic in computer science. IEEE Comput Soc, Los Alamitos, pp 141–154


  23. Filar J, Vrieze K (1997) Competitive Markov decision processes. Springer, Berlin


  24. Forejt V, Kwiatkowska M, Norman G, Parker D (2011) Automated verification techniques for probabilistic systems. In: Bernardo M, Issarny V (eds) Formal methods for eternal networked software systems (SFM’11). LNCS, vol 6659. Springer, Berlin, pp 53–113


  25. Hansson H, Jonsson B (1994) A logic for reasoning about time and reliability. Form Asp Comput 6(5):512–535


  26. Hildmann H, Saffre F (2011) Influence of variable supply and load flexibility on demand-side management. In: Proc of the 8th international conference on the European energy market (EEM’11), pp 63–68


  27. Kremer S, Raskin J-F (2003) A game-based verification of non-repudiation and fair exchange protocols. J Comput Secur 11(3):399–430


  28. Kwiatkowska M, Norman G, Parker D (2011) PRISM 4.0: verification of probabilistic real-time systems. In: Gopalakrishnan G, Qadeer S (eds) Proc of the 23rd international conference on computer aided verification (CAV’11). LNCS, vol 6806. Springer, Berlin, pp 585–591


  29. Laroussinie F, Markey N, Oreiby G (2007) On the expressiveness and complexity of ATL. In: Seidl H (ed) Proc of the 10th international conference on foundations of software science and computational structures (FOSSACS’07). LNCS, vol 4423. Springer, Berlin, pp 243–257


  30. Lomuscio A, Qu H, Raimondi F (2009) MCMAS: a model checker for the verification of multi-agent systems. In: Proc of the 21st international conference on computer aided verification (CAV’09). LNCS, vol 5643. Springer, Berlin, pp 682–688


  31. Martin D (1998) The determinacy of Blackwell games. J Symb Log 63(4):1565–1581


  32. McIver A, Morgan C (2007) Results on the quantitative mu-calculus qMu. ACM Trans Comput Log 8(1). doi:10.1145/1182613.1182616

  33. Saffre F, Simaitis A (2012) Host selection through collective decision. ACM Trans Auton Adapt Syst 7(1). doi:10.1145/2168260.2168264

  34. Schnoor H (2010) Strategic planning for probabilistic games with incomplete information. In: Proc of the 9th international conference on autonomous agents and multiagent systems (AAMAS’10), pp 1057–1064


  35. Ummels M (2010) Stochastic multiplayer games: theory and algorithms. PhD thesis, RWTH Aachen University

  36. van der Hoek W, Wooldridge M (2003) Model checking cooperation, knowledge, and time—a case study. Res Econ 57(3):235–265


  37. Zhang C, Pang J (2010) On probabilistic alternating simulations. In: Calude C, Sassone V (eds) Proc of the 6th IFIP conference on theoretical computer science (TCS’10). IFIP, vol 323. Springer, Berlin, pp 71–85


  38. Zhang C, Pang J (2012) An algorithm for probabilistic alternating simulation. In: Bieliková M, Friedrich G, Gottlob G, Katzenbeisser S, Turán G (eds) Proc of the 38th conference on current trends in theory and practice of computer science (SOFSEM’12). LNCS, vol 7147. Springer, Berlin, pp 431–442



Acknowledgements

The authors are partially supported by ERC Advanced Grant VERIWARE, the Institute for the Future of Computing at the Oxford Martin School and EPSRC grant EP/F001096/1. Vojtěch Forejt is supported by a Royal Society Newton Fellowship. We also thank the anonymous referees for various helpful comments.

Author information

Correspondence to Aistis Simaitis.

Appendices

Appendix: Proofs

This appendix contains proofs for the results stated in the text. We begin by stating some known results that we will require later.

Theorem 2

[6, 12]

The following statements hold:

  1. Memoryless deterministic strategies suffice for achieving minimum/maximum values in a state for extended reachability, Büchi, and coBüchi objectives in stochastic two-player zero-sum games.

  2. Finding minimum/maximum values in a state for Markov decision processes (MDPs) for extended reachability, Büchi, and coBüchi objectives can be done in polynomial time.

Appendix A: Proofs for optimality of expected rewards

A.1 Finite-memory strategies for ⋆=0

We first show that finite-memory strategies are required for optimality of expected rewards of type ⋆=0, i.e. for optimal values of \(\mathbb{E}^{\max,\min}_{\mathcal {G}_{C},s}[{\mathit{rew}(r,0,T)}]\). Later, in the proof of Lemma 3 we show that the finite memory is indeed sufficient. Let us consider the following example:

[Figure a: the example game. From s 0, action a leads to the target s 1 with probability 1; action b leads back to s 0 with probability 0.9 and to the sink s 2 with probability 0.1.]

The target set is T={s 1} and the reward structure r assigns 1 to s 0 and 0 to the other states. We analyse the optimal value of rew(r,0,T) in s 0. Let σ be a memoryless strategy that in s 0 picks a with probability x and b with probability 1−x. The reward obtained is then:

$$x\cdot\sum_{k=1}^{\infty} k\cdot\bigl(0.9\cdot(1-x)\bigr)^{k-1} = \frac{x}{(0.1+0.9\cdot x)^{2}} $$

which, for any x, is at most \(\frac{25}{9}\) (the maximum is attained at \(x=\frac{1}{9}\)).

Now consider the strategy σ′ that is deterministic, and picks b on the first 8 visits to s 0 and then a on the 9th visit. The value under this strategy is:

$$9\cdot0.9^8 \approx3.8 > \frac{25}{9} $$
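To make the gap concrete, here is a small Python sketch (illustrative only, not part of the PRISM-games implementation) that reproduces both computations for the game reconstructed above:

```python
# Illustrative sketch of the example: from s0, action 'a' reaches the target
# s1 with probability 1, while action 'b' returns to s0 with probability 0.9
# and falls into a zero-reward sink otherwise; r(s0) = 1.

def memoryless_value(x):
    # Picking 'a' with probability x on every visit to s0 gives
    # sum_{k>=1} k * x * (0.9*(1-x))**(k-1) = x / (0.1 + 0.9*x)**2.
    return x / (0.1 + 0.9 * x) ** 2

def finite_memory_value(n):
    # Deterministic: play 'b' on the first n-1 visits to s0, then 'a'.
    # The target is reached with probability 0.9**(n-1), carrying reward n.
    return n * 0.9 ** (n - 1)

print(max(memoryless_value(i / 1000) for i in range(1, 1001)))  # ~2.778 = 25/9
print(finite_memory_value(9))                                   # ~3.874
```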

Remark

An optimal (memoryless deterministic) strategy in s 0 for both ⋆=∞ and ⋆=c is to take the action b and thus achieve values ∞ and 10, respectively.

A.2 Memoryless strategies for ⋆∈{∞,c}

Secondly, we prove that memoryless (deterministic) strategies suffice for optimality of the expected reward \(\mathbb{E}^{\max,\min}_{\mathcal{G}_{C},s}[{\mathit{rew}(r,\star,T)}]\) for types ⋆∈{∞,c}.

If the expected value is infinite, then memoryless deterministic strategies suffice by Theorem 2, because this case reduces to the problem of reaching, with positive probability, a state where the expected value is infinite. The states s∈T get value 0 by definition. Otherwise, the values \(\mathbb{E}^{\max,\min}_{\mathcal {G}_{C},s}[{\mathit{rew}(r,\star,T)}]\) satisfy:

$$ \mathbb{E}^{\max,\min}_{\mathcal{G}_C,s}\bigl[{\mathit{rew}(r, \star,T)}\bigr] =r(s) + \mathrm{opt}^s_{a\in A(s)}\sum _{s'\in S}\Delta(s,a) \bigl(s'\bigr)\cdot \mathbb{E}^{\max,\min}_{\mathcal{G}_C,s'}\bigl[{\mathit{rew}(r,\star,T)}\bigr] $$
(3)

Let A opt(s) be the set of actions that realise the optimum in s, where opt is max or min for players 1 and 2, respectively; similarly, opt s is max if s∈S 1 and min if s∈S 2.

We first analyse the case ⋆=∞. Any strategy \(\sigma^{\infty}_{1}\in\varSigma_{1}\) that in each state s picks an action from A opt(s) is optimal. For player 2, a strategy \(\sigma^{\infty}_{2}\in\varSigma_{2}\) is optimal if it picks an action from A opt(s) in each state s such that T is reached almost surely under any counter-strategy for player 1.

Next, assume ⋆=c and let \(T_{0}=\{s\mid\mathbb{E}^{\max,\min}_{\mathcal {G}_{C},s}[{\mathit{rew}(r,c,T)}] = 0\}\). To optimise rew(r,c,T), we fix \(\sigma^{c}_{1}\in\varSigma_{1}\) that uses an action from A opt(s) in each state s and ensures that T 0 is reached almost surely. For player 2, any strategy \(\sigma^{c}_{2}\in\varSigma_{2}\) that picks an action from A opt(s) in each state s is optimal.
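As an illustration of how the sets A opt(s) can be read off once the values satisfying (3) are known, consider the following sketch; the data structures (values, rewards, delta, owner) are assumptions made for the example, not the paper's implementation:

```python
# Sketch: extract A_opt(s) from values satisfying equation (3). owner[s] is
# 1 or 2, and delta[(s, a)] maps successor states to probabilities.

def a_opt(s, actions, values, rewards, delta, owner, eps=1e-9):
    def q(a):
        # one-step lookahead: r(s) + sum_{s'} Delta(s, a)(s') * value(s')
        return rewards[s] + sum(p * values[t] for t, p in delta[(s, a)].items())

    opt = max if owner[s] == 1 else min
    best = opt(q(a) for a in actions[s])
    return {a for a in actions[s] if abs(q(a) - best) <= eps}
```

As noted above, membership of A opt(s) alone is not enough: the almost-sure reachability side conditions (of T for σ ∞ 2, of T 0 for σ c 1) must additionally be enforced when fixing the strategies.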

Proof of correctness of definitions of strategies

Given a state s and a strategy σ 1 for player 1, we denote:

$$\mathit{err}^{\sigma_1}(s) = \frac{\min_{\sigma_2\in\varSigma_2}\mathbb{E}^{\sigma_1,\sigma _2}_{\mathcal{G}_C,s}[{\mathit{rew}(r,\star,T)}]}{ \mathbb{E}^{\max,\min}_{\mathcal{G}_C,s}\bigl[{\mathit{rew}(r,\star,T)}\bigr]} $$

where we assume \(\mathit{err}^{\sigma_{1}}(s)=1\) if the denominator is 0. Observe that we have \(\mathit{err}^{\sigma_{1}}(s)\cdot\mathbb {E}^{\max,\min }_{\mathcal{G}_{C},s}[{\mathit{rew}(r,\star,T)}] = \min_{\sigma_{2}\in \varSigma _{2}}\mathbb{E}^{\sigma_{1},\sigma_{2}}_{\mathcal{G}_{C},s}[{\mathit {rew}(r,\star ,T)}]\).

Let ⋆=c. We prove that the maximiser’s strategy \(\sigma=\sigma^{c}_{1}\) defined above is optimal. Assume, for a contradiction, that it is not, i.e. err σ(s)<1 for some s. For all s, we have:

(4)

and, for all s∈S 2, there must be an action a such that:

(5)

Fix s for which err σ(s) is minimal (so, by assumption, err σ(s)<1). Thanks to (3), (4) and (5), err σ must also be minimal for all successors of s. However, this implies that T 0 is not reached with probability 1 because, in every s′∈T 0, we have err σ(s′)=1.

The other cases (\(\sigma_{2}^{c}\), \(\sigma_{1}^{\infty}\) and \(\sigma_{2}^{\infty}\)) can be proved analogously.

Appendix B: Proofs of correctness for Sect. 4.3

In this section, we prove the correctness of the methods given in Sect. 4.3 for computing rew(r,⋆,T) for the cases ⋆∈{c,∞,0}.

B.1 Proof of correctness for ⋆=c

Let us first consider the states with infinite value. Recall that we denote by inf(a rew ) the set of paths that visit a state with positive reward infinitely often (and thus accumulate infinite reward). If, for a state s, there is σ 1∈Σ 1 such that the probability \(\mathrm{Pr}_{\mathcal{G}_{C},s}^{\sigma_{1}, \sigma_{2}}(\mathit {inf}(a_{\mathit{rew}}))\) is positive for all σ 2∈Σ 2, then the strategy σ 1 itself yields infinite reward. In the other direction, suppose that for every σ 1∈Σ 1 there is some σ 2∈Σ 2 such that \(\mathrm{Pr}_{\mathcal{G}_{C},s}^{\sigma_{1}, \sigma_{2}}(\mathit {inf}(a_{\mathit{rew}}))\) is equal to zero. It is straightforward to extend the results of [22] and prove that, for every σ 1, a strategy σ 2 exists which in addition ensures that the expected number of visits to a state satisfying a rew is finite and bounded from above. The rest follows easily because the rewards assigned to states are also bounded from above.

Let us now consider finite values. Because of the assumption that no reward is accumulated after visiting a target state, we can change the random variable and use ∑ j∈ℕ r(st λ (j)) instead of rew(r,c,T). It can be shown by induction that the expected value w.r.t. this variable can be obtained as lim i→∞ f s (i) where:

$$ f_{s}(i) = \begin{cases} 0 &\text{if }i=0\\ r(s) + \mathrm{opt}^s_{a\in A(s)}\sum_{s'\in S}\Delta (s,a)(s')\cdot f_{s'}(i-1)&\text{otherwise} \end{cases} $$
(6)

We can then apply the Kleene fixpoint theorem and prove that lim i→∞ f s (i) is equal to the least fixpoint of (2).
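For concreteness, a minimal value-iteration sketch for (6), iterating from below towards the least fixpoint, might look as follows (the data structures are assumed, as in the earlier sketch; this is not the PRISM-games implementation):

```python
# Value iteration for equation (6), case *=c: iterate from f_s(0) = 0 upward.
# Under the standing assumption, r is 0 on target states and targets are
# absorbing, so no explicit target case is needed.

def expected_reward_c(states, actions, delta, r, owner, iters=100_000):
    f = {s: 0.0 for s in states}
    for _ in range(iters):  # in practice, stop once successive iterates agree
        f = {
            s: r[s] + (max if owner[s] == 1 else min)(
                sum(p * f[t] for t, p in delta[(s, a)].items())
                for a in actions[s]
            )
            for s in states
        }
    return f  # approximates lim_{i -> oo} f_s(i)
```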

B.2 Proof of correctness for ⋆=∞

First, observe that if a state s is assigned infinite value in the initial step, then we indeed have \(\mathbb{E}^{\max,\min}_{\mathcal {G}_{C},s}[{\mathit{rew}(r,\infty,T)}]=\infty\) by definition. We prove the correctness for the other values. Let u:S→ℚ be a function that assigns to each s a value such that \(u(s) \ge\mathbb{E}^{\max,\min}_{\mathcal {G}_{C},s}[{\mathit{rew}(r,\infty,T)}]\). Recall that we compute values of (2) by value iteration, i.e. we compute:

$$ f(s) (i) = \begin{cases} 0 &\text{if }s\in T\\ u(s) &\text{if }i=0\\ r(s) + \mathrm{opt}_{a\in A(s)}^s\sum_{s'\in S}\Delta(s,a)(s')\cdot f(s')(i-1)&\text{otherwise} \end{cases} $$

for sufficiently large i, and we show that \(\lim_{i\rightarrow \infty}f(s)(i) = \mathbb{E}^{\max,\min}_{\mathcal{G}_{C},s}[{\mathit {rew}(r,\infty ,T)}]\).

Let us consider auxiliary functions \({\mathit{rew}_{u}^{i}}\) which assign numbers to paths as follows:

$$ {\mathit{rew}_{u}^{i}}(\lambda) = \begin{cases} \sum_{j< k} r(\mathit{st}_{\lambda}(j)) & \exists k\le i : \mathit{st}_{\lambda}(k)\in T\wedge \forall j<k : \mathit{st}_{\lambda}(j)\notin T,\\[6pt] \sum_{j< i} r(\mathit{st}_{\lambda}(j)) + u(\mathit {st}_{\lambda}(i)) & \text{otherwise.} \end{cases} $$

Intuitively, the function \({\mathit{rew}_{u}^{i}}\) alters the definition of rew(r,∞,T) by assigning rewards given by r for the first i steps, and then assigning the reward given by u, if the target has not been reached yet. One can easily prove by induction that the value of f(s)(i) is equal to \(\mathbb{E}^{\max,\min}_{\mathcal{G}_{C},s}[{\mathit{rew}_{u}^{i}}]\).

We need to show that \(\lim_{i\rightarrow\infty} f(s)(i) \ge\mathbb {E}^{\max,\min}_{\mathcal{G}_{C},s}[{\mathit{rew}(r,\infty,T)}]\). This can be done by inductively showing that \(f(s)(i) \ge\mathbb{E}^{\max,\min}_{\mathcal {G}_{C},s}[{\mathit{rew}(r,\infty,T)}]\) for every i. The base case i=0 follows from the definition of f and u, and the inductive steps follow by monotonicity of the function f.

Furthermore, we show that \(\lim_{i\rightarrow\infty} f(s)(i) \le \mathbb{E}^{\max,\min}_{\mathcal{G}_{C},s}[{\mathit{rew}(r,\infty,T)}]\). Let σ min∈Σ 2 be a memoryless strategy satisfying \(\max_{\sigma\in\varSigma_{1}}\mathbb{E}^{\sigma,\sigma_{\min }}_{\mathcal{G}_{C},s}[{\mathit{rew}(r,\infty,T)}] =\) \(\mathbb {E}^{\max,\min }_{\mathcal{G}_{C},s}[{\mathit{rew}(r,\infty,T)}]\), i.e. σ min is the optimal minimising strategy for player 2. Let \(\tau(i) = \min_{\sigma\in\varSigma_{1}} \mathrm{Pr}_{s}^{\sigma, \sigma _{\min}} (\{\lambda\in{\varOmega_{{\mathcal{G}_{C},s}}} \mid \exists j\le i : \mathit{st}_{\lambda}(j)\in T\})\) be the minimal probability with which we end in T within i steps when playing according to σ min. We have lim i→∞ τ(i)=1, because otherwise player 1 would have a strategy preventing the target from being reached almost surely, and the reward obtained would be infinite. Thus, we have \(\max_{\sigma\in\varSigma_{1}} \mathbb{E}^{\sigma,\sigma_{\min }}_{s}[{\mathit{rew}_{u}^{i}}] \le\mathbb{E}^{\max,\min}_{\mathcal {G}_{C},s}[{\mathit{rew}(r,\infty,T)}] + (1-\tau(i))\cdot K\) where K=max s∈S u(s). As we let i go to ∞, the second summand vanishes, and so \(\lim_{i\rightarrow\infty}f(s)(i)=\lim_{i\rightarrow\infty}\mathbb{E}^{\max,\min}_{\mathcal{G}_{C},s}[{\mathit {rew}_{u}^{i}} ]\le \mathbb{E}^{\max,\min}_{\mathcal{G}_{C},s}[{\mathit{rew}(r,\infty,T)}]\).
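The computation just analysed differs from the ⋆=c iteration only in its initialisation: it starts from the upper bound u, with target states pinned to 0, and converges from above. A correspondingly adapted sketch, again with assumed data structures:

```python
# Value iteration for *=infinity: start from an upper bound u, targets at 0.

def expected_reward_inf(states, actions, delta, r, owner, target, u,
                        iters=100_000):
    f = {s: 0.0 if s in target else u[s] for s in states}
    for _ in range(iters):
        f = {
            s: 0.0 if s in target else r[s] + (max if owner[s] == 1 else min)(
                sum(p * f[t] for t, p in delta[(s, a)].items())
                for a in actions[s]
            )
            for s in states
        }
    return f
```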

B.3 Proof of correctness for ⋆=0

Lemma 1

\(\sup_{\sigma_{1}\in\varSigma_{1}}\inf_{\sigma_{2}\in \varSigma_{2}}\mathbb{E}^{\sigma_{1}, \sigma_{2}}_{s}[{\mathit{rew}(r,0,T )}]=\infty\) iff there is σ 1∈Σ 1 such that for all \(\sigma_{2}{\in }\varSigma_{2}\ \mathrm{Pr}_{\mathcal{G}_{C},s}^{\sigma_{1}, \sigma _{2}}(\mathit {inf}^{t}(a_{\mathit{rew}}))>0\).

Proof

In the direction ⇐, let q∈ℝ be any number. Player 1’s strategy σ to ensure that the expected reward achieved is at least q works as follows. Suppose σ 1 is such that \(\mathrm{Pr}_{\mathcal {G}_{C},s}^{\sigma_{1}, \sigma _{2}}(\mathit{inf}^{t}(a_{\mathit{rew}}))>p\) for all σ 2; by [22], we can safely assume that p>0. The strategy σ mimics the strategy σ 1∈Σ 1 as long as the history λ satisfies \({\mathrm{r}(\lambda)} < \frac {q}{p\cdot x^{|S|}}\), where x is the minimal probability that occurs in the game. When r(λ) exceeds this bound and the formula P >0[F t] is satisfied in the last state of λ, the strategy σ changes its behaviour and maximises the probability of reaching T. Because memoryless deterministic strategies are sufficient for both players for reachability queries, σ can ensure that T is reached with probability at least x |S| from λ. The rest is a simple computation: with probability at least p the accumulated reward exceeds \(\frac{q}{p\cdot x^{|S|}}\) at the moment of the switch, T is then reached with probability at least x |S|, and hence the expected reward is at least \(p\cdot x^{|S|}\cdot\frac{q}{p\cdot x^{|S|}} = q\).

Let us analyse the direction ⇒. Similarly to the ⋆=c case, we can show that, if for every σ 1∈Σ 1 there is σ 2∈Σ 2 such that \(\mathrm{Pr}_{\mathcal{G}_{C},s}^{\sigma_{1}, \sigma_{2}}(\mathit {inf}(a^{t}_{\mathit{rew}}))\) is equal to zero, then there is σ 2 which ensures that the expected number of visits to a state satisfying \(a^{t}_{\mathit{rew}}\) is finite. The rest follows as in ⋆=c; we only need to further consider that, if a state satisfies a rew but not P >0[F t] (i.e. it gets nonzero reward but is not labelled with \(a^{t}_{\mathit{rew}}\)), then the reward achievable by player 1 in such a state is 0. □

Given a state s, we denote by A(s,T) the set of actions that can be taken by a strategy achieving the maximum probability of reaching T. We first show that, if player 1 wants to maximise the expected reward w.r.t. rew(r,0,T) using only actions from A(s,T) in each state, he can do so using a memoryless deterministic strategy.

Lemma 2

Let \(\varSigma_{1}^{T}\subseteq\varSigma_{1}\) contain all strategies that use only the actions from A(s,T) and \(\forall\sigma_{1} \in\varSigma_{1}^{T}{:}\min_{\sigma_{2}\in \varSigma_{2}} \mathrm{Pr}_{\mathcal{G}_{C},s}^{\sigma_{1}, \sigma _{2}}(\mathsf{F} t)=\mathrm{Pr}_{\mathcal{G} _{C},s}^{\max, \min}(\mathsf{F} t)\). There is a memoryless deterministic strategy \(\sigma_{1}^{*}\in\varSigma_{1}^{T}\) satisfying:

$$\min_{\sigma_2\in\varSigma_2} \mathbb{E}^{\sigma_1^*,\sigma _2}_{\mathcal{G}_C,s}\bigl[{ \mathit{rew}(r,0,T)}\bigr] = \max_{\sigma_1\in\varSigma_1^T} \min_{\sigma_2\in\varSigma_2} \mathbb{E}^{\sigma_1,\sigma_2}_{\mathcal{G}_C,s}\bigl[{\mathit {rew}(r,0,T)}\bigr] . $$

Proof

Assume the game is restricted so that the only actions available in s are A(s,T) for all s. We first create a new reward structure r′ defined by \(r'(s)=r(s) \cdot\mathrm{Pr}^{\max,\min}_{\mathcal{G}_{C},s}(\mathsf{F} t)\). We show that, for all \(\sigma_{1}\in\varSigma_{1}^{T}\) and σ 2Σ 2 with \(\mathrm{Pr}_{\mathcal{G}_{C},s}^{\sigma_{1}, \sigma_{2}}(\mathsf{F} t)=\mathrm{Pr}_{\mathcal{G} _{C},s}^{\max, \min}(\mathsf{F} t)\), we have that:

$$\mathbb{E}^{\sigma_1,\sigma_2}_{\mathcal{G}_C,s}\bigl[{\mathit{rew} \bigl(r',c,T\bigr)}\bigr] = \mathbb{E}^{\sigma_1,\sigma_2}_{\mathcal{G}_C,s} \bigl[{\mathit {rew}(r,0,T)}\bigr]\,, $$

from which the lemma follows directly, as memoryless deterministic strategies suffice for achieving the optimal value of rew(r′,c,T) (see the proof in Appendix A.2).

Let \({\varOmega_{{\mathcal{G}_{C},s}}}(T) \stackrel{{\mathrm{def}}}{=} \{ \lambda\in{\varOmega_{{\mathcal{G}_{C},s}}} \mid\exists i : \mathit{st}_{\lambda}(i)\in T\}\), and \(t(\lambda)=\min\{i\in\mathbb{N} \mid \mathit{st}_{\lambda}(i)\in T\}\). For any strategy profile σ 1,σ 2 such that \(\mathrm{Pr}_{\mathcal{G}_{C},s}^{\sigma_{1}, \sigma_{2}}(\mathsf{F} t)=\mathrm{Pr}_{\mathcal{G} _{C},s}^{\max, \min}(\mathsf{F} t)\),

This completes the proof. □

Below, given a path h, we use \(\mathbb{E}^{\sigma_{1},\sigma_{2}}_{\mathcal{G}_{C},s}[{\mathit {rew}(r,0,T )}\mid h]\) to denote the conditional expectation of rew(r,0,T) on infinite paths initiated in h, i.e.:

$$\mathbb{E}^{\sigma_1,\sigma_2}_{\mathcal{G}_C,s}\bigl[{\mathit {rew}(r,0,T)}\mid h \bigr] =\frac{\int_{\{\lambda\mid\lambda\text{ starts with } h\} }{\mathrm{r}(\lambda)}\,d\mathrm{Pr}_{\mathcal{G}_C,s}^{\sigma _1,\sigma_2}}{\mathrm{Pr}_{\mathcal{G}_C,s}^{\sigma_1,\sigma_2}( \{\lambda\mid \lambda\text{ starts with } h\})} $$

Lemma 3

For each state s∈S, there exists a finite-memory strategy σ ∗ for player 1 which maximises the expected reward rew(r,0,T) from the state s. In particular, there exists some bound B such that, on histories h with r(h)≥B, σ ∗ plays memorylessly.

Proof

Fix two strategies σ 1∈Σ 1 and σ 2∈Σ 2. For each state s∈S and path h=s 0 a 0 s 1…s n ending in s′ we have that:

where rew(r,0,T)+r(h) is a random variable assigning rew(r,0,T)(λ)+r(h) to a path λ reaching T, and 0 otherwise; and where \(\sigma_{i}^{h}(h') = \sigma_{i}(s_{0}a_{0}s_{1}\ldots s_{n-1}a_{n-1}{\cdot}h')\).

Given a state s, we use \(\mathit{PR}^{\underline{\max}}_{s}\) to denote the maximal reachability probability to reach T under the strategies for which at s, actions in A(s,T) are disallowed for a single step, i.e.:

$$\mathit{PR}^{\underline{\max}}_{s} = \max_{a\in A(s)\setminus A(s,T)} \sum _{s'\in S} \Delta(s,a) \bigl(s'\bigr)\cdot \mathrm{Pr}_{\mathcal {G}_C,s'}^{\max,\min }(\mathsf{F} t) $$

Intuitively, \(\mathit{PR}^{\underline{\max}}_{s}\) denotes the “second” maximal reachability probability. Below, we assume that A(s,T)≠A(s). Define:

$$B_s= \frac{\mathbb{E}^{\max,\min}_{\mathcal{G}_C,s}[{\mathit {rew}(r,c,T )}]}{\mathrm{Pr}^{\max,\min}_{\mathcal{G}_C,s}(\mathsf {F} t)- \mathit{PR}^{\underline{\max}}_{s}} $$

Let B=max s∈S B s . We show that, on paths h ending in s and satisfying r(h)>B, no optimal strategy of player 1 can use actions from A(s)∖A(s,T); together with Lemma 2, this yields the statement of the lemma.

Let h be a path ending in s n ∈S 1 and satisfying r(h)>B. Assume σ 1(h) deterministically chooses an action from A(s)∖A(s,T) (for randomised choices the argument follows analogously). By the above we have, for any σ 2∈Σ 2:

which contradicts that σ 1 is optimal.

Clearly, the strategy optimising rew(r,0,T) has finite memory, with B an upper bound on the memory needed. □

By the equalities from the proof of Lemma 2 and by Lemma 3, the procedure described in step 2 of the algorithm on page 13 is correct. The procedure from step 3 of the algorithm is correct because, for all paths h, we have that:

$$\mathbb{E}^{\sigma_1,\sigma_2}_{\mathcal{G}_C,s}\bigl[{\mathit {rew}(r,0,T)}\mid h \bigr] =\max_{a\in A(s)}\sum_{s'\in S}\Delta(s,a) \bigl(s'\bigr)\cdot\mathbb {E}^{\sigma_1,\sigma_2}_{\mathcal{G}_C,s}\bigl[{ \mathit{rew}(r,0,T)}\mid h{\cdot }a{\cdot}s'\bigr] . $$

Appendix C: Proof of Theorem 1

Theorem 1(a) Let φ be a rPATL formula with no \(\langle\!\langle C \rangle\!\rangle \,\mathsf{R}^{r}_{\bowtie x} [\mathsf{F}^{0}\phi]\) operator and where the bound k of the temporal operator U ≤k is given in unary. The problem of deciding whether the formula is satisfied in s is in NP ∩ coNP.

Proof

By equivalences such as the one in equation (1) on page 8, we can assume that all probabilistic and reward operators only contain bounds ≥ or >, so in the proof we assume ⋈∈{>,≥}.

Let φ 1,φ 2,…,φ n be the sequence of all state formulae occurring in φ. Also, if φ i ’s outermost operator is temporal, let C i denote the outermost coalition in φ i , and \(\varSigma^{C}_{j}\) denote the set of all memoryless deterministic strategies for player j in the coalition game \(\mathcal{G}_{C}\).

We show that the problem is in NP ∩ coNP by describing a polynomial-size certificate c that allows us to check that a formula is (not) satisfied. The certificate c is a function that assigns an element of \(\varSigma^{C_{i}}_{1}\cup\varSigma^{C_{i}}_{2}\) to each tuple (i,s) where s∈S and φ i is a formula whose outermost operator is temporal:

  • If φ i ≡〈〈C〉〉 P ⋈q [ψ] and s⊨φ i , then: c(i,s)=σ 1 for \(\sigma_{1}\in\varSigma^{C}_{1}\) such that \(\min_{\sigma_{2}\in\varSigma_{2}}\mathrm{Pr}_{\mathcal{G}_{C},s}^{\sigma _{1},\sigma _{2}}(\psi) \bowtie q \) holds.

  • If φ i ≡〈〈C〉〉 P ⋈q [ψ] and \(s\not\models\varphi_{i}\), then: c(i,s)=σ 2 for \(\sigma_{2}\in\varSigma^{C}_{2}\) such that \(\max_{\sigma_{1}\in\varSigma_{1}}\mathrm{Pr}_{\mathcal{G}_{C},s}^{\sigma _{1},\sigma _{2}}(\psi) \bowtie q \) does not hold.

  • If \(\varphi_{i}\equiv\langle\!\langle C \rangle\!\rangle \,\mathsf{R}^{r}_{\bowtie x} [\mathsf{F}^{\star}\phi]\) and s⊨φ i , then: c(i,s)=σ 1 for \(\sigma_{1}\in\varSigma^{C}_{1}\) such that \(\min_{\sigma_{2}\in\varSigma_{2}}\mathbb{E}_{\mathcal{G}_{C},s}^{\sigma _{1},\sigma_{2}}[{\mathit{rew}(r,\star,{\mathit{Sat}}(\phi))}]\bowtie x \) holds.

  • If \(\varphi_{i}\equiv\langle\!\langle C \rangle\!\rangle \,\mathsf{R}^{r}_{\bowtie x} [\mathsf{F}^{\star}\phi]\) and \(s\not \models\varphi_{i}\), then: c(i,s)=σ 2 for \(\sigma_{2}\in\varSigma^{C}_{2}\) such that \(\max_{\sigma_{1}\in\varSigma_{1}}\mathbb{E}_{\mathcal{G}_{C},s}^{\sigma _{1},\sigma_{2}}[{\mathit{rew}(r,\star,{\mathit{Sat}}(\phi))}]\bowtie x \) does not hold.

The existence of the strategies assigned by c follows from Theorem 2 and from Appendix A.2.

To check the certificate in polynomial time, we compute Sat(φ′) for all state subformulae φ′ of φ, traversing the parse tree of φ bottom-up. Suppose that we are analysing a formula φ′ and that we have computed Sat(φ″) for all state subformulae φ″ of φ′. If φ′ is an atomic proposition or its outermost operator is a boolean connective, we construct Sat(φ′) in the obvious way. Otherwise:

$${\mathit{Sat}}\bigl(\varphi'\bigr)=\bigl\{s \mid c(i,s)\text{ is a strategy for the first player in the coalition game}\bigr\}. $$

We verify that our choice of Sat(φ′) is correct as follows. For all s∈Sat(φ′), we construct an MDP from the appropriate coalition game by fixing the decisions of the first player according to c(i,s), and in polynomial time we check that the minimal probability (or reward) in the resulting MDP exceeds the bound given by the outermost operator of φ′ (see Theorem 2). If s∉Sat(φ′), then we fix the decisions of the second player according to c(i,s) and proceed analogously, computing the maximal probabilities. □
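Schematically, a single certificate check for s∈Sat(φ′) with a probabilistic operator could look as follows; value iteration here stands in for the polynomial-time methods behind Theorem 2, and all names are illustrative assumptions:

```python
# Check one certificate entry: fixing player 1's memoryless deterministic
# strategy sigma1 = c(i, s) turns the game into an MDP; compute minimum
# reachability probabilities and compare against the bound q.

def check_entry(s, sigma1, states, actions, delta, owner, target, q,
                iters=100_000):
    def available(t):
        # player-1 states follow the certificate; the MDP minimises over the
        # remaining (player-2) choices
        return [sigma1[t]] if owner[t] == 1 else actions[t]

    p = {t: 1.0 if t in target else 0.0 for t in states}
    for _ in range(iters):
        p = {
            t: 1.0 if t in target else min(
                sum(pr * p[t2] for t2, pr in delta[(t, a)].items())
                for a in available(t)
            )
            for t in states
        }
    return p[s] >= q  # e.g. for an operator <<C>> P_{>=q}[F t]
```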

Theorem 1(b) Model checking an arbitrary rPATL formula is in NEXP ∩ coNEXP.

Proof

The proof is similar to that for Theorem 1(a) above. We only need to extend the certificate from the proof to provide a witnessing strategy for formulae of the form \(\langle\!\langle C \rangle\!\rangle \,\mathsf{R}^{r}_{\bowtie x} [\mathsf{F}^{0}\phi]\). This is straightforward since, in the proof of Lemma 3, we showed that players need only strategies of exponential size.

In Lemma 3 we have shown that, for the optimal strategy, it suffices to play a deterministic memoryless strategy after a certain reward bound B has been reached and, before that, the strategy needs to remember only the reward accumulated along the history. The rewards are integers; therefore, the strategy in a state may need a different action for each value of reward below B, and one action for reward greater than or equal to B. So, the overall size of the memory will be \(\mathcal{O}(|S|\times B)\). A deterministic strategy suffices in this case; observe that one could ‘embed’ the memory into the game by constructing a new game where the set of states is S×{0,…,B+r max−1}∪{s f }. The transition relation is preserved for states (s,k) where k<B, and states (s,k) where k≥B have a transition to s f only. The reward structure assigns reward R (s,k) to states (s,k) where k≥B, which can be computed using step 2 of the algorithm for ⋆=0 in Sect. 4, and 0 to all other states. Then, the deterministic memoryless strategy that maximises rew(r,c,{s f }) in this new game will also be an optimal strategy in the original game (but requiring memory of size B). The value of B can be at most exponential in the size of \(\mathcal {G}\), i.e. from Lemma 3 it follows that B for a state s is bounded by

$$ B_s= \frac{\mathbb{E}^{\max,\min}_{\mathcal{G}_C,s}[{\mathit {rew}(r,c,T )}]}{\mathrm{Pr}^{\max,\min}_{\mathcal{G}_C,s}(\mathsf {F} t)- \mathit{PR}^{\underline{\max}}_{s}}. $$
(7)

We claim that all \(\mathbb{E}^{\max,\min}_{\mathcal {G}_{C},s}[{\mathit{rew}(r,c,T)}]\), \(\mathrm{Pr}^{\max,\min}_{\mathcal{G}_{C},s}(\mathsf{F} t)\) and \(\mathit{PR}^{\underline{\max}}_{s}\) can be represented as fractions of integers whose binary representation is polynomial in the size of the input, from which the bound on the size of B s follows. For \(\mathbb{E}^{\max,\min}_{\mathcal{G}_{C},s}[{\mathit {rew}(r,c,T)}]\) (or \(\mathrm{Pr}^{\max,\min}_{\mathcal{G}_{C},s}(\mathsf{F} t)\), \(\mathit{PR}^{\underline{\max}}_{s}\)), fixing the optimal strategies for both players we can construct a linear program whose size is polynomial in the size of input, and whose solution is equal to \(\mathbb{E}^{\max,\min}_{\mathcal{G}_{C},s}[{\mathit{rew}(r,c,T)}]\) (or \(\mathrm{Pr}^{\max,\min}_{\mathcal{G}_{C},s}(\mathsf{F} t)\), \(\mathit {PR}^{\underline{\max}}_{s}\), respectively). Because the solution of the linear program can be represented as a fraction of two integers of polynomial binary representations, we get the claim. Therefore, B s is at most exponential in the size of \(\mathcal{G}\). □
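One plausible reading of the memory-embedding construction above, as a sketch (the counter update and the auxiliary action name are assumptions, not the paper's formalisation):

```python
# Hypothetical sketch of the product S x {0,...,B+rmax-1} + {s_f}. The
# counter k records the reward accumulated so far; states with k >= B
# receive the one-off reward R[(s, k)] (step 2 of the *=0 algorithm,
# assumed precomputed) and divert to the sink via an invented 'stop' action.

def embed_memory(states, actions, delta, r, R, B, rmax):
    sink = "s_f"
    prod = [(s, k) for s in states for k in range(B + rmax)]
    new_delta, new_r = {}, {sink: 0.0}
    for s, k in prod:
        if k >= B:
            new_r[(s, k)] = R[(s, k)]
            new_delta[((s, k), "stop")] = {sink: 1.0}
        else:
            new_r[(s, k)] = 0.0
            for a in actions[s]:
                # k + r[s] <= (B-1) + rmax stays inside the counter range
                new_delta[((s, k), a)] = {
                    (t, k + r[s]): p for t, p in delta[(s, a)].items()
                }
    return prod + [sink], new_delta, new_r
```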

Appendix D: Proof of correctness for Sect. 4.4

Price-bounded coalitions

In the proof of Theorem 1 (when all coalitions were fixed), we exploited the fact that there is an exponential-size certificate c, which is a function that assigns an element of \(\varSigma^{C_{i}}_{1}\cup \varSigma^{C_{i}}_{2}\) to each tuple (i,s) where s∈S and φ i is a formula whose outermost operator is temporal.

We extend this approach by changing the certificate c so that it assigns an element of \(\varSigma^{C}_{1}\cup\varSigma^{C}_{2}\) to each tuple (i,s,C), where s∈S, φ i is a formula whose outermost operator is temporal, C⊆Π, and ⋈∈{>,≥}:

  • If φ i ≡〈〈C〉〉 P ⋈q [ψ] or φ i ≡〈〈?〉〉 ≤y P ⋈q [ψ], and s⊨〈〈C〉〉 P ⋈q [ψ], then: c(i,s,C)=σ 1 for \(\sigma_{1}\in\varSigma^{C}_{1}\) such that \(\min_{\sigma_{2}\in\varSigma_{2}}\mathrm{Pr}_{\mathcal{G}_{C},s}^{\sigma _{1},\sigma _{2}}(\psi) \bowtie q \) holds.

  • If φ i ≡〈〈C〉〉 P ⋈q [ψ] or φ i ≡〈〈?〉〉 ≤y P ⋈q [ψ], and \(s\not\models\langle\!\langle C \rangle\!\rangle \,\mathsf {P}_{\bowtie q} [\psi]\), then: c(i,s,C)=σ 2 for \(\sigma_{2}\in\varSigma^{C}_{2}\) such that \(\max_{\sigma_{1}\in\varSigma_{1}}\mathrm{Pr}_{\mathcal{G}_{C},s}^{\sigma _{1},\sigma _{2}}(\psi) \bowtie q \) does not hold.

  • If \(\varphi_{i}\equiv\langle\!\langle C \rangle\!\rangle \,\mathsf{R}^{r}_{\bowtie x} [\mathsf{F}^{\star}\phi]\) or \(\varphi_{i}\equiv\langle\!\langle ? \rangle\!\rangle_{\le y} \,\mathsf {R}^{r}_{\bowtie x} [\mathsf{F}^{\star}\phi]\), and \(s\models \langle\!\langle C \rangle\!\rangle \,\mathsf{R}^{r}_{\bowtie x} [\mathsf{F}^{\star}\phi]\), then: c(i,s,C)=σ 1 for \(\sigma_{1}\in\varSigma^{C}_{1}\) such that \(\min_{\sigma_{2}\in\varSigma_{2}}\mathbb{E}_{\mathcal{G}_{C},s}^{\sigma _{1},\sigma_{2}}[{\mathit{rew}(r,\star,{\mathit{Sat}}(\phi))}]\bowtie x \) holds.

  • If \(\varphi_{i}\equiv\langle\!\langle C \rangle\!\rangle \,\mathsf{R}^{r}_{\bowtie x} [\mathsf{F}^{\star}\phi]\) or \(\varphi_{i}\equiv\langle\!\langle ? \rangle\!\rangle_{\le y} \,\mathsf {R}^{r}_{\bowtie x} [\mathsf{F}^{\star}\phi]\), and \(s\not\models \langle\!\langle C \rangle\!\rangle \,\mathsf{R}^{r}_{\bowtie x} [\mathsf{F}^{\star}\phi]\), then: c(i,s,C)=σ 2 for \(\sigma_{2}{\in} \varSigma^{C}_{2}\) such that \(\max_{\sigma_{1}\in\varSigma_{1}}\mathbb{E}_{\mathcal{G}_{C},s}^{\sigma _{1},\sigma_{2}}[{\mathit{rew}(r,\star,{\mathit{Sat}}(\phi))}]\bowtie x \) does not hold.

  • If φ i ≡〈〈C′〉〉 P ⋈q [ψ] or \(\varphi_{i}\equiv\langle\!\langle C' \rangle\!\rangle \,\mathsf {R}^{r}_{\bowtie x} [\mathsf{F}^{\star }\phi]\), and C′≠C, then c(i,s,C) returns an arbitrary memoryless deterministic strategy from \(\varSigma^{C'}_{1}\cup\varSigma^{C'}_{2}\).

As before, the existence of strategies assigned by c follows from Theorem 2 and from Appendix A.2: for all formulae but \(\langle\!\langle C \rangle\!\rangle \,\mathsf {R}^{r}_{\bowtie x} [\mathsf{F}^{0}\phi ]\) and \(\langle\!\langle ? \rangle\!\rangle \,\mathsf{R}^{r}_{\bowtie x} [\mathsf{F}^{0}\phi]\), memoryless deterministic strategies exist; and, for the aforementioned formulae, exponential memory deterministic strategies suffice.

To check the certificate in polynomial time (in the worst-case size of c, which is exponential in the size of the model), we compute Sat(φ′) for all state subformulae φ′ of φ, traversing the parse tree of φ bottom-up. Suppose that we are analysing a formula φ′ and that we have computed Sat(φ″) for all state subformulae φ″ of φ′. If φ′ is an atomic proposition or its outermost operator is a boolean connective, we construct Sat(φ′) in the obvious way. Otherwise, if the outermost coalition in φ′ is fixed to C:

$${\mathit{Sat}}\bigl(\varphi'\bigr)=\bigl\{s \mid c(i,s,C) \in \varSigma_1^C\bigr\} $$

and, if the outermost coalition in φ′ is not specified, but a coalition of price ≤y is required:

$${\mathit{Sat}}\bigl(\varphi'\bigr)=\biggl\{s \mid\exists C \subseteq\varPi: \sum_{\gamma\in C} p(\gamma) \le y \text{ and } c(i,s,C)\in\varSigma_1^C\biggr\}. $$

We verify that our choice of Sat(φ′) is correct as follows. For all s∈Sat(φ′), we construct an MDP from the appropriate coalition game by fixing the decisions of the first player according to c(i,s,C) and in polynomial time we check that the minimal probability (or reward) in the resulting MDP exceeds the bound given by the outermost operator of φ′ (see Theorem 2). If s∉Sat(φ′), and the outermost coalition of φ′ is C, then we fix the decisions of the second player according to c(i,s,C) and proceed analogously, computing the maximal probabilities. If s∉Sat(φ′), and the outermost coalition of φ′ is not specified, but required to be of price at most y, we need to construct MDPs from the coalition games \(\mathcal{G}_{C}\) for all C where ∑ γ∈C p(γ)≤y by fixing the decisions of the second player and computing the maximal probabilities. There are only polynomially many (in the size of c) possible choices of C (the number of different coalitions is exponential in the size of \(\mathcal{G}\), but the certificate is exponential in \(\mathcal{G}\) too), and each choice can be checked in polynomial time.
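The enumeration of price-bounded coalitions used in this last step admits a direct sketch, with p the price function on players (names assumed for illustration):

```python
from itertools import combinations

# Enumerate coalitions C with total price at most y: exponential in the
# number of players, but polynomial in the size of the (already exponential)
# certificate c, as argued above.

def cheap_coalitions(players, p, y):
    players = sorted(players)
    for k in range(len(players) + 1):
        for C in combinations(players, k):
            if sum(p[g] for g in C) <= y:
                yield set(C)
```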


Cite this article

Chen, T., Forejt, V., Kwiatkowska, M. et al. Automatic verification of competitive stochastic systems. Form Methods Syst Des 43, 61–92 (2013). https://doi.org/10.1007/s10703-013-0183-7
