Abstract
We develop new graphical representations for the problem of sequential decision making in partially observable multiagent environments, as formalized by interactive partially observable Markov decision processes (I-POMDPs). The graphical models called interactive influence diagrams (I-IDs) and their dynamic counterparts, interactive dynamic influence diagrams (I-DIDs), seek to explicitly model the structure that is often present in real-world problems by decomposing the situation into chance and decision variables, and the dependencies between the variables. I-DIDs generalize DIDs, which may be viewed as graphical representations of POMDPs, to multiagent settings in the same way that I-POMDPs generalize POMDPs. I-DIDs may be used to compute the policy of an agent given its belief as the agent acts and observes in a setting that is populated by other interacting agents. Using several examples, we show how I-IDs and I-DIDs may be applied and demonstrate their usefulness. We also show how the models may be solved using the standard algorithms that are applicable to DIDs. Solving I-DIDs exactly involves knowing the solutions of possible models of the other agents. The space of models grows exponentially with the number of time steps. We present a method of solving I-DIDs approximately by limiting the number of other agents’ candidate models at each time step to a constant. We do this by clustering models that are likely to be behaviorally equivalent and selecting a representative set from the clusters. We discuss the error bound of the approximation technique and demonstrate its empirical performance.
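The approximation described above bounds the model space by clustering candidate models of the other agent and keeping one representative per cluster. As a rough illustration of that idea (not the paper's algorithm), the sketch below clusters candidate models by their belief vectors with a simple k-means and returns the actual candidate nearest each centroid as the cluster's representative; the function name, the use of beliefs as the clustering feature, and the deterministic initialization are all illustrative assumptions.

```python
def cluster_models(beliefs, k, iters=20):
    """Illustrative model-space compression for an I-DID solver:
    cluster candidate models by belief vector (a stand-in proxy for
    likely behavioral equivalence) and keep k representatives."""
    # Deterministic initialization: evenly spaced candidates as seeds.
    centroids = [list(beliefs[i * len(beliefs) // k]) for i in range(k)]

    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    for _ in range(iters):
        # Assign each candidate belief to its nearest centroid.
        groups = [[] for _ in range(k)]
        for b in beliefs:
            j = min(range(k), key=lambda c: dist(b, centroids[c]))
            groups[j].append(b)
        # Recompute centroids; keep the old one if a cluster empties.
        for j, g in enumerate(groups):
            if g:
                centroids[j] = [sum(col) / len(g) for col in zip(*g)]

    # Representative = the actual candidate closest to each centroid,
    # so the reduced set contains only genuine models.
    return [min(beliefs, key=lambda b: dist(b, c)) for c in centroids]

# Example: six candidate models' beliefs over a 2-state space, reduced to k=2.
beliefs = [(0.1, 0.9), (0.15, 0.85), (0.2, 0.8),
           (0.8, 0.2), (0.85, 0.15), (0.9, 0.1)]
reps = cluster_models(beliefs, k=2)  # one representative per belief cluster
```

In the actual technique, clustering at each time step keeps the candidate-model set at a constant size, which is what prevents the exponential growth in the number of models over the horizon.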
Cite this article
Doshi, P., Zeng, Y. & Chen, Q. Graphical models for interactive POMDPs: representations and solutions. Auton Agent Multi-Agent Syst 18, 376–416 (2009). https://doi.org/10.1007/s10458-008-9064-7