Introduction to Nested Markov Models

Shpitser, Ilya; Evans, Robin J.; Richardson, Thomas S.; Robins, James M.

doi:10.2333/bhmk.41.3

Invited paper
Published: 15 January 2014

Behaviormetrika, volume 41, pages 3–39 (2014)

Ilya Shpitser¹,
Robin J. Evans²,
Thomas S. Richardson³ &
…
James M. Robins⁴

Abstract

Graphical models provide a principled way to take advantage of independence constraints for probabilistic and causal modeling, while giving an intuitive graphical description of “qualitative features” useful for these tasks. A popular graphical model, known as a Bayesian network, represents joint distributions by means of a directed acyclic graph (DAG). DAGs provide a natural representation of conditional independence constraints, and also have a simple causal interpretation. When all variables are observed, the associated statistical models have many attractive properties. However, in many practical data analyses unobserved variables may be present. In general, the set of marginal distributions obtained from a DAG model with hidden variables is a much more complicated statistical model: the likelihood of the marginal is often intractable, and the model may contain singularities. There is also an infinite number of such models to consider.
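
As a minimal illustration of the two objects contrasted above, the fully observed DAG model and its hidden variable margin can be written as follows. The notation is an assumption made here for illustration and is not taken from the article: V is the vertex set of the DAG, pa(v) is the set of parents of v, and V is split into observed variables O and hidden variables H, all taken to be discrete.

```latex
% DAG (Bayesian network) factorization when every variable is observed
p(x_V) = \prod_{v \in V} p\bigl(x_v \mid x_{\mathrm{pa}(v)}\bigr)

% Observed margin when the variables in H are hidden (discrete case)
p(x_O) = \sum_{x_H} \prod_{v \in O \cup H} p\bigl(x_v \mid x_{\mathrm{pa}(v)}\bigr)
```

The sum over hidden states in the second line is what generally removes the simple product structure of the first.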

It is possible to avoid these difficulties by modeling the observed marginal directly. One strategy is to define a model by means of the conditional independence constraints induced on the observed marginal by the hidden variable DAG; we call this the ordinary Markov model. This model will be a supermodel that contains the set of marginal distributions obtained from the original DAG. Richardson and Spirtes (2002) and Evans and Richardson (2013a) gave parametrizations of this model in the Gaussian and discrete cases, respectively.

However, it has long been known that hidden variable DAG models also imply nonparametric constraints which generalize conditional independences; these are sometimes called “Verma constraints”. In this paper we describe a natural extension of the ordinary Markov approach, whereby both conditional independences and these generalized constraints are used to define a nested Markov model. The binary nested Markov model may be parametrized via a simple extension of the binary parametrization of the ordinary Markov model of Evans and Richardson (2013a). We also give evidence for a characterization of nested Markov equivalence for models with four observed variables. A consequence of this characterization is that, in some instances, most structural features of hidden variable DAGs can be recovered exactly when a single generalized independence constraint holds under the distribution of the observed variables.
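
To make the idea of a generalized constraint concrete, the sketch below checks one numerically on a small example. Everything specific in it is an assumption chosen for illustration and does not come from the article: the hidden variable DAG A → B → C → D with a hidden U pointing into B and D, the binary state spaces, the probability values, and all variable names. For this graph the quantity Σ_b p(d | a, b, c) p(b | a) does not depend on a, even though D is not conditionally independent of A given B and C in the observed margin.

```python
import numpy as np


def cpt(p_one):
    """Turn P(child = 1 | parents) into a table whose last axis is the child."""
    p_one = np.asarray(p_one, dtype=float)
    return np.stack([1.0 - p_one, p_one], axis=-1)


# Illustrative (made-up) binary model: A -> B -> C -> D, hidden U -> B, U -> D.
p_u = cpt(0.5)                          # p(u)
p_a = cpt(0.6)                          # p(a)
p_b = cpt([[0.2, 0.7], [0.6, 0.9]])     # p(b | u, a), indexed [u, a, b]
p_c = cpt([0.3, 0.8])                   # p(c | b),    indexed [b, c]
p_d = cpt([[0.1, 0.5], [0.7, 0.9]])     # p(d | u, c), indexed [u, c, d]

# Observed margin p(a, b, c, d): multiply all factors and sum out u.
joint = np.einsum("u,a,uab,bc,ucd->abcd", p_u, p_a, p_b, p_c, p_d)

# Conditionals computed from the observed margin only.
p_ab = joint.sum(axis=(2, 3))                              # p(a, b)
p_b_given_a = p_ab / p_ab.sum(axis=1, keepdims=True)       # p(b | a)
p_d_given_abc = joint / joint.sum(axis=3, keepdims=True)   # p(d | a, b, c)

# q(a, c, d) = sum_b p(d | a, b, c) p(b | a)
q = np.einsum("abcd,ab->acd", p_d_given_abc, p_b_given_a)

# Generalized constraint: q should be the same for a = 0 and a = 1.
verma_gap = float(np.abs(q[0] - q[1]).max())

# Ordinary conditional independence D _||_ A | B, C does NOT hold here.
ci_gap = float(np.abs(p_d_given_abc[0] - p_d_given_abc[1]).max())

# For this graph, q equals sum_u p(u) p(d | u, c), which involves no a.
expected = np.einsum("u,ucd->cd", p_u, p_d)

print("max |q(a=0) - q(a=1)|           :", verma_gap)
print("max |p(d|a=0,b,c) - p(d|a=1,b,c)|:", ci_gap)
```

The first printed gap is zero up to floating-point rounding, while the second is clearly positive, so the constraint on q is not an ordinary conditional independence in the observed distribution.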

References

Ackley, D. H., Hinton, G. E., and Sejnowski, T. J. (1985). A learning algorithm for Boltzmann machines. Cognitive Science, 9(1):147–169.
Ali, A., Richardson, T. S., and Spirtes, P. (2009). Markov equivalence for ancestral graphs. Annals of Statistics, 37:2808–2837.
Beal, M. J. and Ghahramani, Z. (2004). Variational Bayesian learning of directed graphical models with hidden variables. Bayesian Analysis, 1(1):1–44.
Bergsma, W. P. and Rudas, T. (2002). Marginal models for categorical data. Annals of Statistics, 30(1):140–159.
Chickering, D. M. (2002). Optimal structure identification with greedy search. Journal of Machine Learning Research, 3:507–554.
Claassen, T., Mooij, J. M., and Heskes, T. (2013). Learning sparse causal models is not NP-hard. In Proceedings of the 29th Conference on Uncertainty in Artificial Intelligence (UAI-13). abs/1309.6824.
Colombo, D., Maathuis, M. H., Kalisch, M., and Richardson, T. S. (2012). Learning high-dimensional directed acyclic graphs with latent and selection variables. Annals of Statistics, 40:294–321.
Cooper, G. F. and Herskovits, E. (1992). A Bayesian method for the induction of probabilistic networks from data. Machine Learning, 9:309–347.
Drton, M. (2009). Likelihood ratio tests and singularities. Annals of Statistics, 37(2):979–1012.
Drton, M. and Plummer, M. (2013). A Bayesian information criterion for singular models. ArXiv e-prints.
Drton, M., Sturmfels, B., and Sullivant, S. (2009). Lectures on Algebraic Statistics, volume 40. Birkhäuser, Basel.
Evans, R. J. (2011). Parameterizations of Discrete Graphical Models. PhD thesis, Department of Statistics, University of Washington.
Evans, R. J. (2012). Graphical methods for inequality constraints in marginalized DAGs. In 22nd Workshop on Machine Learning and Signal Processing.
Evans, R. J. and Richardson, T. S. (2010). Maximum likelihood fitting of acyclic directed mixed graphs to binary data. In Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence. AUAI Press.
Evans, R. J. and Richardson, T. S. (2013a). Marginal log-linear parameters for graphical Markov models. Journal of the Royal Statistical Society: Series B, 75(4):743–768.
Evans, R. J. and Richardson, T. S. (2013b). Markovian acyclic directed mixed graphs for discrete data. Annals of Statistics, accepted for publication. abs/1301.6624.
Friedman, N. (1997). Learning belief networks in the presence of missing values and hidden variables. In Proceedings of the 14th International Conference on Machine Learning, pages 125–133. Morgan Kaufmann.
Friedman, N., Linial, M., Nachman, I., and Pe’er, D. (2000). Using Bayesian networks to analyze expression data. Journal of Computational Biology, 7(3/4):601–620.
Geiger, D. and Meek, C. (1998). Graphical models and exponential families. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence (UAI-98), pages 156–165.
Koller, D. and Friedman, N. (2009). Probabilistic Graphical Models: Principles and Techniques. MIT Press.
Lauritzen, S. L. (1996). Graphical Models. Oxford, U.K.: Clarendon.
Neyman, J. (1923). Sur les applications de la théorie des probabilités aux expériences agricoles: Essai des principes. Excerpts reprinted (1990) in English. Statistical Science, 5:463–472.
Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann, San Mateo.
Pearl, J. (1995). On the testability of causal models with latent and instrumental variables. In Proceedings of the Eleventh Annual Conference on Uncertainty in Artificial Intelligence (UAI-95), pages 435–443, San Francisco, CA. Morgan Kaufmann.
Pearl, J. (2000). Causality: Models, Reasoning, and Inference. Cambridge University Press.
Pourret, O., Naïm, P., and Marcot, B. (2008). Bayesian Networks: A Practical Guide to Applications. Wiley.
Richardson, T. S. (2003). Markov properties for acyclic directed mixed graphs. The Scandinavian Journal of Statistics, 30(1):145–157.
Richardson, T. S., Robins, J. M., and Shpitser, I. (2012). Nested Markov properties for acyclic directed mixed graphs. Presented at the 28th Conference on Uncertainty in Artificial Intelligence (UAI-12).
Richardson, T. S. and Spirtes, P. (2002). Ancestral graph Markov models. Annals of Statistics, 30(4):962–1030.
Robins, J. M. (1986). A new approach to causal inference in mortality studies with sustained exposure periods - application to control of the healthy worker survivor effect. Mathematical Modeling, 7:1393–1512.
Robins, J. M. (1989). The analysis of randomized and non-randomized AIDS treatment trials using a new approach to causal inference in longitudinal studies. In Sechrest, L., Freeman, H., and Mulley, A., editors, Health Service Research Methodology: A Focus on AIDS, pages 113–159. NCHSR, U.S. Public Health Service.
Robins, J. M. (1992). Estimation of the time-dependent accelerated failure time model in the presence of confounding factors. Biometrika, 79:321–334.
Robins, J. M. (1997). Causal inference from complex longitudinal data. In Berkane, M., editor, Latent variable modelling and applications to causality, number 120 in Lecture notes in statistics, pages 69–117. Springer-Verlag, New York.
Robins, J. M. (1999). Testing and estimation of direct effects by reparameterizing directed acyclic graphs with structural nested models. In Glymour, C. and Cooper, G., editors, Computation, Causation, and Discovery, pages 349–405. Menlo Park, CA, Cambridge, MA: AAAI Press/The MIT Press.
Robins, J. M. (2000). Marginal structural models versus structural nested models as tools for causal inference. In Halloran, M. and Berry, D., editors, Statistical Models in Epidemiology, the Environment, and Clinical Trials, volume 116 of The IMA Volumes in Mathematics and its Applications, pages 95–133. Springer New York.
Robins, J. M. and Richardson, T. S. (2010). Alternative graphical causal models and the identification of direct effects. In P. Shrout, K. Keyes, and K. Ornstein, editors, Causality and Psychopathology: Finding the Determinants of Disorders and their Cures. Chapter 6, pages 1–52, Oxford University Press.
Robins, J. M., Scheines, R., Spirtes, P., and Wasserman, L. (2003). Uniform consistency in causal inference. Biometrika, 90(3):491–515.
Robins, J. M. and Wasserman, L. (1997). Estimation of effects of sequential treatments by reparameterizing directed acyclic graphs. In Proceedings of the 13th Conference on Uncertainty in Artificial Intelligence, pages 309–420. Morgan Kaufmann.
Robins, J. M. and Wasserman, L. (1999). On the impossibility of inferring causation from association without background knowledge. In Computation, Causation, and Discovery, pages 305–321. Menlo Park, CA, Cambridge, MA: AAAI Press/The MIT Press.
Roweis, S. and Ghahramani, Z. (1999). A unifying review of linear Gaussian models. Neural Computation, 11(2):305–345.
Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and non-randomized studies. Journal of Educational Psychology, 66:688–701.
Rusakov, D. and Geiger, D. (2005). Asymptotic model selection for naive Bayesian networks. Journal of Machine Learning Research, 6:1–35.
Schwarz, G. E. (1978). Estimating the dimension of a model. Annals of Statistics, 6:461–464.
Settimi, R. and Smith, J. Q. (1998). On the geometry of Bayesian graphical models with hidden variables. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence (UAI-98), pages 472–479.
Shimizu, S., Hoyer, P. O., Hyvärinen, A., and Kerminen, A. J. (2006). A linear non-Gaussian acyclic model for causal discovery. Journal of Machine Learning Research, 7:2003–2030.
Shpitser, I., Evans, R. J., Richardson, T. S., and Robins, J. M. (2013). Sparse nested Markov models with log-linear parameters. In Proceedings of the 29th Conference on Uncertainty in Artificial Intelligence (UAI-13). AUAI Press.
Shpitser, I. and Pearl, J. (2006). Identification of conditional interventional distributions. In Proceedings of the 22nd Conference on Uncertainty in Artificial Intelligence (UAI-2006). AUAI Press.
Shpitser, I. and Pearl, J. (2008). Dormant independence. Technical Report R-340, Cognitive Systems Laboratory, University of California, Los Angeles.
Shpitser, I., Richardson, T. S., and Robins, J. M. (2009). Testing edges by truncations. In Proceedings of the 21st International Joint Conference on Artificial Intelligence, volume 21, pages 1957–1963.
Shpitser, I., Richardson, T. S., and Robins, J. M. (2011). An efficient algorithm for computing interventional distributions in latent variable causal models. In 27th Conference on Uncertainty in Artificial Intelligence (UAI-11). AUAI Press.
Shpitser, I., Richardson, T. S., Robins, J. M., and Evans, R. J. (2012). Parameter and structure learning in nested Markov models. In Proceedings of the Causal Structure Learning Workshop of the 28th Conference on Uncertainty in Artificial Intelligence (UAI-12).
Shwe, M. A., Middleton, B., Heckerman, D. E., Henrion, M., Horvitz, E. J., Lehmann, H. P., and Cooper, G. F. (1991). Probabilistic diagnosis using a reformulation of the INTERNIST-1/QMR knowledge base. Methods of Information in Medicine, 30(4):241–267.
Spirtes, P., Glymour, C., and Scheines, R. (1993). Causation, Prediction, and Search. Springer Verlag, New York.
Tian, J. and Pearl, J. (2002). On the testable implications of causal models with hidden variables. In Proceedings of the 18th Conference on Uncertainty in Artificial Intelligence (UAI-02), pages 519–527. AUAI Press.
Verma, T. S. and Pearl, J. (1990). Equivalence and synthesis of causal models. Technical Report R-150, Department of Computer Science, University of California, Los Angeles.
Zhang, N. L. and Poole, D. (1994). A simple approach to Bayesian network computations. In Tenth Canadian Conference on AI, pages 171–178.

Author information

Authors and Affiliations

University of Southampton, UK
Ilya Shpitser
University of Oxford, UK
Robin J. Evans
University of Washington, USA
Thomas S. Richardson
Harvard University, USA
James M. Robins

Corresponding author

Correspondence to Ilya Shpitser.

About this article

Cite this article

Shpitser, I., Evans, R.J., Richardson, T.S. et al. Introduction to Nested Markov Models. Behaviormetrika 41, 3–39 (2014). https://doi.org/10.2333/bhmk.41.3

Received: 22 August 2013
Revised: 14 January 2014
Published: 15 January 2014
Issue Date: January 2014
DOI: https://doi.org/10.2333/bhmk.41.3
