Introduction to Nested Markov Models

  • Invited paper
  • Behaviormetrika

Abstract

Graphical models provide a principled way to take advantage of independence constraints for probabilistic and causal modeling, while giving an intuitive graphical description of “qualitative features” useful for these tasks. A popular graphical model, known as a Bayesian network, represents joint distributions by means of a directed acyclic graph (DAG). DAGs provide a natural representation of conditional independence constraints, and also have a simple causal interpretation. When all variables are observed, the associated statistical models have many attractive properties. However, in many practical data analyses unobserved variables may be present. In general, the set of marginal distributions obtained from a DAG model with hidden variables is a much more complicated statistical model: the likelihood of the marginal is often intractable; the model may contain singularities. There are also an infinite number of such models to consider.

It is possible to avoid these difficulties by modeling the observed marginal directly. One strategy is to define a model by means of conditional independence constraints induced on the observed marginal by the hidden variable DAG; we call this the ordinary Markov model. This model will be a supermodel that contains the set of marginal distributions obtained from the original DAG. Richardson and Spirtes (2002) and Evans and Richardson (2013a) gave parametrizations of this model in the Gaussian and discrete case, respectively.

However, it has long been known that hidden variable DAG models also imply nonparametric constraints which generalize conditional independences; these are sometimes called “Verma Constraints”. In this paper we describe a natural extension of the ordinary Markov approach, whereby both conditional independences and these generalized constraints are used to define a nested Markov model. The binary nested Markov model may be parametrized via a simple extension of the binary parametrization of the ordinary Markov model of Evans and Richardson (2013a). We also give evidence for a characterization of nested Markov equivalence for models with four observed variables. A consequence of this characterization is that, in some instances, most structural features of hidden variable DAGs can be recovered exactly when a single generalized independence constraint holds under the distribution of the observed variables.
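As an illustration of such a generalized constraint, the sketch below checks numerically the commonly cited four-variable example: the DAG A → B → C → D with a hidden variable H pointing to B and D. The graph, the variable names, and the helper functions are assumptions made for this example and are not taken from the abstract. For this graph, the quantity sum_b p(d | a, b, c) p(b | a) does not depend on a, even though no ordinary conditional independence holds between A and D.

```python
import itertools
import random

def random_dist(rng):
    """Random distribution over a binary variable, as [p(0), p(1)]."""
    x = rng.uniform(0.1, 0.9)
    return [x, 1.0 - x]

def observed_marginal(rng):
    """p(a, b, c, d) for A -> B -> C -> D with hidden H -> B and H -> D."""
    p_h, p_a = random_dist(rng), random_dist(rng)
    p_b = {(a, h): random_dist(rng) for a in (0, 1) for h in (0, 1)}  # p(b | a, h)
    p_c = {b: random_dist(rng) for b in (0, 1)}                       # p(c | b)
    p_d = {(c, h): random_dist(rng) for c in (0, 1) for h in (0, 1)}  # p(d | c, h)
    # Sum the hidden variable h out of the full joint distribution.
    return {
        (a, b, c, d): sum(
            p_h[h] * p_a[a] * p_b[a, h][b] * p_c[b][c] * p_d[c, h][d]
            for h in (0, 1)
        )
        for a, b, c, d in itertools.product((0, 1), repeat=4)
    }

def verma_functional(p, a, c, d):
    """sum_b p(d | a, b, c) * p(b | a), computed from the observed marginal only."""
    p_a = sum(v for (a2, _, _, _), v in p.items() if a2 == a)
    total = 0.0
    for b in (0, 1):
        p_abc = p[a, b, c, 0] + p[a, b, c, 1]
        p_ab = sum(p[a, b, c2, d2] for c2 in (0, 1) for d2 in (0, 1))
        total += (p[a, b, c, d] / p_abc) * (p_ab / p_a)
    return total

def satisfies_verma(p, tol=1e-9):
    """True if the functional takes the same value for a = 0 and a = 1."""
    return all(
        abs(verma_functional(p, 0, c, d) - verma_functional(p, 1, c, d)) < tol
        for c in (0, 1) for d in (0, 1)
    )

rng = random.Random(0)
hidden_dag_ok = satisfies_verma(observed_marginal(rng))
```

The check uses only the observed marginal over A, B, C, D, which is the sense in which the constraint is nonparametric. An arbitrary distribution over four binary variables would typically fail it.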

References

  • Ackley, D. H., Hinton, G. E., and Sejnowski, T. J. (1985). A learning algorithm for Boltzmann machines. Cognitive Science, 9(1):147–169.

  • Ali, A., Richardson, T. S., and Spirtes, P. (2009). Markov equivalence for ancestral graphs. Annals of Statistics, 37:2808–2837.

  • Beal, M. J. and Ghahramani, Z. (2004). Variational Bayesian learning of directed graphical models with hidden variables. Bayesian Analysis, 1(1):1–44.

  • Bergsma, W. P. and Rudas, T. (2002). Marginal models for categorical data. Annals of Statistics, 30(1):140–159.

  • Chickering, D. M. (2002). Optimal structure identification with greedy search. Journal of Machine Learning Research, 3:507–554.

  • Claassen, T., Mooij, J. M., and Heskes, T. (2013). Learning sparse causal models is not NP-hard. In Proceedings of the 29th Conference on Uncertainty in Artificial Intelligence (UAI-13). abs/1309.6824.

  • Colombo, D., Maathuis, M. H., Kalisch, M., and Richardson, T. S. (2012). Learning high-dimensional directed acyclic graphs with latent and selection variables. Annals of Statistics, 40:294–321.

  • Cooper, G. F. and Herskovits, E. (1992). A Bayesian method for the induction of probabilistic networks from data. Machine Learning, 9:309–347.

  • Drton, M. (2009). Likelihood ratio tests and singularities. Annals of Statistics, 37(2):979–1012.

  • Drton, M. and Plummer, M. (2013). A Bayesian information criterion for singular models. ArXiv eprints.

  • Drton, M., Sturmfels, B., and Sullivant, S. (2009). Lectures on Algebraic Statistics, volume 40. Birkhäuser, Basel.

  • Evans, R. J. (2011). Parameterizations of Discrete Graphical Models. PhD thesis, Department of Statistics, University of Washington.

  • Evans, R. J. (2012). Graphical methods for inequality constraints in marginalized DAGs. In 22nd Workshop on Machine Learning and Signal Processing.

  • Evans, R. J. and Richardson, T. S. (2010). Maximum likelihood fitting of acyclic directed mixed graphs to binary data. In Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence, AUAI Press.

  • Evans, R. J. and Richardson, T. S. (2013a). Marginal log-linear parameters for graphical Markov models. Journal of the Royal Statistical Society: Series B, 75(4):743–768.

  • Evans, R. J. and Richardson, T. S. (2013b). Markovian acyclic directed mixed graphs for discrete data. Annals of Statistics, accepted for publication. abs/1301.6624.

  • Friedman, N. (1997). Learning belief networks in the presence of missing values and hidden variables. In Proceedings of the 14th International Conference on Machine Learning, pages 125–133. Morgan Kaufmann.

  • Friedman, N., Linial, M., Nachman, I., and Pe’er, D. (2000). Using Bayesian networks to analyze expression data. Journal of Computational Biology, 7(3/4):601–620.

  • Geiger, D. and Meek, C. (1998). Graphical models and exponential families. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence (UAI-98), pages 156–165.

  • Koller, D. and Friedman, N. (2009). Probabilistic Graphical Models: Principles and Techniques. MIT Press.

  • Lauritzen, S. L. (1996). Graphical Models. Oxford, U.K.: Clarendon.

  • Neyman, J. (1923). Sur les applications de la théorie des probabilités aux expériences agricoles: Essai des principes. Excerpts reprinted (1990) in English. Statistical Science, 5:463–472.

  • Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann, San Mateo.

  • Pearl, J. (1995). On the testability of causal models with latent and instrumental variables. In Proceedings of the Eleventh Annual Conference on Uncertainty in Artificial Intelligence (UAI-95), pages 435–443, San Francisco, CA. Morgan Kaufmann.

  • Pearl, J. (2000). Causality: Models, Reasoning, and Inference. Cambridge University Press.

  • Pourret, O., Naïm, P., and Marcot, B. (2008). Bayesian Networks: A Practical Guide to Applications. Wiley.

  • Richardson, T. S. (2003). Markov properties for acyclic directed mixed graphs. The Scandinavian Journal of Statistics, 30(1):145–157.

  • Richardson, T. S., Robins, J. M., and Shpitser, I. (2012). Nested Markov properties for acyclic directed mixed graphs. Presented at the 28th Conference on Uncertainty in Artificial Intelligence (UAI-12).

  • Richardson, T. S. and Spirtes, P. (2002). Ancestral graph Markov models. Annals of Statistics, 30(4):962–1030.

  • Robins, J. M. (1986). A new approach to causal inference in mortality studies with sustained exposure periods - application to control of the healthy worker survivor effect. Mathematical Modeling, 7:1393–1512.

  • Robins, J. M. (1989). The analysis of randomized and non-randomized AIDS treatment trials using a new approach to causal inference in longitudinal studies. In Sechrest, L., Freeman, H., and Mulley, A., editors, Health Service Research Methodology: A Focus on AIDS, pages 113–159. NCHSR, U.S. Public Health Service.

  • Robins, J. M. (1992). Estimation of the time-dependent accelerated failure time model in the presence of confounding factors. Biometrika, 79:321–334.

  • Robins, J. M. (1997). Causal inference from complex longitudinal data. In Berkane, M., editor, Latent variable modelling and applications to causality, number 120 in Lecture notes in statistics, pages 69–117. Springer-Verlag, New York.

  • Robins, J. M. (1999). Testing and estimation of direct effects by reparameterizing directed acyclic graphs with structural nested models. In Glymour, C. and Cooper, G., editors, Computation, Causation, and Discovery, pages 349–405. Menlo Park, CA, Cambridge, MA: AAAI Press/The MIT Press.

  • Robins, J. M. (2000). Marginal structural models versus structural nested models as tools for causal inference. In Halloran, M. and Berry, D., editors, Statistical Models in Epidemiology, the Environment, and Clinical Trials, volume 116 of The IMA Volumes in Mathematics and its Applications, pages 95–133. Springer New York.

  • Robins, J. M. and Richardson, T. S. (2010). Alternative graphical causal models and the identification of direct effects. In P. Shrout, K. Keyes, and K. Ornstein, editors, Causality and Psychopathology: Finding the Determinants of Disorders and their Cures. Chapter 6, pages 1–52, Oxford University Press.

  • Robins, J. M., Scheines, R., Spirtes, P., and Wasserman, L. (2003). Uniform consistency in causal inference. Biometrika, 90(3):491–515.

  • Robins, J. M. and Wasserman, L. (1997). Estimation of effects of sequential treatments by reparameterizing directed acyclic graphs. In Proceedings of the 13th Conference on Uncertainty in Artificial Intelligence, pages 409–420. Morgan Kaufmann.

  • Robins, J. M. and Wasserman, L. (1999). On the impossibility of inferring causation from association without background knowledge. In Computation, Causation, and Discovery, pages 305–321. Menlo Park, CA, Cambridge, MA: AAAI Press/The MIT Press.

  • Roweis, S. and Ghahramani, Z. (1999). A unifying review of linear Gaussian models. Neural Computation, 11(2):305–345.

  • Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and non-randomized studies. Journal of Educational Psychology, 66:688–701.

  • Rusakov, D. and Geiger, D. (2005). Asymptotic model selection for naive Bayesian networks. Journal of Machine Learning Research, 6:1–35.

  • Schwarz, G. E. (1978). Estimating the dimension of a model. Annals of Statistics, 6:461–464.

  • Settimi, R. and Smith, J. Q. (1998). On the geometry of Bayesian graphical models with hidden variables. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence (UAI-98), pages 472–479.

  • Shimizu, S., Hoyer, P. O., Hyvärinen, A., and Kerminen, A. J. (2006). A linear non-Gaussian acyclic model for causal discovery. Journal of Machine Learning Research, 7:2003–2030.

  • Shpitser, I., Evans, R. J., Richardson, T. S., and Robins, J. M. (2013). Sparse nested Markov models with log-linear parameters. In Proceedings of the 29th Conference on Uncertainty in Artificial Intelligence (UAI-13). AUAI Press.

  • Shpitser, I. and Pearl, J. (2006). Identification of conditional interventional distributions. In Proceedings of the 22nd Conference on Uncertainty in Artificial Intelligence, (UAI-2006). AUAI Press.

  • Shpitser, I. and Pearl, J. (2008). Dormant independence. Technical Report R-340, Cognitive Systems Laboratory, University of California, Los Angeles.

  • Shpitser, I., Richardson, T. S., and Robins, J. M. (2009). Testing edges by truncations. In Proceedings of the 21st International Joint Conference on Artificial Intelligence, volume 21, pages 1957–1963.

  • Shpitser, I., Richardson, T. S., and Robins, J. M. (2011). An efficient algorithm for computing interventional distributions in latent variable causal models. In 27th Conference on Uncertainty in Artificial Intelligence (UAI-11). AUAI Press.

  • Shpitser, I., Richardson, T. S., Robins, J. M., and Evans, R. J. (2012). Parameter and structure learning in nested Markov models. In Proceedings of the Causal Structure Learning Workshop of the 28th Conference on Uncertainty in Artificial Intelligence (UAI-12).

  • Shwe, M. A., Middleton, B., Heckerman, D. E., Henrion, M., Horvitz, E. J., Lehmann, H. P., and Cooper, G. F. (1991). Probabilistic diagnosis using a reformulation of the INTERNIST-1/QMR knowledge base. Methods of Information in Medicine, 30(4):241–267.

  • Spirtes, P., Glymour, C., and Scheines, R. (1993). Causation, Prediction, and Search. Springer Verlag, New York.

  • Tian, J. and Pearl, J. (2002). On the testable implications of causal models with hidden variables. In Proceedings of the 18th Conference on Uncertainty in Artificial Intelligence (UAI-02), pages 519–527. AUAI Press.

  • Verma, T. S. and Pearl, J. (1990). Equivalence and synthesis of causal models. Technical Report R-150, Department of Computer Science, University of California, Los Angeles.

  • Zhang, N. L. and Poole, D. (1994). A simple approach to Bayesian network computations. In Tenth Canadian Conference on AI, pages 171–178.

Author information

Corresponding author

Correspondence to Ilya Shpitser.

About this article

Cite this article

Shpitser, I., Evans, R.J., Richardson, T.S. et al. Introduction to Nested Markov Models. Behaviormetrika 41, 3–39 (2014). https://doi.org/10.2333/bhmk.41.3

  • DOI: https://doi.org/10.2333/bhmk.41.3
