Abstract
Graphical models provide a principled way to take advantage of independence constraints for probabilistic and causal modeling, while giving an intuitive graphical description of “qualitative features” useful for these tasks. A popular graphical model, known as a Bayesian network, represents joint distributions by means of a directed acyclic graph (DAG). DAGs provide a natural representation of conditional independence constraints, and also have a simple causal interpretation. When all variables are observed, the associated statistical models have many attractive properties. However, in many practical data analyses unobserved variables may be present. In general, the set of marginal distributions obtained from a DAG model with hidden variables is a much more complicated statistical model: the likelihood of the marginal is often intractable, and the model may contain singularities. There are also infinitely many such models to consider.
It is possible to avoid these difficulties by modeling the observed marginal directly. One strategy is to define a model by means of conditional independence constraints induced on the observed marginal by the hidden variable DAG; we call this the ordinary Markov model. This model will be a supermodel that contains the set of marginal distributions obtained from the original DAG. Richardson and Spirtes (2002) and Evans and Richardson (2013a) gave parametrizations of this model in the Gaussian and discrete case, respectively.
However, it has long been known that hidden variable DAG models also imply nonparametric constraints which generalize conditional independences; these are sometimes called “Verma Constraints”. In this paper we describe a natural extension of the ordinary Markov approach, whereby both conditional independences and these generalized constraints are used to define a nested Markov model. The binary nested Markov model may be parametrized via a simple extension of the binary parametrization of the ordinary Markov model of Evans and Richardson (2013a). We also give evidence for a characterization of nested Markov equivalence for models with four observed variables. A consequence of this characterization is that, in some instances, most structural features of hidden variable DAGs can be recovered exactly when a single generalized independence constraint holds under the distribution of the observed variables.
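To make the notion of a generalized constraint concrete, here is a minimal numerical sketch. It is an illustration only, not code from the paper. It assumes a standard four-variable example often used for Verma constraints: binary observed variables A → B → C → D with one binary hidden variable U that is a parent of both B and D. All function names and the randomly drawn conditional probability tables are hypothetical. In this graph the quantity Σ_b p(d | a, b, c) p(b | a), computed from the observed marginal alone, does not depend on a, even though this is not an ordinary conditional independence statement.

```python
import itertools
import random


def random_cpt(rng, n_parents):
    """Map each binary parent configuration to P(child = 1 | parents)."""
    return {pa: rng.uniform(0.1, 0.9)
            for pa in itertools.product((0, 1), repeat=n_parents)}


def bern(p_one, value):
    """Probability that a binary variable with P(1) = p_one equals value."""
    return p_one if value == 1 else 1.0 - p_one


def observed_marginal(seed=0):
    """Marginal p(a, b, c, d) of the assumed hidden-variable DAG.

    Assumed structure (illustrative): A -> B -> C -> D, U -> B, U -> D,
    with U hidden and summed out.
    """
    rng = random.Random(seed)
    p_u = random_cpt(rng, 0)  # no parents, key is ()
    p_a = random_cpt(rng, 0)
    p_b = random_cpt(rng, 2)  # parents (a, u)
    p_c = random_cpt(rng, 1)  # parent (b,)
    p_d = random_cpt(rng, 2)  # parents (c, u)
    joint = {}
    for a, b, c, d in itertools.product((0, 1), repeat=4):
        joint[(a, b, c, d)] = sum(
            bern(p_u[()], u) * bern(p_a[()], a) * bern(p_b[(a, u)], b)
            * bern(p_c[(b,)], c) * bern(p_d[(c, u)], d)
            for u in (0, 1)
        )
    return joint


def verma_kernel(joint, a, c, d):
    """Compute sum_b p(d | a, b, c) * p(b | a) from the observed marginal."""
    p_a = sum(v for (a_, _, _, _), v in joint.items() if a_ == a)
    total = 0.0
    for b in (0, 1):
        p_ab = sum(joint[(a, b, c_, d_)] for c_ in (0, 1) for d_ in (0, 1))
        p_abc = sum(joint[(a, b, c, d_)] for d_ in (0, 1))
        total += (joint[(a, b, c, d)] / p_abc) * (p_ab / p_a)
    return total


if __name__ == "__main__":
    joint = observed_marginal(seed=1)
    for c, d in itertools.product((0, 1), repeat=2):
        # The two values printed on each line agree up to rounding error.
        print(c, d, verma_kernel(joint, 0, c, d), verma_kernel(joint, 1, c, d))
```

The check uses only the observed distribution over (A, B, C, D); the hidden variable enters solely in constructing the example. The equality holds for any choice of the conditional probability tables in this graph, which is what makes it a constraint on the model rather than a property of one distribution.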
References
Ackley, D. H., Hinton, G. E., and Sejnowski, T. J. (1985). A learning algorithm for Boltzmann machines. Cognitive Science, 9(1):147–169.
Ali, A., Richardson, T. S., and Spirtes, P. (2009). Markov equivalence for ancestral graphs. Annals of Statistics, 37:2808–2837.
Beal, M. J. and Ghahramani, Z. (2004). Variational Bayesian learning of directed graphical models with hidden variables. Bayesian Analysis, 1(1):1–44.
Bergsma, W. P. and Rudas, T. (2002). Marginal models for categorical data. Annals of Statistics, 30(1):140–159.
Chickering, D. M. (2002). Optimal structure identification with greedy search. Journal of Machine Learning Research, 3:507–554.
Claassen, T., Mooij, J. M., and Heskes, T. (2013). Learning sparse causal models is not NP-hard. In Proceedings of the 29th Conference on Uncertainty in Artificial Intelligence (UAI-13). abs/1309.6824.
Colombo, D., Maathuis, M. H., Kalisch, M., and Richardson, T. S. (2012). Learning high-dimensional directed acyclic graphs with latent and selection variables. Annals of Statistics, 40:294–321.
Cooper, G. F. and Herskovits, E. (1992). A Bayesian method for the induction of probabilistic networks from data. Machine Learning, 9:309–347.
Drton, M. (2009). Likelihood ratio tests and singularities. Annals of Statistics, 37(2):979–1012.
Drton, M. and Plummer, M. (2013). A Bayesian information criterion for singular models. ArXiv eprints.
Drton, M., Sturmfels, B., and Sullivant, S. (2009). Lectures on Algebraic Statistics, volume 40. Birkhäuser, Basel.
Evans, R. J. (2011). Parameterizations of Discrete Graphical Models. PhD thesis, Department of Statistics, University of Washington.
Evans, R. J. (2012). Graphical methods for inequality constraints in marginalized DAGs. In 22nd Workshop on Machine Learning and Signal Processing.
Evans, R. J. and Richardson, T. S. (2010). Maximum likelihood fitting of acyclic directed mixed graphs to binary data. In Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence, AUAI Press.
Evans, R. J. and Richardson, T. S. (2013a). Marginal log-linear parameters for graphical Markov models. Journal of the Royal Statistical Society: Series B, 75(4):743–768.
Evans, R. J. and Richardson, T. S. (2013b). Markovian acyclic directed mixed graphs for discrete data. Annals of Statistics, accepted for publication. abs/1301.6624.
Friedman, N. (1997). Learning belief networks in the presence of missing values and hidden variables. In Proceedings of the 14th International Conference on Machine Learning, pages 125–133. Morgan Kaufmann.
Friedman, N., Linial, M., Nachman, I., and Pe’er, D. (2000). Using Bayesian networks to analyze expression data. Journal of Computational Biology, 7(3/4):601–620.
Geiger, D. and Meek, C. (1998). Graphical models and exponential families. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence (UAI-98), pages 156–165.
Koller, D. and Friedman, N. (2009). Probabilistic Graphical Models: Principles and Techniques. MIT Press.
Lauritzen, S. L. (1996). Graphical Models. Oxford, U.K.: Clarendon.
Neyman, J. (1923). Sur les applications de la théorie des probabilités aux expériences agricoles: Essai des principes. Excerpts reprinted (1990) in English. Statistical Science, 5:463–472.
Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann, San Mateo.
Pearl, J. (1995). On the testability of causal models with latent and instrumental variables. In Proceedings of the Eleventh Annual Conference on Uncertainty in Artificial Intelligence (UAI-95), pages 435–443, San Francisco, CA. Morgan Kaufmann.
Pearl, J. (2000). Causality: Models, Reasoning, and Inference. Cambridge University Press.
Pourret, O., Naïm, P., and Marcot, B. (2008). Bayesian Networks: A Practical Guide to Applications. Wiley.
Richardson, T. S. (2003). Markov properties for acyclic directed mixed graphs. The Scandinavian Journal of Statistics, 30(1):145–157.
Richardson, T. S., Robins, J. M., and Shpitser, I. (2012). Nested Markov properties for acyclic directed mixed graphs. Presented at the 28th Conference on Uncertainty in Artificial Intelligence (UAI-12).
Richardson, T. S. and Spirtes, P. (2002). Ancestral graph Markov models. Annals of Statistics, 30(4):962–1030.
Robins, J. M. (1986). A new approach to causal inference in mortality studies with sustained exposure periods - application to control of the healthy worker survivor effect. Mathematical Modeling, 7:1393–1512.
Robins, J. M. (1989). The analysis of randomized and non-randomized AIDS treatment trials using a new approach to causal inference in longitudinal studies. In Sechrest, L., Freeman, H., and Mulley, A., editors, Health Service Research Methodology: A Focus on AIDS, pages 113–159. NCHSR, U.S. Public Health Service.
Robins, J. M. (1992). Estimation of the time-dependent accelerated failure time model in the presence of confounding factors. Biometrika, 79:321–334.
Robins, J. M. (1997). Causal inference from complex longitudinal data. In Berkane, M., editor, Latent variable modelling and applications to causality, number 120 in Lecture notes in statistics, pages 69–117. Springer-Verlag, New York.
Robins, J. M. (1999). Testing and estimation of direct effects by reparameterizing directed acyclic graphs with structural nested models. In Glymour, C. and Cooper, G., editors, Computation, Causation, and Discovery, pages 349–405. Menlo Park, CA, Cambridge, MA: AAAI Press/The MIT Press.
Robins, J. M. (2000). Marginal structural models versus structural nested models as tools for causal inference. In Halloran, M. and Berry, D., editors, Statistical Models in Epidemiology, the Environment, and Clinical Trials, volume 116 of The IMA Volumes in Mathematics and its Applications, pages 95–133. Springer New York.
Robins, J. M. and Richardson, T. S. (2010). Alternative graphical causal models and the identification of direct effects. In P. Shrout, K. Keyes, and K. Ornstein, editors, Causality and Psychopathology: Finding the Determinants of Disorders and their Cures. Chapter 6, pages 1–52, Oxford University Press.
Robins, J. M., Scheines, R., Spirtes, P., and Wasserman, L. (2003). Uniform consistency in causal inference. Biometrika, 90(3):491–515.
Robins, J. M. and Wasserman, L. (1997). Estimation of effects of sequential treatments by reparameterizing directed acyclic graphs. In Proceedings of the 13th Conference on Uncertainty in Artificial Intelligence, pages 409–420. Morgan Kaufmann.
Robins, J. M. and Wasserman, L. (1999). On the impossibility of inferring causation from association without background knowledge. In Computation, Causation, and Discovery, pages 305–321. Menlo Park, CA, Cambridge, MA: AAAI Press/The MIT Press.
Roweis, S. and Ghahramani, Z. (1999). A unifying review of linear Gaussian models. Neural Computation, 11(2):305–345.
Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and non-randomized studies. Journal of Educational Psychology, 66:688–701.
Rusakov, D. and Geiger, D. (2005). Asymptotic model selection for naive Bayesian networks. Journal of Machine Learning Research, 6:1–35.
Schwarz, G. E. (1978). Estimating the dimension of a model. Annals of Statistics, 6:461–464.
Settimi, R. and Smith, J. Q. (1998). On the geometry of Bayesian graphical models with hidden variables. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence (UAI-98), pages 472–479.
Shimizu, S., Hoyer, P. O., Hyvärinen, A., and Kerminen, A. J. (2006). A linear non-Gaussian acyclic model for causal discovery. Journal of Machine Learning Research, 7:2003–2030.
Shpitser, I., Evans, R. J., Richardson, T. S., and Robins, J. M. (2013). Sparse nested Markov models with log-linear parameters. In Proceedings of the 29th Conference on Uncertainty in Artificial Intelligence (UAI-13). AUAI Press.
Shpitser, I. and Pearl, J. (2006). Identification of conditional interventional distributions. In Proceedings of the 22nd Conference on Uncertainty in Artificial Intelligence, (UAI-2006). AUAI Press.
Shpitser, I. and Pearl, J. (2008). Dormant independence. Technical Report R-340, Cognitive Systems Laboratory, University of California, Los Angeles.
Shpitser, I., Richardson, T. S., and Robins, J. M. (2009). Testing edges by truncations. In Proceedings of the 21st International Joint Conference on Artificial Intelligence, volume 21, pages 1957–1963.
Shpitser, I., Richardson, T. S., and Robins, J. M. (2011). An efficient algorithm for computing interventional distributions in latent variable causal models. In 27th Conference on Uncertainty in Artificial Intelligence (UAI-11). AUAI Press.
Shpitser, I., Richardson, T. S., Robins, J. M., and Evans, R. J. (2012). Parameter and structure learning in nested Markov models. In Proceedings of the Causal Structure Learning Workshop of the 28th Conference on Uncertainty in Artificial Intelligence (UAI-12).
Shwe, M. A., Middleton, B., Heckerman, D. E., Henrion, M., Horvitz, E. J., Lehmann, H. P., and Cooper, G. F. (1991). Probabilistic diagnosis using a reformulation of the INTERNIST-1/QMR knowledge base. Methods of Information in Medicine, 30(4):241–267.
Spirtes, P., Glymour, C., and Scheines, R. (1993). Causation, Prediction, and Search. Springer-Verlag, New York.
Tian, J. and Pearl, J. (2002). On the testable implications of causal models with hidden variables. In Proceedings of the 18th Conference on Uncertainty in Artificial Intelligence (UAI-02), pages 519–527. AUAI Press.
Verma, T. S. and Pearl, J. (1990). Equivalence and synthesis of causal models. Technical Report R-150, Department of Computer Science, University of California, Los Angeles.
Zhang, N. L. and Poole, D. (1994). A simple approach to Bayesian network computations. In Tenth Canadian Conference on AI, pages 171–178.
Cite this article
Shpitser, I., Evans, R.J., Richardson, T.S. et al. Introduction to Nested Markov Models. Behaviormetrika 41, 3–39 (2014). https://doi.org/10.2333/bhmk.41.3
DOI: https://doi.org/10.2333/bhmk.41.3