A Survey of Learning Causality with Data: Problems and Methods

Published: 22 July 2020

Abstract

This work considers how convenient access to copious data affects our ability to learn causal effects and relations. In what ways is learning causality in the era of big data different from, or the same as, learning causality in the traditional setting? To answer this question, this survey provides a comprehensive and structured review of both traditional and frontier methods for learning causality and relations, along with the connections between causality and machine learning. It points out, on a case-by-case basis, how big data facilitates, complicates, or motivates each approach.

  145. Peter Spirtes, Christopher Meek, and Thomas Richardson. 1995. Causal inference in the presence of latent variables and selection bias. In Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI’95). 499--506.Google ScholarGoogle Scholar
  146. Peter Spirtes and Kun Zhang. 2016. Causal discovery and inference: Concepts and recent methodological advances. In Applied Informatics, Vol. 3. 3.Google ScholarGoogle ScholarCross RefCross Ref
  147. Richard S. Sutton and Andrew G. Barto. 1998. Introduction to Reinforcement Learning. Vol. 135. MIT Press, Cambridge, MA.Google ScholarGoogle ScholarDigital LibraryDigital Library
  148. Matt Taddy, Hedibert Freitas Lopes, and Matt Gardner. 2016. Scalable semiparametric inference for the means of heavy-tailed distributions. arXiv preprint arXiv:1602.08066 (2016).Google ScholarGoogle Scholar
  149. Robert Tibshirani. 1996. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological) 58, 1 (1996), 267--288.Google ScholarGoogle ScholarCross RefCross Ref
  150. Michalis Titsias and Neil D. Lawrence. 2010. Bayesian Gaussian process latent variable model. In Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS’10). 844--851.Google ScholarGoogle Scholar
  151. Panos Toulis, Alexander Volfovsky, and Edoardo M. Airoldi. 2018. Propensity score methodology in the presence of network entanglement between treatments *. arXiv preprint arXiv:1801.07310 (2018).Google ScholarGoogle Scholar
  152. Ioannis Tsamardinos, Constantin F. Aliferis, Alexander R. Statnikov, and Er Statnikov. 2003. Algorithms for large scale Markov blanket discovery. In Proceedings of the Florida Artificial Intelligence Research Society Conference (FLAIRS’03), Vol. 2. 376--380.Google ScholarGoogle Scholar
  153. Ioannis Tsamardinos, Laura E. Brown, and Constantin F. Aliferis. 2006. The max-min hill-climbing Bayesian network structure learning algorithm. Mach. Learn. 65, 1 (2006), 31--78.Google ScholarGoogle ScholarDigital LibraryDigital Library
  154. Mark J. Van Der Laan and Daniel Rubin. 2006. Targeted maximum likelihood learning. Int. J. Biostat. 2, 1 (2006).Google ScholarGoogle ScholarCross RefCross Ref
  155. Tyler J. VanderWeele. 2011. Principal stratification--uses and limitations. Int. J. Biostat. 7, 1 (2011), 1--14.Google ScholarGoogle ScholarCross RefCross Ref
  156. Stefan Wager and Susan Athey. 2017. Estimation and inference of heterogeneous treatment effects using random forests. J. Am. Stat. Assoc. just-accepted (2017).Google ScholarGoogle Scholar
  157. Yuhao Wang, Liam Solus, Karren Yang, and Caroline Uhler. 2017. Permutation-based causal inference algorithms with interventions. In Proceedings of the Conference and Workshop on Neural Information Processing Systems (NeurIPS’17). 5822--5831.Google ScholarGoogle Scholar
  158. Gerhard Widmer and Miroslav Kubat. 1996. Learning in the presence of concept drift and hidden contexts. Mach. Learn. 23, 1 (1996), 69--101.Google ScholarGoogle ScholarCross RefCross Ref
  159. Man Leung Wong, Shing Yan Lee, and Kwong Sak Leung. 2002. A hybrid approach to discover Bayesian networks from databases using evolutionary programming. In Proceedings of the IEEE International Confernece on Data Mining (ICDM’02). IEEE, 498--505.Google ScholarGoogle Scholar
  160. Akihiro Yabe, Daisuke Hatano, Hanna Sumita, Shinji Ito, Naonori Kakimura, Takuro Fukunaga, and Ken-ichi Kawarabayashi. 2018. Causal bandits with propagating inference. arXiv preprint arXiv:1806.02252 (2018).Google ScholarGoogle Scholar
  161. Xuan Yin and Liangjie Hong. 2019. The identification and estimation of direct and indirect effects in A/B tests through causal mediation analysis. In Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (SIGKDD’18). ACM.Google ScholarGoogle ScholarDigital LibraryDigital Library
  162. Junzhe Zhang and Elias Bareinboim. 2017. Transfer learning in multi-armed bandit: A causal approach. In Proceedings of the International Conference on Autonomous Agents and Multiagent Systems (AMMAS’17). 1778--1780.Google ScholarGoogle ScholarCross RefCross Ref
  163. Kun Zhang and Aapo Hyvärinen. 2009. On the identifiability of the post-nonlinear causal model. In Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI’09). 647--655.Google ScholarGoogle Scholar
  164. Kun Zhang, Jonas Peters, Dominik Janzing, and Bernhard Schölkopf. 2012. Kernel-based conditional independence test and application in causal discovery. arXiv preprint arXiv:1202.3775 (2012).Google ScholarGoogle Scholar
  165. Dawei Zhou, Jingrui He, Hongxia Yang, and Wei Fan. 2018. Sparc: Self-paced network representation for few-shot rare category characterization. In Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (SIGKDD’18). ACM, 2807--2816.Google ScholarGoogle ScholarDigital LibraryDigital Library

      • Published in

        ACM Computing Surveys, Volume 53, Issue 4
        July 2021
        831 pages
        ISSN:0360-0300
        EISSN:1557-7341
        DOI:10.1145/3410467

        Copyright © 2020 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 22 July 2020
        • Online AM: 7 May 2020
        • Accepted: 1 April 2020
        • Revised: 1 January 2020
        • Received: 1 September 2018

        Qualifiers

        • survey
        • Research
        • Refereed
