A Survey of Learning Causality with Data: Problems and Methods

Authors:

Ruocheng Guo, Computer Science and Engineering, Arizona State University, Tempe, AZ (ORCID: 0000-0002-8522-6142)

Lu Cheng, Computer Science and Engineering, Arizona State University, Tempe, AZ

Jundong Li, Department of Electrical and Computer Engineering, Computer Science & School of Data Science, University of Virginia, Charlottesville, VA, USA (ORCID: 0000-0002-1878-817X)

P. Richard Hahn, Department of Mathematics and Statistics, Arizona State University, Tempe, AZ

Huan Liu, Computer Science and Engineering, Arizona State University, Tempe, AZ

ACM Computing Surveys, Volume 53, Issue 4, Article No. 75, pp. 1–37. https://doi.org/10.1145/3397269

Published: 22 July 2020

Abstract

This work considers the question of how convenient access to copious data impacts our ability to learn causal effects and relations. In what ways is learning causality in the era of big data different from, or the same as, learning causality in the traditional setting? To answer this question, this survey provides a comprehensive and structured review of both traditional and frontier methods for learning causal effects and relations, along with the connections between causality and machine learning. This work points out on a case-by-case basis how big data facilitates, complicates, or motivates each approach.

References

Leman Akoglu, Hanghang Tong, and Danai Koutra. 2015. Graph based anomaly detection and description: A survey. Data Min. Knowl. Discov. 29, 3 (2015), 626--688.Google ScholarDigital Library
Dionissi Aliprantis. 2015. A distinction between causal effects in structural and rubin causal models. (2015).Google Scholar
Michael Anderson and Jeremy Magruder. 2012. Learning from the cloud: Regression discontinuity estimates of the effects of an online review database. Eng. J. 122 (Oct. 2012), 957--989.Google Scholar
Bryan Andrews, Joseph Ramsey, and Gregory F. Cooper. 2019. Learning high-dimensional directed acyclic graphs with mixed data-types. Proc. Mach. Learn. Res. 104 (2019).Google Scholar
Joshua D. Angrist and Guido W. Imbens. 1995. Two-stage least squares estimation of average causal effects in models with variable treatment intensity. J. Am. Stat. Assoc. 90, 430 (1995), 431--442.Google ScholarCross Ref
Joshua D. Angrist, Guido W. Imbens, and Donald B. Rubin. 1996. Identification of causal effects using instrumental variables. J. Am. Stat. Assoc. 91, 434 (1996), 444--455.Google ScholarCross Ref
Joshua D. Angrist and Victor Lavy. 1999. Using Maimonides’ rule to estimate the effect of class size on scholastic achievement. Q. J. Econ. 114, 2 (1999), 533--575.Google ScholarCross Ref
Sinan Aral and Christos Nicolaides. 2017. Exercise contagion in a global social network. Nat. Commun. 8 (2017), 14753.Google ScholarCross Ref
Martin Arjovsky, Léon Bottou, Ishaan Gulrajani, and David Lopez-Paz. 2019. Invariant risk minimization. arXiv preprint arXiv:1907.02893 (2019).Google Scholar
Susan Athey and Guido W. Imbens. 2015. Machine learning methods for estimating heterogeneous causal effects. Stat 1050, 5 (2015).Google Scholar
Susan Athey, Guido W. Imbens, and Stefan Wager. 2018. Approximate residual balancing: Debiased inference of average treatment effects in high dimensions. J. R. Stat. Soc. Ser. B 80, 4 (2018), 597--623.Google ScholarCross Ref
Peter C. Austin. 2011. An introduction to propensity score methods for reducing the effects of confounding in observational studies. Multivar. Behav. Res. 46, 3 (2011), 399--424.Google ScholarCross Ref
Davide Bacciu, Terence A. Etchells, Paulo J. G. Lisboa, and Joe Whittaker. 2013. Efficient identification of independence networks using mutual information. Comput. Stat. 28, 2 (2013), 621--646.Google ScholarDigital Library
Mohammad Taha Bahadori, Krzysztof Chalupka, Edward Choi, Robert Chen, Walter F. Stewart, and Jimeng Sun. 2017. Causal regularization. arXiv preprint arXiv:1702.02604 (2017).Google Scholar
Eytan Bakshy, Dean Eckles, and Michael S. Bernstein. 2014. Designing and deploying online field experiments. In Proceedings of the Annual Conference on the World Wide Web (WWW’14). ACM, 283--292.Google Scholar
Elias Bareinboim, Andrew Forney, and Judea Pearl. 2015. Bandits with unobserved confounders: A causal approach. In Proceedings of the Conference and Workshop on Neural Information Processing Systems (NeurIPS’15). 1342--1350.Google Scholar
Elias Bareinboim and Judea Pearl. 2012. Transportability of causal effects: Completeness results. In Proceedings of the 26th AAAI Conference on Artificial Intelligence.Google ScholarCross Ref
Elias Bareinboim and Jin Tian. 2015. Recovering causal effects from selection bias. In Proceedings of the AAAI Conference on Artifial Intelligence (AAAI’15). 3475--3481.Google Scholar
John Blitzer, Ryan McDonald, and Fernando Pereira. 2006. Domain adaptation with structural correspondence learning. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP’06). 120--128.Google ScholarCross Ref
Léon Bottou, Jonas Peters, Joaquin Quiñonero-Candela, Denis X. Charles, D. Max Chickering, Elon Portugaly, Dipankar Ray, Patrice Simard, and Ed Snelson. 2013. Counterfactual reasoning and learning systems: The example of computational advertising. J. Mach. Learn. Res. 14, 1 (2013), 3207--3260.Google ScholarDigital Library
Donald T. Campbell. 1969. Reforms as experiments. Am. Psychol. 24, 4 (1969), 409.Google ScholarCross Ref
Christopher Carpenter and Carlos Dobkin. 2009. The effect of alcohol consumption on mortality: Regression discontinuity evidence from the minimum drinking age. American Economic Journal: Applied Economics 1, 1 (2009), 164--82.Google ScholarCross Ref
Matias D. Cattaneo, Nicolás Idrobo, and Rocío Titiunik. 2017. A practical introduction to regression discontinuity designs. Cambridge Elements: Quantitative and Computational Methods for Social Science, Cambridge University Press (2017).Google ScholarCross Ref
Lu Cheng, Raha Moraffah, Ruocheng Guo, K. S. Candan, Adrienne Raglin, and Huan Liu. 2019. A practical data repository for causal learning with big data. In Proceedings of the BenchCouncil International Symposium on Benchmarking, Measuring and Optimizing (Bench’19).Google Scholar
David Maxwell Chickering. 1996. Learning Bayesian networks is NP-complete. In Learning from Data. Springer, 121--130.Google Scholar
David Maxwell Chickering. 2002. Optimal structure identification with greedy search. J. Mach. Learn. Res. 3, Nov (2002), 507--554.Google Scholar
David M. Chickering, Dan Geiger, David Heckerman, et al. 1994. Learning Bayesian Networks Is NP-hard. Technical Report. Citeseer.Google Scholar
Hugh A. Chipman, Edward I. George, Robert E. McCulloch, et al. 2010. BART: Bayesian additive regression trees. Ann. Appl. Stat. 4, 1 (2010), 266--298.Google ScholarCross Ref
Tianjiao Chu and Clark Glymour. 2008. Search for additive nonlinear time series causal models. J. Mach. Learn. Res. 9, May (2008), 967--991.Google Scholar
Diego Colombo and Marloes H. Maathuis. 2014. Order-independent constraint-based causal structure learning. J. Mach. Learn. Res. 15, 1 (2014), 3741--3782.Google ScholarDigital Library
Diego Colombo, Marloes H. Maathuis, Markus Kalisch, and Thomas S. Richardson. 2012. Learning high-dimensional directed acyclic graphs with latent and selection variables. Ann. Stat. (2012), 294--321.Google Scholar
Thomas D. Cook, Donald Thomas Campbell, and William Shadish. 2002. Experimental and Quasi-experimental Designs for Generalized Causal Inference. Houghton Mifflin, Boston.Google Scholar
Ruifei Cui, Perry Groot, and Tom Heskes. 2016. Copula PC algorithm for causal discovery from mixed data. In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECMLPKDD’16). Springer, 377--392.Google ScholarCross Ref
Hal Daumé III. 2009. Frustratingly easy domain adaptation. arXiv preprint arXiv:0907.1815 (2009).Google Scholar
Martijn de Jongh and Marek J. Druzdzel. 2009. A comparison of structural distance measures for causal Bayesian network models. In Recent Advances in Intelligent Information Systems, Challenging Problems of Science, Computer Science Series (2009), 443--456.Google Scholar
Rajeev H. Dehejia and Sadek Wahba. 1999. Causal effects in nonexperimental studies: Reevaluating the evaluation of training programs. J. Am. Stat. Assoc. 94, 448 (1999), 1053--1062.Google ScholarCross Ref
Kaize Ding, Jundong Li, Rohit Bhanushali, and Huan Liu. 2019. Deep anomaly detection on attributed networks. In Proceedings of the SIAM International Conference on Data Mining (SDM’19). SIAM, 594--602.Google ScholarCross Ref
Imme Ebert-Uphoff and Yi Deng. 2012. Causal discovery for climate research using graphical models. J. Clim. 25, 17 (2012), 5648--5665.Google ScholarCross Ref
Andrew C. Eggers, Ronny Freier, Veronica Grembi, and Tommaso Nannicini. 2018. Regression discontinuity designs based on population thresholds: Pitfalls and solutions. Am. J. Pol. Sci. 62, 1 (2018), 210--229.Google ScholarCross Ref
Michael Eichler. 2012. Causal inference in time series analysis. Causality: Statistical Perspectives and Applications (2012), 327--354.Google ScholarCross Ref
Doris Entner, Patrik Hoyer, and Peter Spirtes. 2013. Data-driven covariate selection for nonparametric estimation of causal effects. In Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS’13). 256--264.Google Scholar
Doris Entner and Patrik O. Hoyer. 2010. On causal discovery from time series data using FCI. In Proceedings of the International Conference on Probalilistic Graphical Models (PGM’10). 121--128.Google Scholar
Andrew Forney, Judea Pearl, and Elias Bareinboim. 2017. Counterfactual data-fusion for online reinforcement learners. In Proceedings of the International Conference on Machine Learning (ICML’17). 1156--1164.Google Scholar
Constantine E. Frangakis and Donald B. Rubin. 2002. Principal stratification in causal inference. Biometrics 58, 1 (2002), 21--29.Google ScholarCross Ref
Kenji Fukumizu, Arthur Gretton, Xiaohai Sun, and Bernhard Schölkopf. 2008. Kernel measures of conditional dependence. In Proceedings of the Conference and Workshop on Neural Information Processing Systems (NeurIPS’08). 489--496.Google Scholar
Michele Jonsson Funk, Daniel Westreich, Chris Wiesen, Til Stürmer, M. Alan Brookhart, and Marie Davidian. 2011. Doubly robust estimation of causal effects. Am. J. Epidemiol. 173, 7 (2011), 761--767.Google ScholarCross Ref
Bin Gao and Yuehua Cui. 2015. Learning directed acyclic graphical structures with genetical genomics data. Bioinformatics 31, 24 (2015), 3953--3960.Google Scholar
Andrew Gelman. 2011. Causality and statistical learning. Am. J. Sociol. 117, 3 (2011), 955--966.Google ScholarCross Ref
Andrew Gelman and Guido Imbens. 2019. Why high-order polynomials should not be used in regression discontinuity designs. Journal of Business 8 Economic Statistics 37, 3 (2019), 447--456.Google ScholarCross Ref
Mingming Gong, Kun Zhang, Bernhard Schölkopf, Clark Glymour, and Dacheng Tao. 2017. Causal discovery from temporally aggregated time series. In Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI’17).Google Scholar
Ian Goodfellow, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning. MIT Press.Google ScholarDigital Library
Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative adversarial nets. In Proceedings of the Conference and Workshop on Neural Information Processing Systems (NeurIPS’14). 2672--2680.Google Scholar
Xing Sam Gu and Paul R Rosenbaum. 1993. Comparison of multivariate matching methods: Structures, distances, and algorithms. J. Comput. Graph Stat. 2, 4 (1993), 405--420.Google Scholar
Ruocheng Guo, Jundong Li, and Huan Liu. 2018. INITIATOR: Noise-contrastive estimation for marked temporal point process. In Proceedings of the International Joint Conferences on Artificial Intelligence (IJCAI’18). 2191--2197.Google ScholarCross Ref
Ruocheng Guo, Jundong Li, and Huan Liu. 2020. Counterfactual evaluation of treatment assignment functions with networked observational data. In Proceedings of the SIAM International Conference on Data Mining (SDM’20). SIAM, 271--279.Google ScholarCross Ref
Ruocheng Guo, Jundong Li, and Huan Liu. 2020. Learning individual causal effects from networked observational data. In Proceedings of the ACM International Web Search and Data Mining Conference (WSDM’20). ACM, 232--240.Google ScholarDigital Library
Ruocheng Guo, Yichuan Li, Jundong Li, K. SelÃ§uk Candan, Adrienne Raglin, and Huan Liu. 2020. IGNITE: A minimax game toward learning individual treatment effects from networked observational data. In Proceedings of the International Joint Conferences on Artificial Intelligence (IJCAI’20).Google ScholarCross Ref
P. Richard Hahn, Carlos M. Carvalho, David Puelz, Jingyu He, et al. 2018. Regularization and confounding in linear regression for treatment effect estimation. Bayes. Anal. 13, 1 (2018), 163--182.Google ScholarCross Ref
P. Richard Hahn, Jared S. Murray, and Carlos Carvalho. 2017. Bayesian regression tree models for causal inference: Regularization, confounding, and heterogeneous effects. arXiv preprint arXiv:1706.09523 (2017).Google Scholar
Jens Hainmueller. 2012. Entropy balancing for causal effects: A multivariate reweighting method to produce balanced samples in observational studies. Pol. Anal. 20, 1 (2012), 25--46.Google ScholarCross Ref
Alain Hauser and Peter Bühlmann. 2015. Jointly interventional and observational data: Estimation of interventional Markov equivalence classes of directed acyclic graphs. J. R. Stat. Soc. Ser. B 77, 1 (2015), 291--318.Google ScholarCross Ref
David Heckerman, Dan Geiger, and David M. Chickering. 1995. Learning Bayesian networks: The combination of knowledge and statistical data. Mach. Learn. 20, 3 (1995), 197--243.Google ScholarCross Ref
David Heckerman, Christopher Meek, and Gregory Cooper. 2006. A Bayesian approach to causal discovery. In Innovations in Machine Learning. Springer, 1--28.Google Scholar
Miguel Ángel Hernán, Babette Brumback, and James M. Robins. 2000. Marginal structural models to estimate the causal effect of zidovudine on the survival of HIV-positive men. Epidemiology (2000), 561--570.Google Scholar
Jennifer L. Hill. 2011. Bayesian nonparametric modeling for causal inference. J. Comput. Graph Stat. 20, 1 (2011), 217--240.Google ScholarCross Ref
Keisuke Hirano, Guido W. Imbens, and Geert Ridder. 2003. Efficient estimation of average treatment effects using the estimated propensity score. Econometrica 71, 4 (2003), 1161--1189.Google ScholarCross Ref
Paul W. Holland. 1986. Statistics and causal inference. J. Am. Stat. Assoc. 81, 396 (1986), 945--960.Google ScholarCross Ref
Patrik O. Hoyer, Aapo Hyvarinen, Richard Scheines, Peter L. Spirtes, Joseph Ramsey, Gustavo Lacerda, and Shohei Shimizu. 2012. Causal discovery of linear acyclic models with arbitrary distributions. arXiv preprint arXiv:1206.3260 (2012).Google Scholar
Patrik O. Hoyer, Dominik Janzing, Joris M. Mooij, Jonas Peters, and Bernhard Schölkopf. 2009. Nonlinear causal discovery with additive noise models. In Proceedings of the Conference and Workshop on Neural Information Processing Systems (NeurIPS’09). 689--696.Google Scholar
Aapo Hyvärinen and Erkki Oja. 2000. Independent component analysis: Algorithms and applications. Neur. Netw. 13, 4-5 (2000), 411--430.Google ScholarDigital Library
Aapo Hyvärinen, Kun Zhang, Shohei Shimizu, and Patrik O. Hoyer. 2010. Estimation of a structural vector autoregression model using non-gaussianity. J. Mach. Learn. Res. 11, 5 (2010), 1709--1731.Google ScholarDigital Library
Kosuke Imai and Marc Ratkovic. 2014. Covariate balancing propensity score. J. R. Stat. Soc. Ser. B 76, 1 (2014), 243--263.Google ScholarCross Ref
Kosuke Imai, Marc Ratkovic, et al. 2013. Estimating treatment effect heterogeneity in randomized program evaluation. Ann. Appl. Stat. 7, 1 (2013), 443--470.Google ScholarCross Ref
Guido W. Imbens. 2004. Nonparametric estimation of average treatment effects under exogeneity: A review. Rev. Econ. Stat. 86, 1 (2004), 4--29.Google ScholarCross Ref
Dominik Janzing and Bernhard Schölkopf. 2015. Semi-supervised interpolation in an anticausal learning scenario. J. Mach. Learn. Res. 16, 1 (2015), 1923--1948.Google ScholarDigital Library
Marshall M. Joffe, Thomas R. Ten Have, Harold I. Feldman, and Stephen E. Kimmel. 2004. Model selection, confounder control, and marginal structural models: Review and new applications. Am. Stat. 58, 4 (2004), 272--279.Google ScholarCross Ref
Fredrik Johansson, Uri Shalit, and David Sontag. 2016. Learning representations for counterfactual inference. In Proceedings of the International Conference on Machine Learning (ICML’16). 3020--3029.Google Scholar
Markus Kalisch and Peter Bühlmann. 2007. Estimating high-dimensional directed acyclic graphs with the PC-algorithm. J. Mach. Learn. Res. 8, Mar (2007), 613--636.Google Scholar
Nathan Kallus, Aahlad Manas Puli, and Uri Shalit. 2018. Removing hidden confounding by experimental grounding. In Proceedings of the Conference and Workshop on Neural Information Processing Systems (NeurIPS’18). 10888--10897.Google Scholar
Hyunseung Kang, Anru Zhang, T. Tony Cai, and Dylan S. Small. 2016. Instrumental variables estimation with some invalid instruments and its application to Mendelian randomization. J. Am. Stat. Assoc. 111, 513 (2016), 132--144.Google ScholarCross Ref
Diederik P. Kingma and Max Welling. 2013. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114 (2013).Google Scholar
Murat Kocaoglu, Alex Dimakis, and Sriram Vishwanath. 2017. Cost-optimal learning of causal graphs. In Proceedings of the International Conference on Machine Learning (ICML’17). 1875--1884.Google Scholar
Kun Kuang, Peng Cui, Susan Athey, Ruoxuan Xiong, and Bo Li. 2018. Stable prediction across unknown environments. In Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (SIGKDD’18). ACM, 1617--1626.Google ScholarDigital Library
Kun Kuang, Peng Cui, Bo Li, Meng Jiang, and Shiqiang Yang. 2017. Estimating treatment effect in the wild via differentiated confounder balancing. In Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (SIGKDD’18). ACM, 265--274.Google ScholarDigital Library
Matt J. Kusner, Joshua Loftus, Chris Russell, and Ricardo Silva. 2017. Counterfactual fairness. In Proceedings of the Conference and Workshop on Neural Information Processing Systems (NeurIPS’18). 4066--4076.Google Scholar
Robert J. LaLonde. 1986. Evaluating the econometric evaluations of training programs with experimental data. Am. Econ. Rev. (1986), 604--620.Google Scholar
Finnian Lattimore, Tor Lattimore, and Mark D. Reid. 2016. Causal bandits: Learning good interventions via causal inference. In Proceedings of the Conference and Workshop on Neural Information Processing Systems (NeurIPS’16). 1181--1189.Google Scholar
Thuc Duy Le, Tao Hoang, Jiuyong Li, Lin Liu, and Huawen Liu. 2015. A fast PC algorithm for high dimensional causal discovery with multi-core PCs. arXiv preprint arXiv:1502.02454 (2015).Google Scholar
Thuc Duy Le, Lin Liu, Anna Tsykin, Gregory J. Goodall, Bing Liu, Bing-Yu Sun, and Jiuyong Li. 2013. Inferring microRNA--mRNA causal regulatory relationships from expression data. Bioinformatics 29, 6 (2013), 765--771.Google ScholarDigital Library
Yann LeCun, Bernhard E. Boser, John S. Denker, Donnie Henderson, Richard E. Howard, Wayne E. Hubbard, and Lawrence D. Jackel. 1990. Handwritten digit recognition with a back-propagation network. In Proceedings of the Conference and Workshop on Neural Information Processing Systems (NeurIPS’90). 396--404.Google Scholar
Jason D. Lee and Trevor J. Hastie. 2015. Learning the structure of mixed graphical models. J. Comput. Graph. Stat. 24, 1 (2015), 230--253.Google ScholarCross Ref
Jundong Li, Kewei Cheng, Suhang Wang, Fred Morstatter, Robert P. Trevino, Jiliang Tang, and Huan Liu. 2017. Feature selection: A data perspective. ACM Comput. Surv. 50, 6 (2017), 94.Google ScholarDigital Library
Jundong Li, Ruocheng Guo, Chenghao Liu, and Huan Liu. 2019. Adaptive unsupervised feature selection on attributed networks. In Proceedings of the SIGKDD Conference on Knowledge Discovery and Data Mining (SIGKDD’18). ACM, 92--100.Google ScholarDigital Library
Jundong Li, Osmar R. Zaïane, and Alvaro Osornio-Vargas. 2014. Discovering statistically significant co-location rules in datasets with extended spatial objects. In Proceedings of the International Conference on Big Data Analytics and Knowledge Discovery (DaWaK’14). 124--135.Google ScholarCross Ref
Yichuan Li, Ruocheng Guo, Weiying Wang, and Huan Liu. 2019. Causal learning in question quality improvement. In Proceedings of the BenchCouncil International Symposium on Benchmarking, Measuring and Optimizing (Bench’20).Google Scholar
David Lopez-Paz, Robert Nishihara, Soumith Chintala, Bernhard Schölkopf, and Léon Bottou. 2017. Discovering causal signals in images. In Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR’17).Google ScholarCross Ref
Christos Louizos, Uri Shalit, Joris M. Mooij, David Sontag, Richard Zemel, and Max Welling. 2017. Causal effect inference with deep latent-variable models. In Proceedings of the Conference and Workshop on Neural Information Processing Systems (NeurIPS’17). 6446--6456.Google Scholar
Jared K. Lunceford and Marie Davidian. 2004. Stratification and weighting via the propensity score in estimation of causal treatment effects: A comparative study. Stat. Med. 23, 19 (2004), 2937--2960.Google ScholarCross Ref
Miguel Hernan and James M. Robins. forthcoming. Causal Inference. CRC Boca Raton, FL.Google Scholar
Daniel Malinsky and David Danks. 2018. Causal discovery algorithms: A practical guide. Philos. Compass 13, 1 (2018), e12470.Google ScholarCross Ref
Subramani Mani and Gregory F. Cooper. 2000. Causal discovery from medical textual data. In Proceedings of the AMIA Symposium. 542.Google Scholar
Ericsson Marin, Ruocheng Guo, and Paulo Shakarian. 2017. Temporal analysis of influence to predict usersâ adoption in online social networks. In International Conference on Social Computing, Behavioral-Cultural Modeling and Prediction and Behavior Representation in Modeling and Simulation. Springer, 254--261.Google ScholarCross Ref
Tomáš Mikolov, Martin Karafiát, Lukáš Burget, Jan Černockỳ, and Sanjeev Khudanpur. 2010. Recurrent neural network based language model. In Proceedings of the Conference on the International Speech Comunication Association (INTERSPEECH’10).Google ScholarCross Ref
Joris M. Mooij, Jonas Peters, Dominik Janzing, Jakob Zscheischler, and Bernhard Schölkopf. 2016. Distinguishing cause from effect using observational data: Methods and benchmarks. J. Mach. Learn. Res. 17, 1 (2016), 1103--1204.Google ScholarDigital Library
Raha Moraffah, Mansooreh Karami, Ruocheng Guo, Adrienne Ragliny, and Huan Liu. 2020. Causal interpretability for machine learning--problems, methods and evaluation. arXiv preprint arXiv:2003.03934 (2020).Google Scholar
Stephen L. Morgan and Christopher Winship. 2015. Counterfactuals and Causal Inference. Cambridge University Press.Google Scholar
Preetam Nandy, Alain Hauser, Marloes H. Maathuis, et al. 2018. High-dimensional consistency in score-based and hybrid structure learning. Ann. Stat. 46, 6A (2018), 3151--3183.Google ScholarCross Ref
Jersey Neyman. 1923. Sur les applications de la théorie des probabilités aux experiences agricoles: Essai des principes. Roczn. Nauk Rolniczych 10 (1923), 1--51.Google Scholar
Cross-Disorder Group of the Psychiatric Genomics Consortium et al. 2013. Identification of risk loci with shared effects on five major psychiatric disorders: A genome-wide analysis. The Lancet 381, 9875 (2013), 1371--1379.Google Scholar
Michael J. Paul. 2017. Feature selection as causal inference: Experiments with text classification. In Proceedings of the SIGNLL Conference on Computational Natural Language Learning (CoNLL’17). 163--172.Google ScholarCross Ref
Judea Pearl. 1995. Causal diagrams for empirical research. Biometrika 82, 4 (1995), 669--688.Google ScholarCross Ref
Judea Pearl. 2009. Causal inference in statistics: An overview. Statistics Surveys 3 (2009), 96--146.Google ScholarCross Ref
Judea Pearl. 2009. Causality. Cambridge University Press.Google Scholar
Judea Pearl. 2018. Theoretical impediments to machine learning with seven sparks from the causal revolution. arXiv preprint arXiv:1801.04016 (2018).Google Scholar
Judea Pearl and Elias Bareinboim. 2011. Transportability of causal and statistical relations: A formal approach. In Proceedings of the 25th AAAI Conference on Artificial Intelligence.Google ScholarDigital Library
Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. Glove: Global vectors for word representation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP’14). 1532--1543.Google ScholarCross Ref
Jonas Peters, Peter Bühlmann, and Nicolai Meinshausen. 2016. Causal inference by using invariant prediction: Identification and confidence intervals. J. R. Stat. Soc. Ser. B Stat. Methodol. 78, 5 (2016), 947--1012.Google ScholarCross Ref
Jonas Peters, Dominik Janzing, and Bernhard Schölkopf. 2013. Causal inference on time series using restricted structural equation models. In Proceedings of the Conference and Workshop on Neural Information Processing Systems (NeurIPS’13). 154--162.Google Scholar
Jonas Peters, Dominik Janzing, and Bernhard Schölkopf. 2017. Elements of Causal Inference: Foundations and Learning Algorithms. MIT Press.Google ScholarDigital Library
Thai T. Pham and Yuanyuan Shen. 2017. A deep causal inference approach to measuring the effects of forming group loans in online non-profit microfinance platform. arXiv preprint arXiv:1706.02795 (2017).Google Scholar
Vineet K. Raghu, Allen Poon, and Panayiotis V. Benos. 2018. Evaluation of causal structure learning methods on mixed data types. Proc. Mach. Learn. Res. 92 (2018), 48.Google Scholar
Vineeth Rakesh, Ruocheng Guo, Raha Moraffah, Nitin Agarwal, and Huan Liu. 2018. Linked causal variational autoencoder for inferring paired spillover effects. In Proceedings of the Conference on Information and Knowledge Management (CIKM’18). ACM, 1679--1682.Google ScholarDigital Library
Joseph Ramsey, Madelyn Glymour, Ruben Sanchez-Romero, and Clark Glymour. 2017. A million variables and more: The fast greedy equivalence search algorithm for learning high-dimensional graphical causal models, with an application to functional magnetic resonance images. Int. J. Data Sci. Anal. 3, 2 (2017), 121--129.Google ScholarCross Ref
Joseph D. Ramsey. 2014. A scalable conditional independence test for nonlinear, non-Gaussian data. arXiv preprint arXiv:1401.5031 (2014).Google Scholar
Thomas Richardson. 1996. A discovery algorithm for directed cyclic graphs. In Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI’96). Morgan Kaufmann Publishers Inc., 454--461.Google Scholar
Thomas S. Richardson and James M. Robins. [n.d.]. Single world intervention graphs: A primer. ([n.d.]).Google Scholar
James M. Robins, Miguel Ángel Hernán, and Babette Brumback. 2000. Marginal structural models and causal inference in epidemiology. Epidemiology (2000), 550--560.Google ScholarCross Ref
Mateo Rojas-Carulla, Bernhard Schölkopf, Richard Turner, and Jonas Peters. 2015. A causal perspective on domain adaptation. Stat 1050 (2015), 19.Google Scholar
Teemu Roos, Tomi Silander, Petri Kontkanen, and Petri Myllymaki. 2008. Bayesian network structure learning using factorized NML universal models. In Proceedings of the Information Theory and Applications Workshop (ITA Workshop’08). 272--276.Google ScholarCross Ref
Paul R. Rosenbaum and Donald B. Rubin. 1983. The central role of the propensity score in observational studies for causal effects. Biometrika 70, 1 (1983), 41--55.Google ScholarCross Ref
Donald B. Rubin. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies.J. Educ. Psychol. 66, 5 (1974), 688.Google ScholarCross Ref
Soumajyoti Sarkar, Ruocheng Guo, and Paulo Shakarian. 2019. Using network motifs to characterize temporal network evolution leading to diffusion inhibition. Soc. Netw. Anal. Min. 9, 1 (2019), 14.Google ScholarCross Ref
Bernhard Schölkopf, Dominik Janzing, Jonas Peters, Eleni Sgouritsa, Kun Zhang, and Joris Mooij. 2012. On causal and anticausal learning. arXiv preprint arXiv:1206.6471 (2012).Google Scholar
Gideon Schwarz et al. 1978. Estimating the dimension of a model. Ann. Stat. 6, 2 (1978), 461--464.Google ScholarCross Ref
Andrew J. Sedgewick, Ivy Shi, Rory M. Donovan, and Panayiotis V. Benos. 2016. Learning mixed graphical models with separate sparsity parameters and stability-based model selection. BMC Bioinform. 17, 5 (2016), S175.Google ScholarCross Ref
Dino Sejdinovic, Bharath Sriperumbudur, Arthur Gretton, and Kenji Fukumizu. 2013. Equivalence of distance-based and RKHS-based statistics in hypothesis testing. Ann. Stat. (2013), 2263--2291.Google Scholar
Eleni Sgouritsa, Dominik Janzing, Philipp Hennig, and Bernhard Schölkopf. 2015. Inference of cause and effect with unsupervised inverse regression. In Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS’15). 847--855.Google Scholar
Uri Shalit, Fredrik D. Johansson, and David Sontag. 2017. Estimating individual treatment effect: Generalization bounds and algorithms. In Proceedings of the International Conference on Machine Learning (ICML’17). 3076--3085.
Zheyan Shen, Peng Cui, Kun Kuang, and Bo Li. 2017. On image classification: Correlation vs causality. arXiv preprint arXiv:1708.06656 (2017).
Shohei Shimizu. 2014. LiNGAM: Non-Gaussian methods for estimating causal structures. Behaviormetrika 41, 1 (2014), 65--98.
Shohei Shimizu, Patrik O. Hoyer, Aapo Hyvärinen, and Antti Kerminen. 2006. A linear non-Gaussian acyclic model for causal discovery. J. Mach. Learn. Res. 7, Oct (2006), 2003--2030.
Shohei Shimizu, Takanori Inazumi, Yasuhiro Sogawa, Aapo Hyvärinen, Yoshinobu Kawahara, Takashi Washio, Patrik O. Hoyer, and Kenneth Bollen. 2011. DirectLiNGAM: A direct method for learning a linear non-Gaussian structural equation model. J. Mach. Learn. Res. 12 (2011), 1225--1248.
Ricardo Silva. 2016. Observational-interventional priors for dose-response learning. In Proceedings of the Conference and Workshop on Neural Information Processing Systems (NeurIPS’16). 1561--1569.
Peter Spirtes, Clark N. Glymour, Richard Scheines, David Heckerman, Christopher Meek, Gregory Cooper, and Thomas Richardson. 2000. Causation, Prediction, and Search. MIT Press.
Peter Spirtes, Christopher Meek, and Thomas Richardson. 1995. Causal inference in the presence of latent variables and selection bias. In Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI’95). 499--506.
Peter Spirtes and Kun Zhang. 2016. Causal discovery and inference: Concepts and recent methodological advances. In Applied Informatics, Vol. 3. 3.
Richard S. Sutton and Andrew G. Barto. 1998. Introduction to Reinforcement Learning. Vol. 135. MIT Press, Cambridge, MA.
Matt Taddy, Hedibert Freitas Lopes, and Matt Gardner. 2016. Scalable semiparametric inference for the means of heavy-tailed distributions. arXiv preprint arXiv:1602.08066 (2016).
Robert Tibshirani. 1996. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological) 58, 1 (1996), 267--288.
Michalis Titsias and Neil D. Lawrence. 2010. Bayesian Gaussian process latent variable model. In Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS’10). 844--851.
Panos Toulis, Alexander Volfovsky, and Edoardo M. Airoldi. 2018. Propensity score methodology in the presence of network entanglement between treatments. arXiv preprint arXiv:1801.07310 (2018).
Ioannis Tsamardinos, Constantin F. Aliferis, and Alexander R. Statnikov. 2003. Algorithms for large scale Markov blanket discovery. In Proceedings of the Florida Artificial Intelligence Research Society Conference (FLAIRS’03), Vol. 2. 376--380.
Ioannis Tsamardinos, Laura E. Brown, and Constantin F. Aliferis. 2006. The max-min hill-climbing Bayesian network structure learning algorithm. Mach. Learn. 65, 1 (2006), 31--78.
Mark J. Van Der Laan and Daniel Rubin. 2006. Targeted maximum likelihood learning. Int. J. Biostat. 2, 1 (2006).
Tyler J. VanderWeele. 2011. Principal stratification--uses and limitations. Int. J. Biostat. 7, 1 (2011), 1--14.
Stefan Wager and Susan Athey. 2017. Estimation and inference of heterogeneous treatment effects using random forests. J. Am. Stat. Assoc. just-accepted (2017).
Yuhao Wang, Liam Solus, Karren Yang, and Caroline Uhler. 2017. Permutation-based causal inference algorithms with interventions. In Proceedings of the Conference and Workshop on Neural Information Processing Systems (NeurIPS’17). 5822--5831.
Gerhard Widmer and Miroslav Kubat. 1996. Learning in the presence of concept drift and hidden contexts. Mach. Learn. 23, 1 (1996), 69--101.
Man Leung Wong, Shing Yan Lee, and Kwong Sak Leung. 2002. A hybrid approach to discover Bayesian networks from databases using evolutionary programming. In Proceedings of the IEEE International Conference on Data Mining (ICDM’02). IEEE, 498--505.
Akihiro Yabe, Daisuke Hatano, Hanna Sumita, Shinji Ito, Naonori Kakimura, Takuro Fukunaga, and Ken-ichi Kawarabayashi. 2018. Causal bandits with propagating inference. arXiv preprint arXiv:1806.02252 (2018).
Xuan Yin and Liangjie Hong. 2019. The identification and estimation of direct and indirect effects in A/B tests through causal mediation analysis. In Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (SIGKDD’19). ACM.
Junzhe Zhang and Elias Bareinboim. 2017. Transfer learning in multi-armed bandit: A causal approach. In Proceedings of the International Conference on Autonomous Agents and Multiagent Systems (AAMAS’17). 1778--1780.
Kun Zhang and Aapo Hyvärinen. 2009. On the identifiability of the post-nonlinear causal model. In Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI’09). 647--655.
Kun Zhang, Jonas Peters, Dominik Janzing, and Bernhard Schölkopf. 2012. Kernel-based conditional independence test and application in causal discovery. arXiv preprint arXiv:1202.3775 (2012).
Dawei Zhou, Jingrui He, Hongxia Yang, and Wei Fan. 2018. Sparc: Self-paced network representation for few-shot rare category characterization. In Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (SIGKDD’18). ACM, 2807--2816.

Index Terms

A Survey of Learning Causality with Data: Problems and Methods
1. Computing methodologies
  1. Artificial intelligence
2. Mathematics of computing
  1. Probability and statistics


Published in
ACM Computing Surveys Volume 53, Issue 4
July 2021
831 pages
ISSN:0360-0300
EISSN:1557-7341
DOI:10.1145/3410467
Editor:
Albert Zomaya
University of Sydney, Australia
Copyright © 2020 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 22 July 2020
- Online AM: 7 May 2020
- Accepted: 1 April 2020
- Revised: 1 January 2020
- Received: 1 September 2018

Author Tags
Causal machine learning
causal discovery
causal inference
Qualifiers
- survey
- Research
- Refereed

Article Metrics
- Total Citations: 97
- Total Downloads: 10,693
- Downloads (Last 12 months): 2,741
- Downloads (Last 6 weeks): 290