Abstract
This work considers how convenient access to copious data affects our ability to learn causal effects and relations. In what ways is learning causality in the era of big data different from, or the same as, learning it in the traditional setting? To answer this question, this survey provides a comprehensive and structured review of both traditional and frontier methods for learning causality and relations, along with the connections between causality and machine learning. For each approach, it points out on a case-by-case basis how big data facilitates, complicates, or motivates it.
- Leman Akoglu, Hanghang Tong, and Danai Koutra. 2015. Graph based anomaly detection and description: A survey. Data Min. Knowl. Discov. 29, 3 (2015), 626--688.Google ScholarDigital Library
- Dionissi Aliprantis. 2015. A distinction between causal effects in structural and rubin causal models. (2015).Google Scholar
- Michael Anderson and Jeremy Magruder. 2012. Learning from the cloud: Regression discontinuity estimates of the effects of an online review database. Eng. J. 122 (Oct. 2012), 957--989.Google Scholar
- Bryan Andrews, Joseph Ramsey, and Gregory F. Cooper. 2019. Learning high-dimensional directed acyclic graphs with mixed data-types. Proc. Mach. Learn. Res. 104 (2019).Google Scholar
- Joshua D. Angrist and Guido W. Imbens. 1995. Two-stage least squares estimation of average causal effects in models with variable treatment intensity. J. Am. Stat. Assoc. 90, 430 (1995), 431--442.Google ScholarCross Ref
- Joshua D. Angrist, Guido W. Imbens, and Donald B. Rubin. 1996. Identification of causal effects using instrumental variables. J. Am. Stat. Assoc. 91, 434 (1996), 444--455.Google ScholarCross Ref
- Joshua D. Angrist and Victor Lavy. 1999. Using Maimonides’ rule to estimate the effect of class size on scholastic achievement. Q. J. Econ. 114, 2 (1999), 533--575.Google ScholarCross Ref
- Sinan Aral and Christos Nicolaides. 2017. Exercise contagion in a global social network. Nat. Commun. 8 (2017), 14753.Google ScholarCross Ref
- Martin Arjovsky, Léon Bottou, Ishaan Gulrajani, and David Lopez-Paz. 2019. Invariant risk minimization. arXiv preprint arXiv:1907.02893 (2019).Google Scholar
- Susan Athey and Guido W. Imbens. 2015. Machine learning methods for estimating heterogeneous causal effects. Stat 1050, 5 (2015).Google Scholar
- Susan Athey, Guido W. Imbens, and Stefan Wager. 2018. Approximate residual balancing: Debiased inference of average treatment effects in high dimensions. J. R. Stat. Soc. Ser. B 80, 4 (2018), 597--623.Google ScholarCross Ref
- Peter C. Austin. 2011. An introduction to propensity score methods for reducing the effects of confounding in observational studies. Multivar. Behav. Res. 46, 3 (2011), 399--424.Google ScholarCross Ref
- Davide Bacciu, Terence A. Etchells, Paulo J. G. Lisboa, and Joe Whittaker. 2013. Efficient identification of independence networks using mutual information. Comput. Stat. 28, 2 (2013), 621--646.Google ScholarDigital Library
- Mohammad Taha Bahadori, Krzysztof Chalupka, Edward Choi, Robert Chen, Walter F. Stewart, and Jimeng Sun. 2017. Causal regularization. arXiv preprint arXiv:1702.02604 (2017).Google Scholar
- Eytan Bakshy, Dean Eckles, and Michael S. Bernstein. 2014. Designing and deploying online field experiments. In Proceedings of the Annual Conference on the World Wide Web (WWW’14). ACM, 283--292.Google Scholar
- Elias Bareinboim, Andrew Forney, and Judea Pearl. 2015. Bandits with unobserved confounders: A causal approach. In Proceedings of the Conference and Workshop on Neural Information Processing Systems (NeurIPS’15). 1342--1350.Google Scholar
- Elias Bareinboim and Judea Pearl. 2012. Transportability of causal effects: Completeness results. In Proceedings of the 26th AAAI Conference on Artificial Intelligence.Google ScholarCross Ref
- Elias Bareinboim and Jin Tian. 2015. Recovering causal effects from selection bias. In Proceedings of the AAAI Conference on Artifial Intelligence (AAAI’15). 3475--3481.Google Scholar
- John Blitzer, Ryan McDonald, and Fernando Pereira. 2006. Domain adaptation with structural correspondence learning. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP’06). 120--128.Google ScholarCross Ref
- Léon Bottou, Jonas Peters, Joaquin Quiñonero-Candela, Denis X. Charles, D. Max Chickering, Elon Portugaly, Dipankar Ray, Patrice Simard, and Ed Snelson. 2013. Counterfactual reasoning and learning systems: The example of computational advertising. J. Mach. Learn. Res. 14, 1 (2013), 3207--3260.Google ScholarDigital Library
- Donald T. Campbell. 1969. Reforms as experiments. Am. Psychol. 24, 4 (1969), 409.Google ScholarCross Ref
- Christopher Carpenter and Carlos Dobkin. 2009. The effect of alcohol consumption on mortality: Regression discontinuity evidence from the minimum drinking age. American Economic Journal: Applied Economics 1, 1 (2009), 164--82.Google ScholarCross Ref
- Matias D. Cattaneo, Nicolás Idrobo, and Rocío Titiunik. 2017. A practical introduction to regression discontinuity designs. Cambridge Elements: Quantitative and Computational Methods for Social Science, Cambridge University Press (2017).Google ScholarCross Ref
- Lu Cheng, Raha Moraffah, Ruocheng Guo, K. S. Candan, Adrienne Raglin, and Huan Liu. 2019. A practical data repository for causal learning with big data. In Proceedings of the BenchCouncil International Symposium on Benchmarking, Measuring and Optimizing (Bench’19).Google Scholar
- David Maxwell Chickering. 1996. Learning Bayesian networks is NP-complete. In Learning from Data. Springer, 121--130.Google Scholar
- David Maxwell Chickering. 2002. Optimal structure identification with greedy search. J. Mach. Learn. Res. 3, Nov (2002), 507--554.Google Scholar
- David M. Chickering, Dan Geiger, David Heckerman, et al. 1994. Learning Bayesian Networks Is NP-hard. Technical Report. Citeseer.Google Scholar
- Hugh A. Chipman, Edward I. George, Robert E. McCulloch, et al. 2010. BART: Bayesian additive regression trees. Ann. Appl. Stat. 4, 1 (2010), 266--298.Google ScholarCross Ref
- Tianjiao Chu and Clark Glymour. 2008. Search for additive nonlinear time series causal models. J. Mach. Learn. Res. 9, May (2008), 967--991.Google Scholar
- Diego Colombo and Marloes H. Maathuis. 2014. Order-independent constraint-based causal structure learning. J. Mach. Learn. Res. 15, 1 (2014), 3741--3782.Google ScholarDigital Library
- Diego Colombo, Marloes H. Maathuis, Markus Kalisch, and Thomas S. Richardson. 2012. Learning high-dimensional directed acyclic graphs with latent and selection variables. Ann. Stat. (2012), 294--321.Google Scholar
- Thomas D. Cook, Donald Thomas Campbell, and William Shadish. 2002. Experimental and Quasi-experimental Designs for Generalized Causal Inference. Houghton Mifflin, Boston.Google Scholar
- Ruifei Cui, Perry Groot, and Tom Heskes. 2016. Copula PC algorithm for causal discovery from mixed data. In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECMLPKDD’16). Springer, 377--392.Google ScholarCross Ref
- Hal Daumé III. 2009. Frustratingly easy domain adaptation. arXiv preprint arXiv:0907.1815 (2009).Google Scholar
- Martijn de Jongh and Marek J. Druzdzel. 2009. A comparison of structural distance measures for causal Bayesian network models. In Recent Advances in Intelligent Information Systems, Challenging Problems of Science, Computer Science Series (2009), 443--456.Google Scholar
- Rajeev H. Dehejia and Sadek Wahba. 1999. Causal effects in nonexperimental studies: Reevaluating the evaluation of training programs. J. Am. Stat. Assoc. 94, 448 (1999), 1053--1062.Google ScholarCross Ref
- Kaize Ding, Jundong Li, Rohit Bhanushali, and Huan Liu. 2019. Deep anomaly detection on attributed networks. In Proceedings of the SIAM International Conference on Data Mining (SDM’19). SIAM, 594--602.Google ScholarCross Ref
- Imme Ebert-Uphoff and Yi Deng. 2012. Causal discovery for climate research using graphical models. J. Clim. 25, 17 (2012), 5648--5665.Google ScholarCross Ref
- Andrew C. Eggers, Ronny Freier, Veronica Grembi, and Tommaso Nannicini. 2018. Regression discontinuity designs based on population thresholds: Pitfalls and solutions. Am. J. Pol. Sci. 62, 1 (2018), 210--229.Google ScholarCross Ref
- Michael Eichler. 2012. Causal inference in time series analysis. Causality: Statistical Perspectives and Applications (2012), 327--354.Google ScholarCross Ref
- Doris Entner, Patrik Hoyer, and Peter Spirtes. 2013. Data-driven covariate selection for nonparametric estimation of causal effects. In Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS’13). 256--264.Google Scholar
- Doris Entner and Patrik O. Hoyer. 2010. On causal discovery from time series data using FCI. In Proceedings of the International Conference on Probalilistic Graphical Models (PGM’10). 121--128.Google Scholar
- Andrew Forney, Judea Pearl, and Elias Bareinboim. 2017. Counterfactual data-fusion for online reinforcement learners. In Proceedings of the International Conference on Machine Learning (ICML’17). 1156--1164.Google Scholar
- Constantine E. Frangakis and Donald B. Rubin. 2002. Principal stratification in causal inference. Biometrics 58, 1 (2002), 21--29.Google ScholarCross Ref
- Kenji Fukumizu, Arthur Gretton, Xiaohai Sun, and Bernhard Schölkopf. 2008. Kernel measures of conditional dependence. In Proceedings of the Conference and Workshop on Neural Information Processing Systems (NeurIPS’08). 489--496.Google Scholar
- Michele Jonsson Funk, Daniel Westreich, Chris Wiesen, Til Stürmer, M. Alan Brookhart, and Marie Davidian. 2011. Doubly robust estimation of causal effects. Am. J. Epidemiol. 173, 7 (2011), 761--767.Google ScholarCross Ref
- Bin Gao and Yuehua Cui. 2015. Learning directed acyclic graphical structures with genetical genomics data. Bioinformatics 31, 24 (2015), 3953--3960.Google Scholar
- Andrew Gelman. 2011. Causality and statistical learning. Am. J. Sociol. 117, 3 (2011), 955--966.Google ScholarCross Ref
- Andrew Gelman and Guido Imbens. 2019. Why high-order polynomials should not be used in regression discontinuity designs. Journal of Business 8 Economic Statistics 37, 3 (2019), 447--456.Google ScholarCross Ref
- Mingming Gong, Kun Zhang, Bernhard Schölkopf, Clark Glymour, and Dacheng Tao. 2017. Causal discovery from temporally aggregated time series. In Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI’17).Google Scholar
- Ian Goodfellow, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning. MIT Press.Google ScholarDigital Library
- Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative adversarial nets. In Proceedings of the Conference and Workshop on Neural Information Processing Systems (NeurIPS’14). 2672--2680.Google Scholar
- Xing Sam Gu and Paul R Rosenbaum. 1993. Comparison of multivariate matching methods: Structures, distances, and algorithms. J. Comput. Graph Stat. 2, 4 (1993), 405--420.Google Scholar
- Ruocheng Guo, Jundong Li, and Huan Liu. 2018. INITIATOR: Noise-contrastive estimation for marked temporal point process. In Proceedings of the International Joint Conferences on Artificial Intelligence (IJCAI’18). 2191--2197.Google ScholarCross Ref
- Ruocheng Guo, Jundong Li, and Huan Liu. 2020. Counterfactual evaluation of treatment assignment functions with networked observational data. In Proceedings of the SIAM International Conference on Data Mining (SDM’20). SIAM, 271--279.Google ScholarCross Ref
- Ruocheng Guo, Jundong Li, and Huan Liu. 2020. Learning individual causal effects from networked observational data. In Proceedings of the ACM International Web Search and Data Mining Conference (WSDM’20). ACM, 232--240.Google ScholarDigital Library
- Ruocheng Guo, Yichuan Li, Jundong Li, K. Selçuk Candan, Adrienne Raglin, and Huan Liu. 2020. IGNITE: A minimax game toward learning individual treatment effects from networked observational data. In Proceedings of the International Joint Conferences on Artificial Intelligence (IJCAI’20).Google ScholarCross Ref
- P. Richard Hahn, Carlos M. Carvalho, David Puelz, Jingyu He, et al. 2018. Regularization and confounding in linear regression for treatment effect estimation. Bayes. Anal. 13, 1 (2018), 163--182.Google ScholarCross Ref
- P. Richard Hahn, Jared S. Murray, and Carlos Carvalho. 2017. Bayesian regression tree models for causal inference: Regularization, confounding, and heterogeneous effects. arXiv preprint arXiv:1706.09523 (2017).Google Scholar
- Jens Hainmueller. 2012. Entropy balancing for causal effects: A multivariate reweighting method to produce balanced samples in observational studies. Pol. Anal. 20, 1 (2012), 25--46.Google ScholarCross Ref
- Alain Hauser and Peter Bühlmann. 2015. Jointly interventional and observational data: Estimation of interventional Markov equivalence classes of directed acyclic graphs. J. R. Stat. Soc. Ser. B 77, 1 (2015), 291--318.Google ScholarCross Ref
- David Heckerman, Dan Geiger, and David M. Chickering. 1995. Learning Bayesian networks: The combination of knowledge and statistical data. Mach. Learn. 20, 3 (1995), 197--243.Google ScholarCross Ref
- David Heckerman, Christopher Meek, and Gregory Cooper. 2006. A Bayesian approach to causal discovery. In Innovations in Machine Learning. Springer, 1--28.Google Scholar
- Miguel Ángel Hernán, Babette Brumback, and James M. Robins. 2000. Marginal structural models to estimate the causal effect of zidovudine on the survival of HIV-positive men. Epidemiology (2000), 561--570.Google Scholar
- Jennifer L. Hill. 2011. Bayesian nonparametric modeling for causal inference. J. Comput. Graph Stat. 20, 1 (2011), 217--240.Google ScholarCross Ref
- Keisuke Hirano, Guido W. Imbens, and Geert Ridder. 2003. Efficient estimation of average treatment effects using the estimated propensity score. Econometrica 71, 4 (2003), 1161--1189.Google ScholarCross Ref
- Paul W. Holland. 1986. Statistics and causal inference. J. Am. Stat. Assoc. 81, 396 (1986), 945--960.Google ScholarCross Ref
- Patrik O. Hoyer, Aapo Hyvarinen, Richard Scheines, Peter L. Spirtes, Joseph Ramsey, Gustavo Lacerda, and Shohei Shimizu. 2012. Causal discovery of linear acyclic models with arbitrary distributions. arXiv preprint arXiv:1206.3260 (2012).Google Scholar
- Patrik O. Hoyer, Dominik Janzing, Joris M. Mooij, Jonas Peters, and Bernhard Schölkopf. 2009. Nonlinear causal discovery with additive noise models. In Proceedings of the Conference and Workshop on Neural Information Processing Systems (NeurIPS’09). 689--696.Google Scholar
- Aapo Hyvärinen and Erkki Oja. 2000. Independent component analysis: Algorithms and applications. Neur. Netw. 13, 4-5 (2000), 411--430.Google ScholarDigital Library
- Aapo Hyvärinen, Kun Zhang, Shohei Shimizu, and Patrik O. Hoyer. 2010. Estimation of a structural vector autoregression model using non-gaussianity. J. Mach. Learn. Res. 11, 5 (2010), 1709--1731.Google ScholarDigital Library
- Kosuke Imai and Marc Ratkovic. 2014. Covariate balancing propensity score. J. R. Stat. Soc. Ser. B 76, 1 (2014), 243--263.Google ScholarCross Ref
- Kosuke Imai, Marc Ratkovic, et al. 2013. Estimating treatment effect heterogeneity in randomized program evaluation. Ann. Appl. Stat. 7, 1 (2013), 443--470.Google ScholarCross Ref
- Guido W. Imbens. 2004. Nonparametric estimation of average treatment effects under exogeneity: A review. Rev. Econ. Stat. 86, 1 (2004), 4--29.Google ScholarCross Ref
- Dominik Janzing and Bernhard Schölkopf. 2015. Semi-supervised interpolation in an anticausal learning scenario. J. Mach. Learn. Res. 16, 1 (2015), 1923--1948.Google ScholarDigital Library
- Marshall M. Joffe, Thomas R. Ten Have, Harold I. Feldman, and Stephen E. Kimmel. 2004. Model selection, confounder control, and marginal structural models: Review and new applications. Am. Stat. 58, 4 (2004), 272--279.Google ScholarCross Ref
- Fredrik Johansson, Uri Shalit, and David Sontag. 2016. Learning representations for counterfactual inference. In Proceedings of the International Conference on Machine Learning (ICML’16). 3020--3029.Google Scholar
- Markus Kalisch and Peter Bühlmann. 2007. Estimating high-dimensional directed acyclic graphs with the PC-algorithm. J. Mach. Learn. Res. 8, Mar (2007), 613--636.Google Scholar
- Nathan Kallus, Aahlad Manas Puli, and Uri Shalit. 2018. Removing hidden confounding by experimental grounding. In Proceedings of the Conference and Workshop on Neural Information Processing Systems (NeurIPS’18). 10888--10897.Google Scholar
- Hyunseung Kang, Anru Zhang, T. Tony Cai, and Dylan S. Small. 2016. Instrumental variables estimation with some invalid instruments and its application to Mendelian randomization. J. Am. Stat. Assoc. 111, 513 (2016), 132--144.Google ScholarCross Ref
- Diederik P. Kingma and Max Welling. 2013. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114 (2013).Google Scholar
- Murat Kocaoglu, Alex Dimakis, and Sriram Vishwanath. 2017. Cost-optimal learning of causal graphs. In Proceedings of the International Conference on Machine Learning (ICML’17). 1875--1884.Google Scholar
- Kun Kuang, Peng Cui, Susan Athey, Ruoxuan Xiong, and Bo Li. 2018. Stable prediction across unknown environments. In Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (SIGKDD’18). ACM, 1617--1626.Google ScholarDigital Library
- Kun Kuang, Peng Cui, Bo Li, Meng Jiang, and Shiqiang Yang. 2017. Estimating treatment effect in the wild via differentiated confounder balancing. In Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (SIGKDD’18). ACM, 265--274.Google ScholarDigital Library
- Matt J. Kusner, Joshua Loftus, Chris Russell, and Ricardo Silva. 2017. Counterfactual fairness. In Proceedings of the Conference and Workshop on Neural Information Processing Systems (NeurIPS’18). 4066--4076.Google Scholar
- Robert J. LaLonde. 1986. Evaluating the econometric evaluations of training programs with experimental data. Am. Econ. Rev. (1986), 604--620.Google Scholar
- Finnian Lattimore, Tor Lattimore, and Mark D. Reid. 2016. Causal bandits: Learning good interventions via causal inference. In Proceedings of the Conference and Workshop on Neural Information Processing Systems (NeurIPS’16). 1181--1189.Google Scholar
- Thuc Duy Le, Tao Hoang, Jiuyong Li, Lin Liu, and Huawen Liu. 2015. A fast PC algorithm for high dimensional causal discovery with multi-core PCs. arXiv preprint arXiv:1502.02454 (2015).Google Scholar
- Thuc Duy Le, Lin Liu, Anna Tsykin, Gregory J. Goodall, Bing Liu, Bing-Yu Sun, and Jiuyong Li. 2013. Inferring microRNA--mRNA causal regulatory relationships from expression data. Bioinformatics 29, 6 (2013), 765--771.Google ScholarDigital Library
- Yann LeCun, Bernhard E. Boser, John S. Denker, Donnie Henderson, Richard E. Howard, Wayne E. Hubbard, and Lawrence D. Jackel. 1990. Handwritten digit recognition with a back-propagation network. In Proceedings of the Conference and Workshop on Neural Information Processing Systems (NeurIPS’90). 396--404.Google Scholar
- Jason D. Lee and Trevor J. Hastie. 2015. Learning the structure of mixed graphical models. J. Comput. Graph. Stat. 24, 1 (2015), 230--253.Google ScholarCross Ref
- Jundong Li, Kewei Cheng, Suhang Wang, Fred Morstatter, Robert P. Trevino, Jiliang Tang, and Huan Liu. 2017. Feature selection: A data perspective. ACM Comput. Surv. 50, 6 (2017), 94.Google ScholarDigital Library
- Jundong Li, Ruocheng Guo, Chenghao Liu, and Huan Liu. 2019. Adaptive unsupervised feature selection on attributed networks. In Proceedings of the SIGKDD Conference on Knowledge Discovery and Data Mining (SIGKDD’18). ACM, 92--100.Google ScholarDigital Library
- Jundong Li, Osmar R. Zaïane, and Alvaro Osornio-Vargas. 2014. Discovering statistically significant co-location rules in datasets with extended spatial objects. In Proceedings of the International Conference on Big Data Analytics and Knowledge Discovery (DaWaK’14). 124--135.Google ScholarCross Ref
- Yichuan Li, Ruocheng Guo, Weiying Wang, and Huan Liu. 2019. Causal learning in question quality improvement. In Proceedings of the BenchCouncil International Symposium on Benchmarking, Measuring and Optimizing (Bench’20).Google Scholar
- David Lopez-Paz, Robert Nishihara, Soumith Chintala, Bernhard Schölkopf, and Léon Bottou. 2017. Discovering causal signals in images. In Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR’17).Google ScholarCross Ref
- Christos Louizos, Uri Shalit, Joris M. Mooij, David Sontag, Richard Zemel, and Max Welling. 2017. Causal effect inference with deep latent-variable models. In Proceedings of the Conference and Workshop on Neural Information Processing Systems (NeurIPS’17). 6446--6456.Google Scholar
- Jared K. Lunceford and Marie Davidian. 2004. Stratification and weighting via the propensity score in estimation of causal treatment effects: A comparative study. Stat. Med. 23, 19 (2004), 2937--2960.Google ScholarCross Ref
- Miguel Hernan and James M. Robins. forthcoming. Causal Inference. CRC Boca Raton, FL.Google Scholar
- Daniel Malinsky and David Danks. 2018. Causal discovery algorithms: A practical guide. Philos. Compass 13, 1 (2018), e12470.Google ScholarCross Ref
- Subramani Mani and Gregory F. Cooper. 2000. Causal discovery from medical textual data. In Proceedings of the AMIA Symposium. 542.Google Scholar
- Ericsson Marin, Ruocheng Guo, and Paulo Shakarian. 2017. Temporal analysis of influence to predict usersâ adoption in online social networks. In International Conference on Social Computing, Behavioral-Cultural Modeling and Prediction and Behavior Representation in Modeling and Simulation. Springer, 254--261.Google ScholarCross Ref
- Tomáš Mikolov, Martin Karafiát, Lukáš Burget, Jan Černockỳ, and Sanjeev Khudanpur. 2010. Recurrent neural network based language model. In Proceedings of the Conference on the International Speech Comunication Association (INTERSPEECH’10).Google ScholarCross Ref
- Joris M. Mooij, Jonas Peters, Dominik Janzing, Jakob Zscheischler, and Bernhard Schölkopf. 2016. Distinguishing cause from effect using observational data: Methods and benchmarks. J. Mach. Learn. Res. 17, 1 (2016), 1103--1204.Google ScholarDigital Library
- Raha Moraffah, Mansooreh Karami, Ruocheng Guo, Adrienne Ragliny, and Huan Liu. 2020. Causal interpretability for machine learning--problems, methods and evaluation. arXiv preprint arXiv:2003.03934 (2020).Google Scholar
- Stephen L. Morgan and Christopher Winship. 2015. Counterfactuals and Causal Inference. Cambridge University Press.Google Scholar
- Preetam Nandy, Alain Hauser, Marloes H. Maathuis, et al. 2018. High-dimensional consistency in score-based and hybrid structure learning. Ann. Stat. 46, 6A (2018), 3151--3183.Google ScholarCross Ref
- Jersey Neyman. 1923. Sur les applications de la théorie des probabilités aux experiences agricoles: Essai des principes. Roczn. Nauk Rolniczych 10 (1923), 1--51.Google Scholar
- Cross-Disorder Group of the Psychiatric Genomics Consortium et al. 2013. Identification of risk loci with shared effects on five major psychiatric disorders: A genome-wide analysis. The Lancet 381, 9875 (2013), 1371--1379.Google Scholar
- Michael J. Paul. 2017. Feature selection as causal inference: Experiments with text classification. In Proceedings of the SIGNLL Conference on Computational Natural Language Learning (CoNLL’17). 163--172.Google ScholarCross Ref
- Judea Pearl. 1995. Causal diagrams for empirical research. Biometrika 82, 4 (1995), 669--688.Google ScholarCross Ref
- Judea Pearl. 2009. Causal inference in statistics: An overview. Statistics Surveys 3 (2009), 96--146.Google ScholarCross Ref
- Judea Pearl. 2009. Causality. Cambridge University Press.Google Scholar
- Judea Pearl. 2018. Theoretical impediments to machine learning with seven sparks from the causal revolution. arXiv preprint arXiv:1801.04016 (2018).Google Scholar
- Judea Pearl and Elias Bareinboim. 2011. Transportability of causal and statistical relations: A formal approach. In Proceedings of the 25th AAAI Conference on Artificial Intelligence.Google ScholarDigital Library
- Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. Glove: Global vectors for word representation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP’14). 1532--1543.Google ScholarCross Ref
- Jonas Peters, Peter Bühlmann, and Nicolai Meinshausen. 2016. Causal inference by using invariant prediction: Identification and confidence intervals. J. R. Stat. Soc. Ser. B Stat. Methodol. 78, 5 (2016), 947--1012.Google ScholarCross Ref
- Jonas Peters, Dominik Janzing, and Bernhard Schölkopf. 2013. Causal inference on time series using restricted structural equation models. In Proceedings of the Conference and Workshop on Neural Information Processing Systems (NeurIPS’13). 154--162.Google Scholar
- Jonas Peters, Dominik Janzing, and Bernhard Schölkopf. 2017. Elements of Causal Inference: Foundations and Learning Algorithms. MIT Press.Google ScholarDigital Library
- Thai T. Pham and Yuanyuan Shen. 2017. A deep causal inference approach to measuring the effects of forming group loans in online non-profit microfinance platform. arXiv preprint arXiv:1706.02795 (2017).Google Scholar
- Vineet K. Raghu, Allen Poon, and Panayiotis V. Benos. 2018. Evaluation of causal structure learning methods on mixed data types. Proc. Mach. Learn. Res. 92 (2018), 48.Google Scholar
- Vineeth Rakesh, Ruocheng Guo, Raha Moraffah, Nitin Agarwal, and Huan Liu. 2018. Linked causal variational autoencoder for inferring paired spillover effects. In Proceedings of the Conference on Information and Knowledge Management (CIKM’18). ACM, 1679--1682.Google ScholarDigital Library
- Joseph Ramsey, Madelyn Glymour, Ruben Sanchez-Romero, and Clark Glymour. 2017. A million variables and more: The fast greedy equivalence search algorithm for learning high-dimensional graphical causal models, with an application to functional magnetic resonance images. Int. J. Data Sci. Anal. 3, 2 (2017), 121--129.Google ScholarCross Ref
- Joseph D. Ramsey. 2014. A scalable conditional independence test for nonlinear, non-Gaussian data. arXiv preprint arXiv:1401.5031 (2014).Google Scholar
- Thomas Richardson. 1996. A discovery algorithm for directed cyclic graphs. In Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI’96). Morgan Kaufmann Publishers Inc., 454--461.Google Scholar
- Thomas S. Richardson and James M. Robins. [n.d.]. Single world intervention graphs: A primer. ([n.d.]).Google Scholar
- James M. Robins, Miguel Ángel Hernán, and Babette Brumback. 2000. Marginal structural models and causal inference in epidemiology. Epidemiology (2000), 550--560.Google ScholarCross Ref
- Mateo Rojas-Carulla, Bernhard Schölkopf, Richard Turner, and Jonas Peters. 2015. A causal perspective on domain adaptation. Stat 1050 (2015), 19.Google Scholar
- Teemu Roos, Tomi Silander, Petri Kontkanen, and Petri Myllymaki. 2008. Bayesian network structure learning using factorized NML universal models. In Proceedings of the Information Theory and Applications Workshop (ITA Workshop’08). 272--276.Google ScholarCross Ref
- Paul R. Rosenbaum and Donald B. Rubin. 1983. The central role of the propensity score in observational studies for causal effects. Biometrika 70, 1 (1983), 41--55.Google ScholarCross Ref
- Donald B. Rubin. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies.J. Educ. Psychol. 66, 5 (1974), 688.Google ScholarCross Ref
- Soumajyoti Sarkar, Ruocheng Guo, and Paulo Shakarian. 2019. Using network motifs to characterize temporal network evolution leading to diffusion inhibition. Soc. Netw. Anal. Min. 9, 1 (2019), 14.Google ScholarCross Ref
- Bernhard Schölkopf, Dominik Janzing, Jonas Peters, Eleni Sgouritsa, Kun Zhang, and Joris Mooij. 2012. On causal and anticausal learning. arXiv preprint arXiv:1206.6471 (2012).Google Scholar
- Gideon Schwarz et al. 1978. Estimating the dimension of a model. Ann. Stat. 6, 2 (1978), 461--464.Google ScholarCross Ref
- Andrew J. Sedgewick, Ivy Shi, Rory M. Donovan, and Panayiotis V. Benos. 2016. Learning mixed graphical models with separate sparsity parameters and stability-based model selection. BMC Bioinform. 17, 5 (2016), S175.Google ScholarCross Ref
- Dino Sejdinovic, Bharath Sriperumbudur, Arthur Gretton, and Kenji Fukumizu. 2013. Equivalence of distance-based and RKHS-based statistics in hypothesis testing. Ann. Stat. (2013), 2263--2291.Google Scholar
- Eleni Sgouritsa, Dominik Janzing, Philipp Hennig, and Bernhard Schölkopf. 2015. Inference of cause and effect with unsupervised inverse regression. In Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS’15). 847--855.Google Scholar
- Uri Shalit, Fredrik D. Johansson, and David Sontag. 2017. Estimating individual treatment effect: Generalization bounds and algorithms. In Proceedings of the International Conference on Machine Learning (ICML’17). 3076--3085.Google Scholar
- Zheyan Shen, Peng Cui, Kun Kuang, and Bo Li. 2017. On image classification: Correlation vs causality. arXiv preprint arXiv:1708.06656 (2017).Google Scholar
- Shohei Shimizu. 2014. LiNGAM: Non-Gaussian methods for estimating causal structures. Behaviormetrika 41, 1 (2014), 65--98.Google ScholarCross Ref
- Shohei Shimizu, Patrik O. Hoyer, Aapo Hyvärinen, and Antti Kerminen. 2006. A linear non-Gaussian acyclic model for causal discovery. J. Mach. Learn. Res. 7, Oct (2006), 2003--2030.Google Scholar
- Shohei Shimizu, Takanori Inazumi, Yasuhiro Sogawa, Aapo Hyvärinen, Yoshinobu Kawahara, Takashi Washio, Patrik O. Hoyer, and Kenneth Bollen. 2011. DirectLiNGAM: A direct method for learning a linear non-Gaussian structural equation model. J. Mach. Learn. Res. 12 (2011), 1225--1248.Google ScholarDigital Library
- Ricardo Silva. 2016. Observational-interventional priors for dose-response learning. In Proceedings of the Conference and Workshop on Neural Information Processing Systems (NeurIPS’16). 1561--1569.Google Scholar
- Peter Spirtes, Clark N. Glymour, Richard Scheines, David Heckerman, Christopher Meek, Gregory Cooper, and Thomas Richardson. 2000. Causation, Prediction, and Search. MIT Press.Google Scholar
- Peter Spirtes, Christopher Meek, and Thomas Richardson. 1995. Causal inference in the presence of latent variables and selection bias. In Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI’95). 499--506.Google Scholar
- Peter Spirtes and Kun Zhang. 2016. Causal discovery and inference: Concepts and recent methodological advances. In Applied Informatics, Vol. 3. 3.Google ScholarCross Ref
- Richard S. Sutton and Andrew G. Barto. 1998. Introduction to Reinforcement Learning. Vol. 135. MIT Press, Cambridge, MA.Google ScholarDigital Library
- Matt Taddy, Hedibert Freitas Lopes, and Matt Gardner. 2016. Scalable semiparametric inference for the means of heavy-tailed distributions. arXiv preprint arXiv:1602.08066 (2016).Google Scholar
- Robert Tibshirani. 1996. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological) 58, 1 (1996), 267--288.Google ScholarCross Ref
- Michalis Titsias and Neil D. Lawrence. 2010. Bayesian Gaussian process latent variable model. In Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS’10). 844--851.Google Scholar
- Panos Toulis, Alexander Volfovsky, and Edoardo M. Airoldi. 2018. Propensity score methodology in the presence of network entanglement between treatments *. arXiv preprint arXiv:1801.07310 (2018).Google Scholar
- Ioannis Tsamardinos, Constantin F. Aliferis, Alexander R. Statnikov, and Er Statnikov. 2003. Algorithms for large scale Markov blanket discovery. In Proceedings of the Florida Artificial Intelligence Research Society Conference (FLAIRS’03), Vol. 2. 376--380.Google Scholar
- Ioannis Tsamardinos, Laura E. Brown, and Constantin F. Aliferis. 2006. The max-min hill-climbing Bayesian network structure learning algorithm. Mach. Learn. 65, 1 (2006), 31--78.
- Mark J. Van Der Laan and Daniel Rubin. 2006. Targeted maximum likelihood learning. Int. J. Biostat. 2, 1 (2006).
- Tyler J. VanderWeele. 2011. Principal stratification--uses and limitations. Int. J. Biostat. 7, 1 (2011), 1--14.
- Stefan Wager and Susan Athey. 2017. Estimation and inference of heterogeneous treatment effects using random forests. J. Am. Stat. Assoc. just-accepted (2017).
- Yuhao Wang, Liam Solus, Karren Yang, and Caroline Uhler. 2017. Permutation-based causal inference algorithms with interventions. In Proceedings of the Conference and Workshop on Neural Information Processing Systems (NeurIPS’17). 5822--5831.
- Gerhard Widmer and Miroslav Kubat. 1996. Learning in the presence of concept drift and hidden contexts. Mach. Learn. 23, 1 (1996), 69--101.
- Man Leung Wong, Shing Yan Lee, and Kwong Sak Leung. 2002. A hybrid approach to discover Bayesian networks from databases using evolutionary programming. In Proceedings of the IEEE International Conference on Data Mining (ICDM’02). IEEE, 498--505.
- Akihiro Yabe, Daisuke Hatano, Hanna Sumita, Shinji Ito, Naonori Kakimura, Takuro Fukunaga, and Ken-ichi Kawarabayashi. 2018. Causal bandits with propagating inference. arXiv preprint arXiv:1806.02252 (2018).
- Xuan Yin and Liangjie Hong. 2019. The identification and estimation of direct and indirect effects in A/B tests through causal mediation analysis. In Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (SIGKDD’19). ACM.
- Junzhe Zhang and Elias Bareinboim. 2017. Transfer learning in multi-armed bandit: A causal approach. In Proceedings of the International Conference on Autonomous Agents and Multiagent Systems (AAMAS’17). 1778--1780.
- Kun Zhang and Aapo Hyvärinen. 2009. On the identifiability of the post-nonlinear causal model. In Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI’09). 647--655.
- Kun Zhang, Jonas Peters, Dominik Janzing, and Bernhard Schölkopf. 2012. Kernel-based conditional independence test and application in causal discovery. arXiv preprint arXiv:1202.3775 (2012).
- Dawei Zhou, Jingrui He, Hongxia Yang, and Wei Fan. 2018. Sparc: Self-paced network representation for few-shot rare category characterization. In Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (SIGKDD’18). ACM, 2807--2816.
Index Terms
- A Survey of Learning Causality with Data: Problems and Methods