
Cognition

Volume 179, October 2018, Pages 266-297
Successful structure learning from observational data

https://doi.org/10.1016/j.cognition.2018.06.003

Abstract

Previous work suggests that humans find it difficult to learn the structure of causal systems given observational data alone. We identify two conditions that enable successful structure learning from observational data: people succeed if the underlying causal system is deterministic, and if each pattern of observations has a single root cause. In four experiments, we show that either condition alone is sufficient to enable high levels of performance, but that performance is poor if neither condition applies. A fifth experiment suggests that neither determinism nor root sparsity takes priority over the other. Our data are broadly consistent with a Bayesian model that embodies a preference for structures that make the observed data not only possible but probable.

Introduction

Causal networks have been widely used as models of the mental representations that support causal reasoning. For example, an engineer’s knowledge of the local electricity system may take the form of a network in which the nodes represent power stations and the links in the network represent connections between stations. Causal networks of this kind may be learned in several ways. For example, an intervention at station A that also affects station B provides evidence for a directed link between A and B. Networks can also be learned via instruction: for example, a senior colleague might tell the engineer that A sends power to B. Here, however, we focus on whether and how causal networks can be learned from observational data. For example, the engineer might observe that A and B both have voltage spikes on some occasions, that B alone has voltage spikes on others, but that A is never the only station with voltage spikes (Fig. 1). Based on these observations alone, the engineer might infer that A sends power to B.

The problem in Fig. 1 is an instance of structure learning because it requires a choice between two distinct graph structures: one in which A sends a link to B and the other in which B sends a link to A. Structure learning can be distinguished from parameter learning problems that require inferences about the properties of links in a known causal structure (Danks, 2014; Jacobs and Kruschke, 2011). For example, an engineer who knows that station A sends a link to station B might need to learn about the fidelity with which signals at A are transmitted to B. Causal parameter learning is often studied experimentally using paradigms in which a focal effect is clearly distinguished from a set of potential causes, and the learning problem is to infer the strength of the relationship between each candidate cause and the effect (Lu et al., 2008; Sloman, 2005). Here, however, we focus on structure learning problems in which the variables are not presorted into potential causes and effects.

A consensus has emerged that people find causal structure learning to be difficult or impossible given observational data alone. For example, Fernbach and Sloman (2009) cite results obtained by Steyvers et al. (2003), Lagnado and Sloman (2004), and White (2006) to support their claim that “observation of covariation is insufficient for most participants to recover causal structure” (p. 680). Here we challenge this consensus by identifying two conditions that enable successful structure learning from observational data alone. The first condition is causal determinism, and is satisfied if each variable is a deterministic function of its direct causes. The second condition is root sparsity, and is satisfied if each observation is the outcome of a single root cause. Both conditions simplify the structure-learning problem by reducing the number of possible explanations for a given set of observations.

Determinism and root sparsity have both previously been discussed in the literature on causal reasoning. Several lines of research suggest that people tend to assume that causes are deterministic or near-deterministic (Frosch and Johnson-Laird, 2011; Lu et al., 2008; Schulz and Sommerville, 2006; Yeung and Griffiths, 2015), and this assumption has informed previous studies of structure learning (Mayrhofer and Waldmann, 2011; Mayrhofer and Waldmann, 2016; White, 2006). Our work is related most closely to a previous study by White (2006), who asked participants to learn the structure of deterministic causal systems from observational data alone. White’s task proved to be difficult, and performance was poor even when White gave his participants explicit instructions about how to infer causal structure from observational data. In contrast, we find that our participants are reliably able to infer the structure of deterministic causal systems.

Although “root sparsity” is our own coinage, this term is related to a cluster of existing ideas. Some work on causal attribution suggests that people tend to prefer explanations that invoke a single root cause (Chi et al., 2012; Lombrozo, 2007; Pacer and Lombrozo, 2017), although Zemla, Sloman, Bechlivanidis, and Lagnado (2017) report the opposite finding. Many studies of causal parameter learning consider cases in which there are two potential causes of an effect: a focal cause and a background cause. In this setting, learners seem to expect that exactly one of these potential causes is strong (Lu et al., 2008). Mayrhofer and Waldmann (2015) explore a related idea in their work on prior expectations in structure learning. One of the priors that they consider captures the idea that an effect has a single cause. The notion of root sparsity is also consistent with studies of structure learning that focus on the role of interventions. Several researchers in this literature suggest that people tend to succeed only when interventions are not accompanied by spurious changes. If this condition holds then all changes observed following an intervention can be traced back to a single root cause – that is, to the intervention (Fernbach and Sloman, 2009; Lagnado and Sloman, 2004). Rottman and Keil (2012) show that the same condition supports structure learning from observational data if the temporal sequence of the observations is known.

Our primary goal is to explore the extent to which determinism and root sparsity allow people to succeed at structure learning. We find that people perform well when determinism and root sparsity both apply, and that either condition alone is sufficient to produce high levels of performance. To help us understand our participants’ inferences, we compare these inferences to the predictions of several computational models. We initially focus on a model that we refer to as the Bayesian structure learner, or the BSL for short. The BSL serves as a normative benchmark that helps to evaluate the extent to which people succeed at structure learning. Previous discussions of structure learning have also considered Bayesian benchmarks, but Fernbach and Sloman (2009) suggest that there is “little reason to treat them as descriptively correct” (p. 681). In our setting, however, we find that people’s inferences align closely with the predictions of our Bayesian model in many cases.

The BSL model contrasts with previous statistical accounts of structure learning that are sensitive to patterns of conditional independence between variables (Pearl, 2000; Spirtes et al., 2001). Like several previous authors (Fernbach and Sloman, 2009; Mayrhofer and Waldmann, 2011), we believe that models that track patterns of conditional independence are often too powerful to capture inferences made by resource-bounded human learners. The BSL model uses statistical inference in a different way, and relies on a computation that assesses how much of a coincidence the available data would be with respect to different possible structures. It is therefore possible that people rely on a similar kind of statistical computation when approaching structure learning problems.

Four classes of causal networks

The causal systems that we consider are simple activation networks. Each network can be represented as a graph which may include cycles. Each node in the graph can be active or inactive, and the edges in the graph transmit activation from one node to another.
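As a concrete illustration of this activation rule, the sketch below (illustrative Python, not the authors' code) computes the set of active nodes by spreading activation from a set of initially active root nodes until no new node becomes active; the fixpoint check also guarantees termination on cyclic graphs:

```python
def propagate(edges, roots):
    # `edges` maps each node to the nodes it sends activation to.
    # Spread activation from the initially active `roots` until a
    # fixpoint is reached; the `not in active` filter handles cycles.
    active = set(roots)
    frontier = set(roots)
    while frontier:
        frontier = {b for a in frontier for b in edges.get(a, ())
                    if b not in active}
        active |= frontier
    return frozenset(active)

# A three-node chain 0 -> 1 -> 2: activating node 0 activates everything,
# while activating node 2 activates node 2 alone.
chain = {0: [1], 1: [2]}
print(sorted(propagate(chain, {0})))  # [0, 1, 2]
print(sorted(propagate(chain, {2})))  # [2]
```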

This paper will consider four qualitatively different classes of causal networks that are summarized in Table 1. The causal links in a network may be deterministic (D) or probabilistic (P), and root causes may be sparse (S) or non-sparse (N).

Bayesian structure learning

We now describe a Bayesian approach to the problem of structure learning. The primary purpose of our Bayesian framework is to provide a benchmark for assessing how well people learn structures from the four classes just described.

Suppose that we observe a data set D generated from an unknown network G. Our framework can be applied to problems based on all four of the network classes in Fig. 2, but we will initially assume that the unknown network belongs to class DS: in other words, that causal links are deterministic and that each observation is the outcome of a single root cause.
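For class DS this computation has a particularly simple form: each trial activates a single root node, activation propagates deterministically, and the likelihood of a data set under a candidate graph is the product of the probabilities of its observed activation patterns. The sketch below (illustrative Python under the assumption that the root is chosen uniformly at random on each trial; not the paper's implementation or its exact likelihood) scores the two candidate structures from the two-station example in Fig. 1:

```python
def propagate(edges, root):
    # Deterministic spread of activation from a single root cause.
    active, frontier = {root}, {root}
    while frontier:
        frontier = {b for a in frontier for b in edges.get(a, ())
                    if b not in active}
        active |= frontier
    return frozenset(active)

def likelihood(data, edges, n):
    # P(D | G) under class DS: each trial picks one of the n nodes
    # uniformly at random as the root, then propagates deterministically.
    patterns = [propagate(edges, r) for r in range(n)]
    p = 1.0
    for d in data:
        p *= patterns.count(d) / n
    return p

# Candidate structures: A -> B versus B -> A (node 0 is A, node 1 is B).
candidates = {"A->B": {0: [1]}, "B->A": {1: [0]}}

# Observed trials: A and B spike together, then B spikes alone.
data = [frozenset({0, 1}), frozenset({1})]

scores = {g: likelihood(data, e, 2) for g, e in candidates.items()}
total = sum(scores.values())
posterior = {g: s / total for g, s in scores.items()}  # uniform prior
print(posterior)
```

With a uniform prior, the posterior places all of its mass on A -> B, matching the engineer's inference in Fig. 1: under B -> A, station B could never spike alone.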

Overview of experiments

We designed a series of experiments in which participants learned networks from the four classes in Table 1. Our primary goal was to explore whether people would perform well at structure learning when reasoning about systems with deterministic causal links and systems that satisfy the root sparsity condition. Previous research suggests that people often perform poorly on structure-learning tasks given observational data alone (Steyvers et al., 2003; White, 2006), but we suspected that determinism and root sparsity would each support substantially better performance.

Experiment 1: Deterministic links, one root cause

Because structure learning given observational data alone is traditionally thought to be difficult, Experiment 1 explores the network class (DS) that makes this problem as easy as possible. Our task is based on activation networks with three nodes, and we included structure-learning problems based on all such networks that are qualitatively different. The space of these networks includes common-cause structures and common-effect structures, both of which have proved difficult to learn in previous studies.

Experiment 2: Deterministic links, multiple root causes

Experiment 2 relaxes the root sparsity assumption and explores how well people learn networks from class DN. The difficulty order in Eq. (7) predicts that structure learning should be more difficult for class DN than for class DS, but we expected that people’s inferences about class DN would still be relatively accurate.

Experiment 3: Probabilistic links, one root cause

Experiment 3 is directly analogous to Experiment 1 except that the determinism assumption is relaxed. Instead of assuming that activation always propagates along causal links, participants were told that links in the underlying network were sometimes inactive and thus could fail to transmit activation. As in Experiment 1, participants were led to believe that root sparsity applied, which meant that at most one detector would spontaneously activate on any trial. The difficulty order in Eq. (7) predicts that structure learning should be more difficult for class PS than for class DS.

Experiment 4: Probabilistic links, multiple root causes

In three separate experiments we have found that people succeed at structure learning, and this finding contrasts with previous structure-learning studies that report relatively low levels of performance. These previous studies typically consider networks with probabilistic links and do not assume root sparsity. Experiment 4 follows suit and asks whether our experimental paradigm also leads to poor performance in the absence of determinism and root sparsity.

Experiment 5: Mostly deterministic links, mostly one root cause

Our results demonstrate that people are able to reason successfully about networks that violate the root sparsity assumption (Experiment 2) and networks that violate the determinism assumption (Experiment 3). Neither assumption is essential for structure learning to succeed, but it is possible that one assumption is psychologically privileged with respect to the other. Experiment 5 explored this possibility using a task that required participants to abandon either root sparsity or determinism, revealing which of the two assumptions they were more willing to give up.

Overall model comparison

Now that we have presented results from five separate experiments, we consider how well the BSL and the broken link models account for the complete set of data. The two models appear in the first two columns of Fig. 19, and the remaining columns are for alternative models that will be discussed in subsequent sections. The final row of Fig. 19 shows aggregate plots that summarize the performance of a given model across all five experiments. These aggregate results provide additional support for the BSL model.

Alternative models

Although the BSL model accounts well for many aspects of our data, at least two important questions remain to be addressed. The first concerns the role of the prior. We have relied on a uniform prior so far, but perhaps changing this prior would produce a Bayesian model that better captures people’s inferences.

The second question asks how well the BSL model performs relative to non-Bayesian approaches that could be tried. Researchers have developed many formal accounts of causal learning from

Discussion

We presented five experiments that explore structure learning from observational data. Our studies focused specifically on two assumptions: determinism and root sparsity. We found that both assumptions enabled successful structure learning (Experiments 1 through 3). If violations of both assumptions are common then performance is relatively poor (Experiment 4), but performance remains high if violations are possible but rare (Experiment 5).

The most pressing question raised by our results is why

Conclusion

Previous studies often report that structure learning from observational data is difficult. In contrast, our results suggest that people find structure learning relatively easy if causes are deterministic or if each observation has a single root cause. There may be additional factors that enable successful structure learning, and our work may lead to future efforts that comprehensively chart the conditions under which people perform well at structure learning from observational data.

Declarations of interest

None.

Acknowledgments

Experiment 1 was presented at the 34th Annual Meeting of the Cognitive Science Society. Experiments 2, 3, and 5 were written up as AR’s Master’s thesis completed at Georg-August University Göttingen under the supervision of RM and CK.

We thank Evan Metsky for help with running Experiments 3 and 4, and Alan Jern, York Hagmayer, Bob Rehder, and Michael Waldmann for suggestions about this work.

BD was supported by a training program in Neural Computation that was organized by CMU’s Center for the Neural Basis of Cognition.

References (54)

  • P.W. Cheng (1997). From covariation to causation: A causal power theory. Psychological Review.
  • M.T.H. Chi et al. (2012). Misconceived causal explanations for emergent processes. Cognitive Science.
  • D. Danks (2014). Unifying the mind: Cognitive representations as graphical models.
  • A.P. Dawid et al. (1993). Hyper Markov laws in the statistical analysis of decomposable graphical models. The Annals of Statistics.
  • B. Deverett et al. Learning deterministic causal networks from observational data.
  • F. Eberhardt et al. (2011). Confirmation in the cognitive sciences: The problematic case of Bayesian models. Minds and Machines.
  • P.M. Fernbach et al. (2009). Causal learning with local computations. Journal of Experimental Psychology: Learning, Memory, and Cognition.
  • B. Fischhoff (1975). Hindsight is not equal to foresight: The effect of outcome knowledge on judgment under uncertainty. Journal of Experimental Psychology: Human Perception and Performance.
  • N. Friedman et al. (2002). Being Bayesian about network structure: A Bayesian approach to structure discovery in Bayesian networks. Machine Learning.
  • S.A. Gelman (2003). The essential child: Origins of essentialism in everyday thought.
  • A. Gopnik et al. (2004). A theory of causal learning in children: Causal maps and Bayes nets. Psychological Review.
  • A. Gopnik et al. (2001). Causal learning mechanisms in very young children: Two-, three-, and four-year-olds infer causal relations from patterns of variation and covariation. Developmental Psychology.
  • Y. Hagmayer et al. (2013). Hierarchical Bayesian models as formal models of causal reasoning. Argument & Computation.
  • J.Y. Halpern et al. Actual causation and the art of modelling.
  • D. Heckerman et al. A Bayesian approach to causal discovery.
  • F. Heider (1958). The psychology of interpersonal relations.
  • R.A. Jacobs et al. (2011). Bayesian learning theory applied to human cognition. Wiley Interdisciplinary Reviews: Cognitive Science.
1 Present address: School of Psychological Sciences, University of Melbourne, Australia.