Cognition

Volume 114, Issue 2, February 2010, Pages 165-196

A probabilistic model of theory formation

https://doi.org/10.1016/j.cognition.2009.09.003

Abstract

Concept learning is challenging in part because the meanings of many concepts depend on their relationships to other concepts. Learning these concepts in isolation can be difficult, but we present a model that discovers entire systems of related concepts. These systems can be viewed as simple theories that specify the concepts that exist in a domain, and the laws or principles that relate these concepts. We apply our model to several real-world problems, including learning the structure of kinship systems and learning ontologies. We also compare its predictions to data collected in two behavioral experiments. Experiment 1 shows that our model helps to explain how simple theories are acquired and used for inductive inference. Experiment 2 suggests that our model provides a better account of theory discovery than a more traditional alternative that focuses on features rather than relations.

Introduction

Parent: A person who has begotten or borne a child.

Child: The offspring, male or female, of human parents.

The Oxford English Dictionary, 2nd edition, 1989.

Samuel Johnson acknowledges that his dictionary of 1755 is far from perfect, but suggests that “many seeming faults are to be imputed rather to the nature of the undertaking, than the negligence of the performer.” He argues, for instance, that “some explanations are unavoidably reciprocal or circular, as hind, the female of the stag; stag, the male of the hind.” Analogies between dictionary definitions and mental representations can only extend so far, but Johnson appears to have uncovered a general truth about the structure of human knowledge. Scholars from many disciplines have argued that concepts are organized into systems of relations, and that the meaning of a concept depends in part on its relationships to other concepts (Block, 1986, Carey, 1985, Field, 1977, Goldstone and Rogosky, 2002, Quillian, 1968, Quine and Ullian, 1978). To appreciate the basic idea, consider pairs of concepts like parent and child, disease and symptom, or life and death. In each case it is difficult to imagine how a learner could fully understand one member of the pair without also understanding the other. Systems of concepts, however, are often much more complex than mutually dependent pairs. Concepts like life and death, for instance, are embedded in a system that also includes concepts like growth, eating, energy and reproduction (Carey, 1985).

Systems of concepts capture some important aspects of human knowledge but also raise some challenging puzzles (Fodor & Lepore, 1992). Here we mention just two. First, it is natural to think that many concepts (including dog, tree and electron) are shared by many members of our society, but if the meaning of any concept depends on its role within an entire conceptual system, it is hard to understand how two individuals with different beliefs (and therefore different conceptual systems) could have any concepts in common (Fodor & Lepore, 1992). Second, a holistic approach to concept meaning raises a difficult acquisition problem. If the meaning of each concept depends on its role within a system of concepts, it is difficult to see how a learner might break into the system and acquire the concepts that it contains (Hempel, 1985, Woodfield, 1987). Goldstone and Rogosky (2002) recently presented a formal model that helps to address the first puzzle, and here we present a computational approach that helps to address the second puzzle.

Following prior usage in psychology (Carey, 1985) and artificial intelligence (Davis, 1990) we use the term theory to refer to a system that specifies a set of concepts and relationships between these concepts. Scientific theories are paradigmatic examples of the systems we will consider, but psychologists have argued that everyday knowledge is organized into intuitive theories that are similar to scientific theories in many respects (Carey, 1985, Murphy and Medin, 1985, Wellman and Gelman, 1992). Both kinds of theories are believed to play several important roles. As we have already seen, theories help to individuate concepts, and many kinds of concepts derive their meaning from the roles they play in theories. Theories allow learners to explain existing observations, and to make predictions about new observations. Finally, theories guide inductive inferences by restricting a learner’s attention to features and hypotheses that are relevant to the task at hand.

Theories may take many different forms, and the examples we focus on are related to the “framework theories” described by Wellman and Gelman (1992). Framework theories specify the fundamental concepts that exist in a domain and the possible relationships between these concepts. A framework theory of medicine, for example, might indicate that two of the fundamental concepts are chemicals and diseases, and that chemicals can cause diseases (Fig. 1). A “specific theory” is a more detailed account of the phenomena in some domain, and is typically constructed from concrete instances of the abstract categories provided by the framework theory. Extending our medical example, a specific theory might indicate that asbestos can cause lung cancer, where asbestos is a chemical and lung cancer is a disease. The framework theory therefore suggests that any specific correlation between asbestos exposure and lung cancer is better explained by a causal link from asbestos to lung cancer than a link in the opposite direction. Although researchers should eventually aim for models that can handle both framework theories and specific theories, working with framework theories is a useful first step. Framework theories are important since they capture some of our most fundamental knowledge, and in some cases they appear simple enough that we can begin to think about them computationally.

Three fundamental questions can be asked about theories: what are they, how are they used to make inductive inferences, and how are they acquired? Philosophers and psychologists have addressed all three questions (Carey, 1985, Hempel, 1972, Kuhn, 1970, Popper, 1935, Wellman and Gelman, 1998), but there have been few attempts to provide computational answers to these questions. Our work takes an initial step in this direction: we consider only relatively simple theories, but we specify these theories formally, we use these theories to make predictions about unobserved relationships between entities, and we show how these theories can be learned from raw relational data.

The first of our three fundamental questions requires us to formalize the notion of a theory. We explore the idea that framework theories can be represented as probabilistic models, each of which includes a set of categories and a matrix of parameters specifying relationships between those categories. Representations this simple will only be able to capture some aspects of framework theories, but working with simple representations allows us to develop tractable answers to our remaining two questions.

The second question asks how theories can be used for inductive inference. Each of our theories specifies the relationships between categories that are possible or likely, and predictions about unobserved relationships between entities are guided by inductive inferences about their category assignments. Since we represent theories as probabilistic models, Bayesian inference provides a principled framework for inferences about category assignments, relationships between categories and relationships between entities.

The final question—how are theories acquired?—is probably the most challenging of the three. Some philosophers suggest that this question will never be answered, and that there can be “no systematic, useful study of theory construction or discovery” (Newton-Smith, 1981, p. 125). To appreciate why theory acquisition is challenging, consider a case where the concepts belonging to a theory are not known in advance. Imagine a child who stumbles across a set of identical-looking metal objects. She starts to play with these objects and notices that some pairs seem to exert mysterious forces on each other when they come into close proximity. Eventually she discovers that there are three kinds of objects—call them magnets, magnetic objects and non-magnetic objects. She also discovers causal laws that capture the relationships between these concepts: magnets interact with magnets and magnetic objects, magnetic objects interact only with magnets, and non-magnetic objects do not interact with any other objects. Notice that the three hidden concepts and the causal laws are tightly coupled. The causal laws are only defined in terms of the concepts, and the concepts are only defined in terms of the causal relationships between them. This coupling raises a challenging learning problem. If the child already knew about the three concepts—suppose, for instance, that different kinds of objects were painted different colors—then discovering the relationships between the concepts would be simple. Similarly, a child who already knew the causal laws should find it easy to group the objects into categories. We consider the case where neither the concepts nor the causal laws are known. In general, a learner may not even know when there are new concepts to be discovered in a particular domain, let alone how many concepts there are or how they relate to one another. The approach we describe attempts to solve all of these acquisition problems simultaneously.
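As a concrete (and purely illustrative) way of writing down the theory of magnetism described above, the three categories together with the causal laws can be encoded as a partition of the objects plus a matrix of interaction probabilities between categories. The category names and numeric values in the sketch below are our own assumptions rather than parameters taken from the paper.

```python
# Illustrative encoding of the magnetism theory: three categories and a matrix
# eta giving, for each ordered pair of categories, the probability that their
# members interact. (All numeric values are hypothetical.)

eta = {
    "magnet":       {"magnet": 0.9, "magnetic": 0.9, "non_magnetic": 0.0},
    "magnetic":     {"magnet": 0.9, "magnetic": 0.0, "non_magnetic": 0.0},
    "non_magnetic": {"magnet": 0.0, "magnetic": 0.0, "non_magnetic": 0.0},
}

def interaction_probability(category_a: str, category_b: str) -> float:
    """Probability that an object in category_a interacts with one in category_b."""
    return eta[category_a][category_b]

assert interaction_probability("magnetic", "magnet") > 0.5      # magnetic objects interact with magnets
assert interaction_probability("magnetic", "magnetic") == 0.0   # but not with each other
```

The point of the encoding is that the causal laws live entirely in the matrix, while the categories live entirely in the partition: neither piece is interpretable without the other, which is exactly what makes the joint learning problem hard.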

We suggested already that Bayesian inference can explain how theories are used for induction, and our approach to theory acquisition is founded on exactly the same principle. Given a formal characterization of a theory, we can set up a space of possible theories and define a prior distribution over this space. Bayesian inference then provides a normative strategy for selecting the theory in this space that is best supported by the available data. Many Bayesian accounts of human learning work with relatively simple representations, including regions in multidimensional space and sets of clusters (Anderson, 1991, Shepard, 1987, Tenenbaum and Griffiths, 2001). Our model demonstrates that the Bayesian approach to knowledge acquisition can be carried out even when the representations to be learned are richly structured, and are best described as relational theories.
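In symbols, this amounts to standard Bayesian inference over a hypothesis space of theories: if T ranges over candidate theories and D denotes the observed relational data, the preferred theory is the one with highest posterior probability. The notation below is generic rather than quoted from the paper.

```latex
% Posterior over candidate theories T given observed data D:
P(T \mid D) \;\propto\; P(D \mid T)\, P(T)
% The prior P(T) encodes expectations about which theories are plausible, and
% the likelihood P(D \mid T) measures how well a theory explains the observations.
```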

Our approach draws on previous proposals about relational concepts and on existing models of categorization. Several researchers (Gentner and Kurtz, 2005, Markman and Stilwell, 2001) have emphasized that many concepts derive their content from their relationships to other concepts, but there have been few formal models that explain how these concepts might be learned (Markman & Stilwell, 2001). Most models of categorization take features as their input, and are able only to discover categories defined by characteristic patterns of features (Anderson, 1991, Medin and Schaffer, 1978, Nosofsky, 1986). Our approach brings these two research programs together. We build on formal techniques used by previous models—in particular, our approach extends Anderson’s rational model of categorization—but we go beyond existing categorization models by working with rich systems of relational data.

The next two sections introduce the simple kinds of theories that we consider in this paper. We then describe our formal approach and evaluate it in two ways. First we demonstrate that our model learns large-scale theories given real-world data that roughly approximate the kind of information available to human learners. In particular, we show that our model discovers theories related to folk biology and folk sociology, and a medical theory that captures relationships between ontological concepts. We then turn to empirical studies and describe two behavioral experiments where participants learn theories analogous to the simple theory of magnetism already described. Our model helps to explain how these simple theories are learned and used to support inductive inferences, and we show that our relational approach explains our data better than a feature-based model of categorization.

Section snippets

Theories and theory discovery

“Theory” is a term that is used both formally and informally across a broad range of disciplines, including psychology, philosophy, and computer science. No definition of this term is universally adopted, but here we work with the idea that a theory is a structured system of concepts that explains some existing set of observations and predicts future observations. In the magnetism example just described, the concepts are magnets, magnetic objects, and non-magnetic objects, and these concepts

Learning simple theories

Before introducing the details of our model, we describe the input that it takes and the output that it generates and provide an informal description of how it converts the input to the output. The input for each problem specifies relationships among the entities in a domain, and the output is a simple theory that we refer to as a relational system. Each relational system organizes a set of entities into categories and specifies the relationships between these categories. Suppose, for instance,

A probabilistic approach to theory discovery

We now provide a more formal description of the model sketched in the previous section. Each relational system in Fig. 2 can be formalized as a pair (z,η), where z is a partition of the entities into categories and η is a matrix that indicates how these categories relate to each other (Fig. 6). The matrix η can be visualized as a category graph: a graph over the categories where the edge between category A and category B has weight η(A,B), expressing the probability that A-entities will link to
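The following sketch shows how a pair (z, η) of this kind can generate a binary relation over entities. It assumes, as in closely related infinite relational models, a Chinese restaurant process prior over partitions, Beta priors on the entries of η, and Bernoulli-distributed links; these particular priors and parameter values are illustrative assumptions rather than the exact specification used in the paper.

```python
import random

def sample_theory_and_data(n_entities, alpha=1.0, beta_params=(0.5, 0.5), seed=0):
    """Illustrative generative sketch for a relational system (z, eta):
    sample a partition z from a Chinese restaurant process, a matrix eta of
    category-pair link probabilities from a Beta prior, and a binary relation
    R whose (i, j) entry is Bernoulli(eta[z[i]][z[j]])."""
    rng = random.Random(seed)

    # Chinese restaurant process: assign each entity to an existing category
    # with probability proportional to its size, or to a new category with
    # probability proportional to alpha.
    z = []
    for _ in range(n_entities):
        counts = {}
        for c in z:
            counts[c] = counts.get(c, 0) + 1
        options = list(counts) + [len(counts)]            # existing categories + one new
        weights = [counts[c] for c in counts] + [alpha]
        z.append(rng.choices(options, weights=weights)[0])

    n_categories = max(z) + 1
    # Beta prior on each category-pair link probability eta[a][b].
    eta = [[rng.betavariate(*beta_params) for _ in range(n_categories)]
           for _ in range(n_categories)]

    # Observed relation: R[i][j] ~ Bernoulli(eta[z[i]][z[j]]).
    R = [[int(rng.random() < eta[z[i]][z[j]]) for j in range(n_entities)]
         for i in range(n_entities)]
    return z, eta, R

z, eta, R = sample_theory_and_data(n_entities=8)
print(z)      # sampled category assignments
print(R[0])   # first row of the sampled relation
```

Theory discovery then runs this process in reverse: given an observed relation R, the learner searches for the partition and category graph that best explain it.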

Evaluating models of theory discovery

Formal accounts of theory discovery can make two contributions to cognitive science: they help to address questions about the learnability of theories, and they help to explain human behavior. We will evaluate our approach along both dimensions.

Consider first the learnability issue. Many philosophers have suggested that there can be no computational account of theory discovery (Newton-Smith, 1981), and Hempel’s (1985) version of this claim is especially relevant to our approach. Hempel points

Categorizing objects and features

Later sections of this paper will demonstrate that our model can handle rich systems of relations, but we begin with a very simple setting where the raw data are a matrix of objects by features. Many models of categorization work with information about objects and their features and attempt to organize the objects into mutually exclusive categories (Anderson, 1991, Medin and Schaffer, 1978). Learning about features, however, can also be important, and often it makes sense to organize both
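In this setting the input is simply a binary object-by-feature matrix, and the output partitions both the rows (objects) and the columns (features), together with an η matrix relating object categories to feature categories. The toy data and category labels below are invented for illustration and are not drawn from the paper's datasets.

```python
# Toy object-by-feature matrix (1 = object has the feature); values invented.
objects  = ["robin", "sparrow", "salmon", "trout"]
features = ["has_wings", "can_fly", "has_fins", "lives_in_water"]

data = [
    [1, 1, 0, 0],   # robin
    [1, 1, 0, 0],   # sparrow
    [0, 0, 1, 1],   # salmon
    [0, 0, 1, 1],   # trout
]

# A relational model applied to these data would jointly recover something like:
object_categories  = {"robin": 0, "sparrow": 0, "salmon": 1, "trout": 1}
feature_categories = {"has_wings": 0, "can_fly": 0, "has_fins": 1, "lives_in_water": 1}
# together with an eta matrix indicating that object-category 0 tends to have
# features in feature-category 0, and object-category 1 tends to have features
# in feature-category 1.
```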

Discovering ontologies

Intuitive theories vary in abstraction. Some theories capture detailed knowledge about specific topics such as illness and its causes, or the laws that govern the motion of physical objects. Others are more general theories that distinguish between basic ontological categories (e.g. agents, mental states, artifacts, substances, and events) and specify relationships between these categories (e.g. that agents can have mental states, or that artifacts are made out of substances). As mentioned

Discovering kinship theories

Humans are social creatures, and intuitive theories about social systems govern many of our interactions with each other. Social systems take many forms: we all know, for example, that families, companies, and friendship networks are organized differently, and that membership in any of these systems is associated with a characteristic set of rules and obligations. Social theories represent a particularly important test case for models of theory acquisition, since there is compelling evidence

Experiment 1: learning causal theories

As children learn about the structure of their world, they develop intuitive theories about animals and their properties, about relationships between entities from different ontological kinds, and about the kinship system of their social group. Our results so far demonstrate that our model discovers interpretable theories in each of these domains. Each of these theories includes many categories, and the medical and kinship theories also include many relations. Our model therefore helps to

Experiment 2: relations or features?

As mentioned already, psychologists have developed many models of categorization, including the context model (Medin & Schaffer, 1978), the generalized context model (Nosofsky, 1986), and Anderson’s rational model (Anderson, 1991). The most prominent modeling tradition has focused on a feature-based approach where the category assignment for a given object is determined by its features. We have argued instead for a relational approach where the category assignment for an object is determined

General discussion

Our work is motivated by three general questions: what are theories, how do they support inductive inference, and how are they acquired? We approached these questions by working with a space of very simple theories. Each of these theories specifies the categories that exist in a domain and the relationships that exist between these categories. We showed that theories in this family can be characterized as generative models that make inductive predictions about unobserved interactions between

Conclusion

We presented a model that discovers simple theories, or systems of related concepts. Our model simultaneously discovers the concepts that exist in a domain, and the laws or principles that capture relationships between these concepts. Most previous models of concept formation are able only to discover combinations of pre-existing concepts. Unlike these approaches, our model can discover entire systems of concepts that derive their meaning from their relationships to each other.

Our model

Acknowledgements

Earlier versions of this work were presented at the 25th Annual Conference of the Cognitive Science Society (2003), the 46th Annual Meeting of the Psychonomic Society (2005), and the 21st National Conference on Artificial Intelligence (AAAI-06). Our AAAI paper was prepared in collaboration with Takeshi Yamada and Naonori Ueda. We thank Steven Sloman for sharing his copy of the feature data described in Osherson et al. (1991) and Woodrow Denham for providing his Alyawarra data in

References (106)

  • D. Aldous. Exchangeability and related topics.
  • J. R. Anderson. The adaptive nature of human categorization. Psychological Review (1991).
  • C. Antoniak. Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems. The Annals of Statistics (1974).
  • N. Block. Advertisement for a semantics for psychology.
  • N. Block. Conceptual role semantics.
  • S. Carey. Conceptual change in childhood (1985).
  • N. Chater. Reconciling simplicity and likelihood principles in perceptual organization. Psychological Review (1996).
  • D. Conklin et al. Complexity-based induction. Machine Learning (1994).
  • E. Davis. Representations of commonsense knowledge (1990).
  • L. De Raedt et al. Probabilistic logic learning. ACM-SIGKDD Explorations (2003).
  • Denham, W. W. (1973). The detection of patterns in Alyawarra nonverbal behavior. Unpublished doctoral dissertation,...
  • Denham, W. W. (2001). Alyawarra ethnographic database: Numerical data documentation file, version 7....
  • W. W. Denham et al. Aranda and Alyawarra kinship: A quantitative argument for a double helix model. American Ethnologist (1979).
  • W. W. Denham et al. Multiple measures of Alyawarra kinship. Field Methods (2005).
  • I. S. Dhillon et al. Information-theoretic co-clustering.
  • T. Ferguson. A Bayesian analysis of some nonparametric problems. The Annals of Statistics (1973).
  • H. Field. Logic, meaning and conceptual role. The Journal of Philosophy (1977).
  • N. Findler. Automatic rule discovery for field work in anthropology. Computers and the Humanities (1992).
  • J. Fodor et al. Why meaning (probably) isn’t conceptual role. Mind and Language (1991).
  • J. Fodor et al. Holism: A shopper’s guide (1992).
  • J. A. Fodor. The language of thought (1975).
  • A. Gelman et al. Bayesian data analysis (2003).
  • Gentner, D., & Kurtz, K. (2005). Relational categories. In W. Ahn, R. L. Goldstone, B. C. Love, A. B. Markman, & P. W....
  • D. Gentner et al. Structure mapping in analogy and similarity. American Psychologist (1997).
  • L. Getoor et al. Learning probabilistic models of link structure. Journal of Machine Learning Research (2002).
  • R. L. Goldstone. Isolated and interrelated concepts. Memory and Cognition (1996).
  • T. L. Griffiths et al. Two proposals for causal grammars.
  • N. R. Hanson. Patterns of discovery (1958).
  • Hayes, P. J. (1985). The second naive physics manifesto. In J. R. Hobbs & R. C. Moore (Eds.), Formal theories of the...
  • C. G. Hempel. Fundamentals of concept formation in empirical science (1972).
  • C. G. Hempel. Thoughts on the limitations of discovery by computer.
  • T. Hofmann et al. Latent class models for collaborative filtering.
  • L. Hubert et al. Comparing partitions. Journal of Classification (1985).
  • J. E. Hummel et al. A symbolic-connectionist theory of relational inference and generalization. Psychological Review (2003).
  • S. Jain et al. A split-merge Markov chain Monte Carlo procedure for the Dirichlet process mixture model. Journal of Computational and Graphical Statistics (2004).
  • Johnson, S. (1755). A dictionary of the English language....
  • F. C. Keil. Semantic and conceptual development (1979).
  • F. C. Keil. On the emergence of semantic and conceptual distinctions. Journal of Experimental Psychology: General (1983).
  • F. C. Keil. The emergence of theoretical beliefs as constraints on concepts.
  • F. C. Keil. The growth of causal understandings of natural kinds.