
Cognitive Psychology

Volume 111, June 2019, Pages 80-102

Sample size, number of categories and sampling assumptions: Exploring some differences between categorization and generalization

https://doi.org/10.1016/j.cogpsych.2019.03.001

Highlights

  • Category frequency changes generalization differently in one- and two-category tasks.

  • Increasing category frequency tightens generalizations for a single category.

  • Increasing category frequency expands generalizations with two categories.

  • Sampling assumptions moderate the effect of frequency in the two-category case.

Abstract

Categorization and generalization are fundamentally related inference problems. Yet leading computational models of categorization (as exemplified by, e.g., Nosofsky, 1986) and generalization (as exemplified by, e.g., Tenenbaum and Griffiths, 2001) make qualitatively different predictions about how inference should change as a function of the number of items. Assuming all else is equal, categorization models predict that increasing the number of items in a category increases the chance of assigning a new item to that category; generalization models predict a decrease, or category tightening with additional exemplars. This paper investigates this discrepancy, showing that people do indeed perform qualitatively differently in categorization and generalization tasks even when all superficial elements of the task are kept constant. Furthermore, the effect of category frequency on generalization is moderated by assumptions about how the items are sampled. We show that neither model naturally accounts for the pattern of behavior across both categorization and generalization tasks, and discuss theoretical extensions of these frameworks to account for the importance of category frequency and sampling assumptions.

Introduction

Categorization and generalization are two fundamental and deeply related inductive problems. In categorization problems, people learn from labeled items drawn from two or more categories and must determine which of those labels applies to a novel object. Generalization problems, by contrast, typically involve learning a single category from examples of that category alone; the learner must then judge whether a novel object belongs to it. The surface differences between the two tasks appear almost negligible and in one sense are purely a matter of framing. In a category learning task where every object belongs to exactly one of two categories, both problems reduce to the same inductive problem: determining whether the novel object belongs in one category or not.

Viewed from this perspective, one might expect categorization and generalization to be essentially identical. Both require the learner to make inferences about the extensions of categories, both predict people’s behavior on the basis of psychological theories about how categories are represented, and both depend on the learner forming some representation of the categories on the basis of a set of exemplars. Accordingly, one would expect that theories of categorization and theories of generalization should agree with each other, at least qualitatively, when describing the inferences people make. In this paper we investigate a surprising and robust disagreement between these two different inference problems and show how this difference is mirrored in existing theoretical accounts. Specifically, we show that increasing the category frequency has qualitatively different effects on human inductive inferences in categorization and generalization.

To illustrate why we might predict the effect of category frequency to differ across tasks, we consider each task separately. In a Dax-or-Wug categorization problem, increasing the number of Dax observations (holding other factors constant) pushes the category boundary away from the observed Dax exemplars. Theoretical models of categorization capture this frequency effect in a natural fashion. For example, the Generalized Context Model (GCM; Nosofsky, 1986) is an exemplar model of categorization that computes a response strength for the Dax category by summing the similarities between the novel object and every previous Dax exemplar. Accordingly, adding more Dax observations without adding any Wug exemplars will increase the strength of the Dax category, especially for items similar to the Dax observations or whose category label is ambiguous. An item that was previously equally likely to be classified as Dax or Wug will now appear more Dax-like because additional Dax exemplars have been added. To put it another way, the GCM predicts a category frequency effect in which the point of subjective equivalence (where the response strengths for the two categories are equal) is pushed from the Daxes and towards the Wugs.
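The summed-similarity mechanism described above can be sketched in a few lines of code. This is a minimal one-dimensional illustration rather than an implementation from the paper: the exponential similarity kernel follows the GCM's standard form, but the exemplar locations, the specificity parameter `c`, and the probe value are invented for the example.

```python
import math

def similarity(x, y, c=1.0):
    """Exponential similarity kernel: s = exp(-c * |x - y|)."""
    return math.exp(-c * abs(x - y))

def category_strength(item, exemplars, c=1.0):
    """Summed similarity of a probe item to all stored exemplars."""
    return sum(similarity(item, e, c) for e in exemplars)

def p_dax(item, daxes, wugs, c=1.0):
    """Luce choice rule: probability of responding 'Dax'."""
    s_dax = category_strength(item, daxes, c)
    s_wug = category_strength(item, wugs, c)
    return s_dax / (s_dax + s_wug)

# Adding Dax exemplars (without adding Wugs) raises only the Dax
# response strength, so an ambiguous midpoint item becomes more Dax-like.
daxes_few = [0.0, 0.2]
daxes_many = daxes_few + [0.1, 0.0, 0.2]   # extra Dax observations
wugs = [1.0, 1.2]
probe = 0.6  # midway between the two clusters

p_few = p_dax(probe, daxes_few, wugs)    # exactly 0.5 by symmetry
p_many = p_dax(probe, daxes_many, wugs)  # greater than 0.5
```

Because the Wug exemplars are unchanged, the extra Dax observations push the point of subjective equivalence away from the Daxes and toward the Wugs, which is the category frequency effect described above.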

Now consider a generalization problem in which a learner is shown several Daxes and asked to determine whether a novel item is also a Dax. What happens to people’s generalizations as we increase the number of Daxes? By analogy to the categorization problem one might suppose that more examples of Daxes would encourage people to generalize more broadly. However, formal models of generalization, such as the Bayesian approach taken by Tenenbaum and Griffiths (2001), predict precisely the opposite. As the learner encounters more Daxes they become more confident that the empirically observed variation in Daxes is entirely representative of the full range. When only a few Daxes have been seen, it is quite plausible to believe that a novel object is also a Dax, even if it is somewhat dissimilar to the previously encountered items. Observing one tiny Dax and one small Dax does not rule out the possibility that Daxes can be large; but if the learner has seen 100 Daxes, all small, the odds that Daxes can be large become much lower: if large Daxes were possible one should have encountered them by now. As a consequence, the learner in this situation shows very little generalization to new items that differ significantly in size.
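The tightening prediction falls out of the size principle at the heart of the Bayesian account. The sketch below, in the spirit of Tenenbaum and Griffiths (2001), treats hypotheses as intervals on a single size dimension with a uniform prior; the interval boundaries and observation values are illustrative assumptions, not values from the paper.

```python
import itertools

def generalization_prob(observations, probe, edges):
    """P(probe in category | observations), averaging over interval
    hypotheses [lo, hi] with a uniform prior and a strong-sampling
    likelihood proportional to 1 / size(h)^n (the size principle)."""
    n = len(observations)
    num = den = 0.0
    for lo, hi in itertools.combinations(edges, 2):
        if not all(lo <= x <= hi for x in observations):
            continue  # hypothesis inconsistent with the data
        weight = (hi - lo) ** -n  # smaller hypotheses gain weight as n grows
        den += weight
        if lo <= probe <= hi:
            num += weight
    return num / den

edges = [0.0, 0.5, 1.0, 1.5, 2.0]  # candidate interval boundaries
probe = 1.2  # larger than any observed Dax

# Two small Daxes vs. many small Daxes spanning the same range:
p_small_sample = generalization_prob([0.1, 0.4], probe, edges)
p_large_sample = generalization_prob([0.1, 0.4] * 5, probe, edges)
```

With more observations confined to the same narrow range, small hypotheses dominate the posterior and generalization to the larger probe shrinks: `p_large_sample` is far below `p_small_sample`, even though the observed range never changed.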

Despite the apparent inconsistency, both the categorization and generalization literatures have found substantial empirical justification for the divergent category frequency effects each predicts. Although there is considerable variability in paradigms and in the precise quantity being manipulated (e.g., frequency of a single item or of unique category items), there appears to be a consistent pattern. Increasing frequency typically produces tightening in generalization tasks across a variety of experimental frameworks and contexts (Frank and Tenenbaum, 2011, Hsu and Griffiths, 2016, Lewis and Frank, 2016, Navarro and Perfors, 2010, Navarro et al., 2012, Sanjana and Tenenbaum, 2003, Tenenbaum, 1999, Tenenbaum, 2000, Vong et al., 2013, Xu and Tenenbaum, 2007b, Xu and Tenenbaum, 2007a). In categorization designs, however, the typical pattern of results suggests that increasing frequency leads to wider generalization. This occurs when a single item within a category is repeated, both in standard categorization designs (Harris et al., 2008, Nosofsky, 1991, Nosofsky, 1988b) and in typicality judgments (Vandierendonck, 1988, Williams and Durso, 1986). Increasing the frequency of all categories likewise leads to increased stability and generalization (Breen and Schvaneveldt, 1986, Homa et al., 1973, Homa and Vosburgh, 1976, Homa et al., 1987, Homa et al., 1991), and to the expansion of category membership predictions (Barsalou, 1985), category size estimates (Beyth-Marom & Fischhoff, 1977), trait acceptance (Boseovski & Lee, 2006), and relative similarity (Polk, Behensky, Gonzalez, & Smith, 2002).

This is somewhat surprising: the implication is that the same manipulation (increasing sample size of the Dax category) causes the Dax category to expand when items from two categories are shown and the task is framed as a Dax-or-Wug problem, but causes it to tighten when items from one category are shown and the task is recast as Dax-or-not-Dax. It becomes more surprising when one realizes – as we demonstrate later – that neither model predicts this reversal. The original GCM predicts expansion in the categorization task, and a basic adaptation of the GCM to a generalization task continues to predict expansion. Similarly, the Bayesian generalization model predicts narrowing in the original problem and continues to do so when applied in a Dax-or-Wug style categorization task.

Given how puzzling the inconsistency appears, one might suppose that it could be resolved by showing that one of the two phenomena is an experimental artifact. Perhaps the difference can be attributed to different choices of stimuli, different choices of dependent measure, or different kinds of presentation. For example, some generalization tasks (e.g., Navarro et al., 2012, Vong et al., 2013) do not show people specific stimuli on a trial-by-trial basis, instead giving people a data visualization that graphically represents where the stimuli fall (e.g., hormone levels marked as dots on a line). Many categorization studies were not designed to investigate overall category frequency (e.g., Nosofsky, 1988b), and in most cases other variables (e.g., specific exemplar frequencies, category variability) are varied at the same time. Accordingly, while the pattern in the literature does seem consistent, it is not easy to place the two kinds of experimental design on a common footing, nor is it simple to find “pure” effects of category frequency in the existing studies. Our goal in this paper is to present experiments that eliminate these differences and assess categorization and generalization using a common experimental paradigm. By doing so, we hope to provide clear empirical evidence about whether people do in fact treat these problems in different ways, and why.

The structure of our paper is as follows. We begin with a more careful discussion of the theoretical issue, showing how the inconsistency between the two modeling approaches arises because of a fundamental difference in how they conceptualize the inference problem and is not due to superficial modeling choices like parameter settings. We then present two experiments that show that the effect of sample size is indeed different in the categorization task than in the generalization task, even when using common stimulus sets and response measures. We argue that the difference arises because there is a genuine difference between the two problems: figuring out how to generalize from one category is a qualitatively different kind of thing than figuring out how to assign an observation to one of two categories. Finally, in a third experiment, we show that these frequency effects are modulated by instructional manipulations that influence the prior beliefs about how items are sampled, suggesting a common cognitive mechanism.

Before continuing, given that terms like categorization and generalization are used with some ambiguity, it is important to be precise about how we use them. Throughout the paper, categorization refers to the inference problem in which items from more than one category are encountered during training, while generalization denotes the problem in which learners must make judgments after seeing items from only one category. Though the two have sometimes been conflated in the literature, we consider the type of response people are asked to make to be orthogonal to the categorization-generalization distinction. We use the term forced choice task whenever the dependent measure is constructed from a forced choice decision (either Wug-or-Dax or Wug-or-Not-Wug). Conversely, a probability judgment task refers to situations where the dependent measure is a probability judgment of membership in a single category. Both response types can be applied to categorization and generalization designs.1


Models of generalization and categorization

We begin by systematically evaluating two specific models of categorization and generalization. Do they genuinely produce these different effects, and if so why?

On the categorization side, we focus on the generalized context model (GCM) of Nosofsky (1986). We choose this model because it is the archetypal model within the categorization literature. It has been used to account for a wide range of phenomena in categorization including item and category frequency effects (Nosofsky, 1988b,

Participants

We recruited 500 participants on Amazon Mechanical Turk and collected data from 499 participants (the data from one participant was not saved). Of the 499 total participants, 23 were excluded because they had previously participated in similar online experiments run by our lab. An additional 94 were excluded for failing to meet a predefined accuracy threshold for non-critical test stimuli, described below. The remaining 382 participants were included in all analyses. Participants ranged in age

Experiment 2: Generalizations about one target category

The superior performance of the GCM on a categorization task raises the possibility that it might also outperform the Bayesian approach on a generalization problem. Perhaps previous papers that found a tightening effect in generalization were false positives, or perhaps differences in experimental procedure can account for the difference in results. With this in mind, we conducted a second experiment in which people were shown examples from one category and asked to make generalizations about

Experiment 3: Manipulating sampling assumptions

This experiment replicates the Four exemplar condition of Experiment 1 as well as two variants of the original Twelve exemplar condition. In these two new conditions the set of training items is identical but the cover story for how items are sampled differs. The Twelve Random condition is designed to induce a weak sampling assumption by explaining that items are selected at random, independently from category membership. In contrast, the Twelve Helpful condition is designed to induce a strong
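The intended difference between the Twelve Random and Twelve Helpful cover stories corresponds to the weak versus strong sampling distinction in Bayesian models of generalization. A minimal sketch (with invented interval hypotheses and observation values, not stimuli from the experiment) shows why the two assumptions predict different frequency effects: under strong sampling each consistent hypothesis is weighted by its size raised to the negative sample size, whereas under weak sampling consistency with the data is all that matters.

```python
import itertools

def generalization_prob(observations, probe, edges, sampling="strong"):
    """Posterior probability that the probe lies in the category, for
    interval hypotheses [lo, hi] over one dimension with a uniform prior.
    Strong sampling: likelihood 1/size^n (items drawn from the category).
    Weak sampling: constant likelihood for any consistent hypothesis
    (items drawn independently of category membership)."""
    n = len(observations)
    num = den = 0.0
    for lo, hi in itertools.combinations(edges, 2):
        if not all(lo <= x <= hi for x in observations):
            continue
        weight = (hi - lo) ** -n if sampling == "strong" else 1.0
        den += weight
        if lo <= probe <= hi:
            num += weight
    return num / den

edges = [0.0, 0.5, 1.0, 1.5, 2.0]
obs4, obs12 = [0.1, 0.4] * 2, [0.1, 0.4] * 6  # same range, more items
probe = 1.2

p_strong_4 = generalization_prob(obs4, probe, edges, "strong")
p_strong_12 = generalization_prob(obs12, probe, edges, "strong")
p_weak_4 = generalization_prob(obs4, probe, edges, "weak")
p_weak_12 = generalization_prob(obs12, probe, edges, "weak")
```

Under strong sampling, tripling the sample tightens generalization (`p_strong_12 < p_strong_4`); under weak sampling the extra items carry no information beyond consistency, so generalization is unchanged (`p_weak_12 == p_weak_4`). This is the sense in which sampling assumptions can moderate the effect of category frequency.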

General discussion

Taken together, the results from Experiments 1 and 2 paint a clear picture: the effect of increasing sample size is qualitatively different in categorization tasks and generalization tasks. In an inductive generalization problem, the learner is presented with positive examples that belong to a single category, and asked to determine whether novel items also belong to this category. In the generalization context, the Bayesian generalization model developed by Tenenbaum and Griffiths (2001) makes

Acknowledgments

DJN received salary support from ARC grant FT110100431 and AFP from ARC grant DE120102378. Research costs and salary support for ATH were funded through ARC grants DP110104949 and DP150103280. KR was supported by an Australian Government Research Training Program Scholarship. Preliminary versions of this work were presented at the 48th and 50th Annual Meeting of the Society of Mathematical Psychology. We would like to thank Robert Nosofsky and two anonymous reviewers for their helpful comments

References (72)

  • J.R. Anderson (1991). The adaptive nature of human categorization. Psychological Review.

  • F.G. Ashby et al. (1988). Decision rules in the perception and categorization of multidimensional stimuli. Journal of Experimental Psychology: Learning, Memory, and Cognition.

  • F.G. Ashby et al. (1988). Toward a unified theory of similarity and recognition. Psychological Review.

  • F.G. Ashby et al. (1986). Varieties of perceptual independence. Psychological Review.

  • L.W. Barsalou (1985). Ideals, central tendency, and frequency of instantiation as determinants of graded structure in categories. Journal of Experimental Psychology: Learning, Memory, and Cognition.

  • R. Beyth-Marom et al. (1977). Direct measures of availability and judgments of category frequency. Bulletin of the Psychonomic Society.

  • J.J. Boseovski et al. (2006). Children’s use of frequency information for trait categorization and behavioral prediction. Developmental Psychology.

  • T.J. Breen et al. (1986). Classification of empirically derived prototypes as a function of category experience. Memory & Cognition.

  • A.L. Cohen et al. (2001). Category variability, exemplar similarity, and perceptual classification. Memory & Cognition.

  • C. Donkin et al. (2012). A power-law model of psychological memory strength in short- and long-term recognition. Psychological Science.

  • R.L. Goldstone (1994). Influences of categorization on perceptual discrimination. Journal of Experimental Psychology: General.

  • N.D. Goodman et al. (2008). A rational analysis of rule-based concept learning. Cognitive Science.

  • H.D. Harris et al. (2008). Prior knowledge and exemplar frequency. Memory & Cognition.

  • G.E. Hawkins et al. (2016). A dynamic model of reasoning and memory. Journal of Experimental Psychology: General.

  • B.K. Hayes et al. (2019). The diversity effect in inductive reasoning depends on sampling assumptions. Psychonomic Bulletin & Review.

  • D. Homa et al. (1987). The changing composition of abstracted categories under manipulations of decisional change, choice difficulty, and category size. Journal of Experimental Psychology: Learning, Memory, and Cognition.

  • D. Homa et al. (1973). Prototype abstraction and classification of new instances as a function of number of instances defining the prototype. Journal of Experimental Psychology.

  • D. Homa et al. (1991). Instance frequency, categorization, and the modulating effect of experience. Journal of Experimental Psychology: Learning, Memory, and Cognition.

  • D. Homa et al. (1976). Category breadth and the abstraction of prototypical information. Journal of Experimental Psychology: Human Learning and Memory.

  • A. Hsu et al. (2016). Sampling assumptions affect use of indirect negative evidence in language learning. PLoS ONE.

  • J.K. Kruschke (1993). Human category learning: Implications for backpropagation models. Connection Science.

  • M.D. Lee et al. (2002). Extending the ALCOVE model of category learning to featural stimulus domains. Psychonomic Bulletin & Review.

  • S. Lewandowsky et al. (2000). Competing strategies in categorization: Expediency and resistance to knowledge restructuring. Journal of Experimental Psychology: Learning, Memory, and Cognition.

  • S. Lewandowsky et al. (2000). Knowledge partitioning: Context-dependent use of expertise. Memory & Cognition.

  • M.L. Lewis et al. (2016). Understanding the effect of social context on learning: A replication of Xu and Tenenbaum (2007b). Journal of Experimental Psychology: General.

  • F. Liang et al. (2008). Mixtures of g priors for Bayesian variable selection. Journal of the American Statistical Association.