Elsevier

Cognition

Volume 96, Issue 2, June 2005, Pages 143-182

The differential role of phonological and distributional cues in grammatical categorisation

https://doi.org/10.1016/j.cognition.2004.09.001

Abstract

Recognising the grammatical categories of words is a necessary skill for the acquisition of syntax and for on-line sentence processing. The syntactic and semantic context of the word contributes cues for grammatical category assignment, but phonological cues, too, have been implicated as important sources of information. The value of phonological and distributional cues has not, with very few exceptions, been empirically assessed. This paper presents a series of corpus analyses of phonological and distributional cues and their potential for distinguishing the grammatical categories of words. The corpus analyses indicated that phonological cues were more reliable for less frequent words, whereas distributional information was most valuable for high-frequency words. We tested this prediction in an artificial language learning experiment, where the distributional and phonological cues of categories of nonsense words were varied. The results corroborated the corpus analyses. For high-frequency nonwords, distributional information was more useful, whereas for low-frequency words there was more reliance on phonological cues. The results indicate that phonological and distributional cues contribute differentially towards grammatical categorisation.

Introduction

A necessary prerequisite to producing sentences is that the language learner derives a knowledge of the different grammatical categories and the relations between them. Knowing the category of a word is also a precursor to understanding referents in others' speech. Given the importance of this knowledge in language acquisition, it is not surprising that so much debate has centred on this issue, particularly over how grammatical category information is attained. At one level, discussions have concerned whether the categories themselves are innate (Pinker, 1984) or can be learned (though it is, of course, agreed that assignment of lexical items to categories is learned). Assuming that grammatical categories can be learned, another level of debate concerns the sources available to the child in order to learn such categories. Explanations have been offered that invoke the importance of semantic (Bowerman, 1973; Macnamara, 1972), phonological (Kelly, 1992), and distributional (Harris, 1951) cues in the learning process. These have been reviewed in detail elsewhere and so we do not consider them at length here (Christiansen et al., 1998; Christiansen and Dale, 2001; Mintz, 2002; Redington and Chater, 1998). Several studies have explored the potential value of using one type of cue, either phonological or distributional, yet the benefits of integrating information across the different types have not been assessed empirically. This paper provides a test of how information is integrated across these different modalities of cues, employing corpus analyses of child-directed speech and an artificial language learning experiment.

Section snippets

Cues for grammatical categorisation

Numerous studies have assessed phonological and distributional information as cues to the grammatical category of words. We review these in turn.

Combining distributional and phonological cues

Shi et al.'s (1998) analyses may be interpreted as combining phonological, acoustic and distributional cues, in that frequency and utterance position could be considered to be distributional cues. The differences in distributions for each cue were significant in their study, but it remains unclear how much information each source contributed towards correct classification, and what benefits may accrue from combining information between sources. A number of issues remain unresolved by these

Method

Corpus preparation. The corpus was derived from the CHILDES database of child-directed speech. We extracted all the speech by adults from all the English corpora in the database, resulting in 5,436,855 words. We replaced pauses and stops with boundary markers, producing 1,369,574 utterances in the corpus. The average length of an utterance was 3.97 words, which is in accordance with an assessment of the Bernstein Ratner fragment of the CHILDES corpus (Bernstein Ratner & Rooney, 2001). The CHILDES
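As a quick arithmetic check on the figures above, the reported mean utterance length follows from dividing the word count by the utterance count. The sketch below is illustrative only: the `<b>` boundary token, the function names, and the toy token stream are assumptions made for this example, not the authors' preprocessing pipeline.

```python
def split_utterances(tokens, boundary="<b>"):
    """Split a flat token stream into utterances at each boundary marker."""
    utterances, current = [], []
    for token in tokens:
        if token == boundary:
            if current:  # skip empty utterances caused by repeated markers
                utterances.append(current)
            current = []
        else:
            current.append(token)
    if current:  # keep a trailing utterance that has no closing marker
        utterances.append(current)
    return utterances


def mean_utterance_length(utterances):
    """Average number of words per utterance."""
    return sum(len(u) for u in utterances) / len(utterances)


# Toy token stream (invented), with pauses and stops already replaced by "<b>".
toy_tokens = "look at the doggy <b> where is it <b> there <b>".split()
toy_utterances = split_utterances(toy_tokens)
toy_mean = mean_utterance_length(toy_utterances)  # (4 + 3 + 1) / 3

# Totals reported in the text: words divided by utterances.
reported_mean = 5_436_855 / 1_369_574
print(round(reported_mean, 2))  # 3.97
```

The toy stream only demonstrates the splitting step; the final two lines reproduce the 3.97 figure directly from the reported totals.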

Method

Corpus preparation. The same corpus as for Experiment 1 was employed.

Cue derivation. We tested the extent to which extremely local distributional information, namely bigram statistics, provided useful information about grammatical category. We measured the occurrence of the target word appearing after a context word, in contrast to Redington et al. (1998), who assessed the two previous and the two following words, or Mintz (2003), who employed one or more preceding and following words. Our analyses,
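The bigram cue described above, how often a target word directly follows a given context word, can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function names, the choice of context words, and the three toy utterances are hypothetical, and the snippet does not reproduce the authors' selection of context words or any weighting they applied.

```python
from collections import Counter, defaultdict


def bigram_cue_counts(utterances, context_words):
    """Count how often each target word directly follows each context word.

    Bigrams are collected within an utterance only, so a context word at the
    end of one utterance never pairs with the first word of the next.
    """
    counts = defaultdict(Counter)
    for utterance in utterances:
        tokens = utterance.lower().split()
        for previous, target in zip(tokens, tokens[1:]):
            if previous in context_words:
                counts[target][previous] += 1
    return counts


def cue_vector(counts, target, context_words):
    """Return the target's counts in a fixed context-word order."""
    return [counts[target][context] for context in context_words]


# Toy child-directed utterances (invented for illustration).
utterances = ["the dog chased the ball", "you want the ball", "you see it"]
context = ["the", "you"]  # hypothetical high-frequency context words

counts = bigram_cue_counts(utterances, set(context))
ball_vector = cue_vector(counts, "ball", context)  # [2, 0]
want_vector = cue_vector(counts, "want", context)  # [0, 1]
```

Each target word ends up with one count per context word. In this toy data the nouns follow "the" and the verbs follow "you", which is the kind of regularity a classifier could exploit.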

Experiment 3: combining phonological and distributional cues

Fig. 3 shows the correct classification of nouns and verbs based on the discriminant analyses of phonological or distributional cues entered separately for different frequency groupings. For high-frequency items, distributional information is extremely useful, but drops off dramatically for lower frequency items. For the phonological cues, the opposite pattern is observed: better performance for lower frequency words.
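To make the idea of a discriminant analysis over cue values concrete, here is a minimal two-class linear discriminant in plain Python. It is a sketch under strong assumptions: only two cues per word, equal prior probabilities for the two classes, and invented cue values. It is not the authors' cue set, data, or software, and the names `fit_discriminant`, `classify`, and `accuracy` are hypothetical.

```python
def mean_vector(rows):
    """Column-wise mean of a list of equal-length cue vectors."""
    n = len(rows)
    return [sum(row[i] for row in rows) / n for i in range(len(rows[0]))]


def pooled_covariance(class_a, class_b):
    """Pooled within-class covariance matrix for two-dimensional cue vectors."""
    total = [[0.0, 0.0], [0.0, 0.0]]
    for rows in (class_a, class_b):
        m = mean_vector(rows)
        for row in rows:
            d = [row[0] - m[0], row[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    total[i][j] += d[i] * d[j]
    dof = len(class_a) + len(class_b) - 2
    return [[value / dof for value in line] for line in total]


def fit_discriminant(nouns, verbs):
    """Return weights w and a threshold; scores above it are labelled 'noun'."""
    m_n, m_v = mean_vector(nouns), mean_vector(verbs)
    (a, b), (c, d) = pooled_covariance(nouns, verbs)
    det = a * d - b * c  # assumed non-zero, i.e. the cues are not collinear
    inv = [[d / det, -b / det], [-c / det, a / det]]
    diff = [m_n[0] - m_v[0], m_n[1] - m_v[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    midpoint = [(m_n[0] + m_v[0]) / 2, (m_n[1] + m_v[1]) / 2]
    threshold = w[0] * midpoint[0] + w[1] * midpoint[1]  # equal priors
    return w, threshold


def classify(w, threshold, x):
    """Label one cue vector by which side of the threshold its score falls on."""
    score = w[0] * x[0] + w[1] * x[1]
    return "noun" if score > threshold else "verb"


def accuracy(w, threshold, nouns, verbs):
    """Proportion of items assigned to their true category."""
    correct = sum(classify(w, threshold, x) == "noun" for x in nouns)
    correct += sum(classify(w, threshold, x) == "verb" for x in verbs)
    return correct / (len(nouns) + len(verbs))


# Invented cue values: [cue 1, cue 2] per word.
nouns = [[2.0, 0.1], [3.0, 0.3], [2.5, 0.0], [3.5, 0.2]]
verbs = [[1.0, 0.8], [1.5, 1.0], [0.5, 0.7], [1.2, 0.9]]

w, threshold = fit_discriminant(nouns, verbs)
train_accuracy = accuracy(w, threshold, nouns, verbs)
```

Running the same fit separately on high-frequency and low-frequency subsets of words, once with phonological cues and once with distributional cues, would yield the kind of per-grouping classification rates the text describes for Fig. 3.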

Experiment 4: artificial language learning of bigrams

We adapted Valian and Coulson's (1988) artificial language such that category words were presented with different frequencies during training. Our hypothesis was that the association with marker-words would be learned more quickly for the high-frequency category words than the low-frequency category words. We also varied the extent to which there was coherence within the two categories of words. All the words within a category either shared several phonological properties, or none. We were also

General discussion

When cues were considered jointly in the discriminant analyses, classification accuracy was higher than when single cues were considered. Cues provided additive value in the classification, contributing towards classification of different items. This was especially true when phonological and distributional cues were considered together. We found confirmation for our hypothesis that phonological and distributional information contributed differentially towards categorisation. At points where

Acknowledgements

All three authors were supported by Human Frontiers of Science Program grant RGP0177/2001-B. The second author was also supported by European Commission Project grant number HPRN-CT-1999-00065.

References (64)

  • P. Monaghan et al. Hemispheric asymmetries in the split-fovea model of semantic processing. Brain and Language (2004)
  • M. Redington et al. Distributional information: A powerful cue for acquiring syntactic categories. Cognitive Science (1998)
  • R. Shi et al. Newborn infants' sensitivity to perceptual cues to lexical and grammatical words. Cognition (1999)
  • K.H. Smith. Learning co-occurrence restrictions: Rule induction or rote learning? Journal of Verbal Learning and Verbal Behavior (1969)
  • M. Tomasello. The item-based nature of children's early syntactic development. Trends in Cognitive Sciences (2000)
  • V. Valian et al. Anchor points in language learning: The role of marker frequency. Journal of Memory and Language (1988)
  • R.N. Aslin et al. Computation of conditional probability statistics by 8-month-old infants. Psychological Science (1996)
  • R.H. Baayen et al. The CELEX Lexical Database (CD-ROM) (1995)
  • N. Bernstein Ratner et al. How accessible is the lexicon in Motherese?
  • L. Bloomfield. Language (1933)
  • M. Bowerman. Structural relationships in children's utterances: Syntactic or semantic?
  • M.D.S. Braine. What is learned in acquiring word classes: A step toward an acquisition theory
  • P.B. Brooks et al. Acquisition of gender-like noun subclasses in an artificial language: The contribution of phonological markers to learning. Journal of Memory and Language (1993)
  • R. Campbell et al. This and thap - Constraints on the pronunciation of new written words. Quarterly Journal of Experimental Psychology (1981)
  • K.W. Cassidy et al. Children's use of phonology to infer grammatical class in vocabulary learning. Psychonomic Bulletin and Review (2001)
  • M.H. Christiansen et al. Learning to segment speech using multiple cues: A connectionist model. Language and Cognitive Processes (1998)
  • M.H. Christiansen et al. Integrating distributional, prosodic and phonological information in a connectionist model of language acquisition (2001)
  • M.H. Christiansen et al. Discovering verbs through multiple-cue integration
  • A. Cutler. Phonological cues to open- and closed-class words in the processing of spoken sentences. Journal of Psycholinguistic Research (1993)
  • T. Dunning. Accurate methods for the statistics of surprise and coincidence. Computational Linguistics (1993)
  • G. Durieux et al. Predicting grammatical classes from phonological cues: An empirical test
  • S.P. Finch et al. Bootstrapping syntactic categories (1992)