
Brain and Language

Volume 97, Issue 3, June 2006, Pages 322-331

Asymmetries for the visual expression and perception of speech

https://doi.org/10.1016/j.bandl.2005.11.007

Abstract

This study explored asymmetries for movement, expression and perception of visual speech. Sixteen dextral models were videoed as they articulated: ‘bat,’ ‘cat,’ ‘fat,’ and ‘sat.’ Measurements revealed that the right side of the mouth was opened wider and for a longer period than the left. The asymmetry was accentuated at the beginning and end of the vocalization and was attenuated for words where the lips did not articulate the first consonant. To measure asymmetries in expressivity, 20 dextral observers watched silent videos and reported what was said. The model’s mouth was covered so that the left, right, or both sides were visible. Fewer errors were made when the right side of the mouth was visible than when the left was visible, suggesting that the right side is more visually expressive of speech. Investigation of asymmetries in perception using mirror-reversed clips revealed that participants did not preferentially attend to one side of the speaker’s face. A correlational analysis revealed an association between movement and expressivity whereby a more motile right mouth led to stronger visual expressivity of the right mouth. The asymmetries are most likely driven by left hemisphere specialization for language, which causes a rightward motoric bias.

Introduction

On the face of it, the face appears to be almost symmetrical about its left/right axis. This apparent symmetry, however, belies a number of subtle and potentially important functional asymmetries. For the expression of emotion, the left side of the face moves more than the right (Borod, Kent, Koff, Martin, & Alpert, 1988). This asymmetry is particularly strong for the expression of sadness and is reduced or reversed for the expression of happiness (Nicholls, Ellis, Clement, & Yoshino, 2004). Asymmetries for the visual expression and perception of speech have also been observed. Research investigating asymmetries in speech falls into three main categories: (1) asymmetries in movement of the speaker’s mouth, (2) asymmetries in visual expressivity of the speaker’s mouth, and (3) asymmetries in the perception of visible speech generated by the observer.

First, asymmetries in movement between the left and right sides of the mouth have frequently been reported during speech production (Campbell, 1982). Wolf and Goodale (1987) measured the amplitude and velocity of videotaped mouth movements when participants articulated a syllable, or made silent mouth movements. They found that the right side of the mouth opened faster and to a greater extent than the left. Because this asymmetry was observed for both verbal and non-verbal movements of the mouth, Wolf and Goodale (1987) concluded that the right side of the mouth, and hence the left hemisphere (LH), was specialized for the production of complex motor behavior. Other researchers, however, have found that the asymmetry is more reliable for verbal production (Graves & Landis, 1990). For example, Cadalbert, Landis, Regard, and Graves (1994) demonstrated that the right lips moved more than the left for spontaneous speech production, reciting the words of a song and singing the words of a song. For singing with no words, however, no asymmetry was observed. Although this result implies that verbal production is needed for the asymmetry to arise, it is also possible that singing with no words requires little movement of the external articulators.

Larger movements of the right side of the mouth have been reported for both genders; though this may only be true for discrete rather than serial word production (Hausmann et al., 1998). Increased motility of the right side of the mouth has been observed in the syllabic babbling of 5-month-old infants (Holowka & Petitto, 2002) and in Common Marmosets during social contact calls (Hook-Costigan & Rogers, 1998). The fact that such asymmetries exist in the absence of true language suggests that the asymmetry is tied to species-specific vocal communication, which may be lateralized to the LH in primates (Petersen, 1978) and infants (MacKain, Studdert-Kennedy, Spieker, & Stern, 1983). In an adult population, increased motility of the right side of the mouth has been linked to LH specialization for the production of speech, which exists in approximately 95% of the right-handed population (Sass, Buchanan, Westerveld, & Spencer, 1994). The direct connection between the LH and the right side of the mouth may cause it to move more than the left side of the mouth, which is controlled by the non-verbal, right hemisphere (RH) (Graves, Strauss, & Wada, 1990).

The second category of research has investigated asymmetries between the left and right sides of the mouth in their ability to express speech. Graves and Potter (1988) investigated the auditory expression of speech by asking participants to restrain manually the lips on one side of their mouth while reciting a tongue twister, which required precise bilabial co-ordination. Listeners, who could not see the speaker, judged the quality of articulation of each sample. The tongue twisters were articulated more clearly when the left side of the mouth was restrained and the right side of the mouth was free to move; demonstrating that movements of the right side of the mouth are more important to the articulation of speech. Similar results have been reported by Cadalbert et al. (1994).

A number of investigators have also examined the effect of the mouth asymmetry on the visual expression of speech. The visual expression and perception of speech, also known as lip-reading, is an important source of information, not only for deaf people, but also for people with normal hearing. The importance of visual information to the perception of speech in individuals with normal hearing is exemplified by the McGurk effect. This illusion arises when speech sounds and lip movements are incongruent. Thus, if the mouth movements of ‘ga’ are dubbed over the sound ‘ba,’ a fusion of the two (e.g., ‘da’) is often heard (McGurk & MacDonald, 1976). Such an effect clearly demonstrates that visual processing affects auditory experience in normal individuals (Driver, 1996).

Campbell (1986) examined asymmetries in lip-reading using photographic chimeric images where one side of the face articulated one sound (a consonant or vowel) while the other side articulated another. When asked to identify the sounds represented by the left and right sides of the face, participants reported the sound from the speaker’s right side more accurately. Campbell therefore concluded that the right side of the speaker’s mouth was more visually expressive of speech than was the left. A similar conclusion was reached by Nicholls, Searle, and Bradshaw (2004). They filmed individuals as they produced consonant–vowel syllables (such as ‘ba’ and ‘da’) and then rearranged the soundtrack and video footage to produce incongruent pairings, designed to elicit the McGurk effect. To investigate asymmetries in lip-reading, either the left or right side of the mouth was covered while participants watched the audiovisual clips. A significant McGurk effect was observed, which was weakest when the right side of the mouth was covered (and the left mouth was visible) and strongest when the left was covered (and the right mouth was visible). Such a result demonstrates that the right side of the mouth is more important in generating the McGurk effect and is therefore more important in the visual communication of speech.

The final category of research has investigated the perceptual asymmetries generated by the observer as they lip-read. A number of studies have used divided visual field techniques, where stimuli are presented briefly to either the left or right visual field. Because the left and right visual fields project to the contralateral visual cortex, experimenters can determine which visual field (and hence which hemisphere) can extract visual information from the mouth more efficiently. Campbell, De Gelder, and De Haan (1996) presented still facial images to the left or right visual field, which silently articulated a Dutch word. They found evidence of a right visual field (hence LH) advantage for matching the face to the sound it would have made. A similar right visual field advantage has been reported by Smeele, Massaro, Cohen, and Sittig (1998). The right visual field advantage may reflect its direct access to LH language processing centers. A right visual field advantage, however, is not always reported. Baynes, Funnell, and Fowler (1994) reported that the McGurk effect was stronger when the visual stimulus was presented in the left visual field—though this asymmetry was reversed when printed response alternatives were provided. Finally, Campbell (1986) has reported a left visual field advantage for matching a photograph of the lips to the sound that was produced. In this case, it was suggested that the left visual field advantage reflected the RH’s superiority for face processing. From the evidence reviewed above, it can be seen that support for a LH or RH advantage is inconsistent and may depend upon the degree to which the task loads on language or face processing skills, respectively.

Divided visual field presentations do not allow observation of the face under normal viewing conditions. A more naturalistic technique, which presents the face under free-viewing conditions, has been used by Thompson, Malmberg, Goodell, and Boring (2004) to examine how attention is distributed across a speaker’s face. Spatial attention was gauged by presenting targets across the face and asking observers to report them when they appeared. Targets were detected better when they were presented on the left side of the speaker’s face (i.e., to the observer’s right). Similar biases toward the observer’s right side have been reported for chimeric lip-reading stimuli (Burt & Perrett, 1997; Campbell, 1982). These rightward biases are generally thought to have an attentional origin, generated by activation of LH language processing mechanisms (Thompson et al., 2004). It is interesting to note that a rightward bias on behalf of the observer causes them to attend to the less informative (left) side of the speaker’s face.

A leftward bias has been observed by Nicholls, Searle et al. (2004) under free-viewing conditions. They investigated perceptual asymmetries by presenting video images in their normal orientation, or left/right mirror reversed. They found that the McGurk illusion was reduced when the videos were mirror-reversed—but only when the entire mouth was visible. The fact that the effect of mirror reversal emerged only when the whole mouth was visible suggests that the critical perceptual asymmetry is generated by the mouth, not the face. The asymmetry is therefore unlikely to reflect a leftward attentional bias generated through activation of RH face processing mechanisms. Instead, Nicholls et al. suggested that the effect of mirror-reversal arose because observers attend more to their left side where the right side of the speaker’s mouth is located—because this side is usually more informative. When the image is mirror-reversed, observers continue to attend to their left where they believe the right side of the speaker’s mouth is located. Because the image has been mirror reversed, however, they are actually attending to the left, less informative, side of the speaker’s mouth, resulting in a reduction of the McGurk effect. The mechanism proposed by Nicholls has ecological validity because it ties the asymmetry in perception to the side of the mouth that is more expressive. However, the mechanism is not directly related to hemispheric laterality or the perceptual asymmetries they generate.

Asymmetries in the expression and perception of visual speech have the potential to affect the efficiency of speech-reading in normal and hearing-impaired populations. Knowledge of the lateralization of these functions is also important for understanding their neural bases. From the evidence reviewed above, however, it can be seen that the different categories of research have yielded inconsistent results on this topic. The present study sought to address these inconsistencies by measuring all three categories of behavior in a single study. Previous research has often used static, non-naturalistic images to investigate asymmetries in visual language expression. The current study improves upon previous research by using digital video footage to capture the movements and to display the images.

Participants made four-choice forced discriminations of silently presented video articulations of ‘bat,’ ‘cat,’ ‘fat’ or ‘sat.’ These words can be differentiated from one another on the basis of one visible speech category or viseme (Owens & Blazek, 1985). However, the place of articulation for the consonants that identify the words fall into two categories. The place of articulation for the ‘b’ in ‘bat’ is labial while the ‘f’ in ‘fat’ is labiodental. Thus, both consonants are articulated, at least in part, by the lips. In contrast, the ‘c’ in ‘cat’ is velar while the ‘s’ in ‘sat’ is alveolar (Walther, 1982). Because lip movements do not differentiate these consonants, it was predicted that they may involve less precise movements of the mouth—leading to a smaller asymmetry in movement compared to ‘bat’ and ‘fat.’

Asymmetries in movement were gauged by measuring the vertical size of the opening on the left and right side of the mouth as it changed throughout the articulation. Movement asymmetries were analyzed in three ways. First, movement was averaged over each articulation to obtain a measure of average movement for each of the words articulated by each of the speakers. It was expected that the right side of the mouth would, on average, be opened wider than the left. Second, movement asymmetries were measured as they changed across the beginning, middle, and end of the articulation. Bearing in mind reports that mouth asymmetries are accentuated at the beginning and end of an articulation (Wolf & Goodale, 1987), it was anticipated that the asymmetry would be attenuated at the middle of the articulation. Increased asymmetries at the beginning and end of an articulation could reflect the fact that the right side of the mouth opened earlier and closed later than did the left. To investigate this, the total time the right or left side of the mouth was open was analyzed.
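The aperture measures described above can be illustrated with a short sketch. This is a hypothetical reconstruction, not the authors' actual analysis pipeline: the function name, units, and the "open when aperture exceeds zero" criterion are assumptions made for illustration.

```python
import numpy as np

def aperture_asymmetry(left, right):
    """Summarize one articulation from per-frame vertical mouth apertures.

    left, right: sequences of mouth-opening heights (arbitrary units), one
    value per video frame, for the left and right sides of the mouth.
    Returns mean aperture per side, their difference, and how many frames
    each side was open (a proxy for opening earlier / closing later).
    """
    left, right = np.asarray(left, float), np.asarray(right, float)
    return {
        "mean_left": left.mean(),
        "mean_right": right.mean(),
        # Positive difference => right side opened wider on average.
        "diff": right.mean() - left.mean(),
        # Assumed criterion: a side counts as "open" when aperture > 0.
        "open_frames_left": int((left > 0).sum()),
        "open_frames_right": int((right > 0).sum()),
    }

# Toy articulation: the right side opens earlier, wider, and closes later.
stats = aperture_asymmetry([0, 2, 4, 2, 0, 0], [1, 3, 5, 3, 1, 0])
```

Averaging such summaries over each word and speaker yields the per-articulation movement measures, and comparing open-frame counts addresses the timing question raised above.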

Asymmetries in the ability of the mouth to express visible speech were measured by covering the left or right side of the mouth. To accentuate the asymmetry, the middle of the mouth was also covered so that only approximately one-third of the mouth was visible. It was predicted that fewer errors would occur when the right side of the mouth was visible compared to when the left side was visible. Observers’ perceptual asymmetries were gauged by presenting the videos in normal or mirror-reversed orientations. If the asymmetry is tied to the physiognomy of the speaker’s face and is not related to a preference on behalf of the observer to attend to one side or the other, mirror reversal should have no effect on the data.

Finally, the association between asymmetries in movement and asymmetries in the ability to express visual speech was investigated through a correlational analysis. It was expected that individuals with more asymmetrical mouth movements should also be more asymmetrical when having their speech read. Such a finding would demonstrate the functional significance of the movements and would also confirm the validity of the movement and expressivity measures.
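A correlational analysis of this kind could be sketched as follows. The data below are invented purely for illustration (they are not the study's results), and the per-speaker asymmetry indices are hypothetical names: a movement index (right-minus-left mean aperture) and an expressivity index (how much easier the speaker is to lip-read from the right side).

```python
from scipy.stats import pearsonr

# Hypothetical per-speaker asymmetry indices for 16 speakers
# (positive values = rightward bias); NOT real data from the study.
movement = [0.9, 1.2, 0.4, 1.5, 0.7, 1.1, 0.2, 1.3,
            0.8, 1.0, 0.5, 1.4, 0.6, 1.6, 0.3, 1.1]
expressivity = [0.10, 0.14, 0.05, 0.18, 0.08, 0.13, 0.02, 0.15,
                0.09, 0.12, 0.06, 0.17, 0.07, 0.19, 0.04, 0.13]

# The prediction: speakers whose right mouth moves more should also be
# easier to lip-read from the right side, i.e., a positive correlation.
r, p = pearsonr(movement, expressivity)
print(f"r = {r:.2f}, p = {p:.4f}")
```

A significant positive r in such an analysis would link the movement asymmetry directly to its functional consequence for the observer.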


Participants

Twenty (12 female, 8 male) observers were drawn from a university student population. The modal age of observers was 18 years and all were dextral according to the Edinburgh Handedness Inventory (Oldfield, 1971). Observers reported normal or corrected-to-normal visual acuity, and all were naïve in relation to the specific aims of the study. Written consent was given prior to the commencement of the study.

Apparatus

Stimuli were filmed using a Canon MVX1 digital video camera. This camera uses the PAL

Average movement

A measure of average mouth aperture was obtained using the mean aperture across the duration of the utterance for each side of the mouth, each of the models and each of the words. The data were analyzed with a repeated measures ANOVA with side (left, right) and word (bat, cat, fat, sat) as within-participants factors and model as the source of variance. Effect sizes are reported as eta-squared values (η2). As can be seen in Fig. 3, the opening of the right side of the mouth was significantly
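The side × word repeated-measures design could be analyzed along these lines. This is a sketch under assumptions, not the authors' code: the data are synthetic, and the use of statsmodels' `AnovaRM` is simply one convenient tool for a design with two within-subjects factors and speakers ("models") as the unit of analysis.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Synthetic stand-in data: 16 speakers x 4 words x 2 sides, with the
# right side of the mouth opening slightly wider on average.
rng = np.random.default_rng(0)
rows = []
for model in range(16):
    for word in ["bat", "cat", "fat", "sat"]:
        base = rng.normal(10.0, 1.0)        # speaker/word baseline aperture
        rows.append((model, word, "left", base))
        rows.append((model, word, "right", base + rng.normal(1.0, 0.3)))
df = pd.DataFrame(rows, columns=["model", "word", "side", "aperture"])

# Repeated-measures ANOVA: side and word as within-subjects factors,
# model (speaker) as the subject / source of variance.
table = AnovaRM(df, depvar="aperture", subject="model",
                within=["side", "word"]).fit().anova_table

# Partial eta-squared derived from F and the degrees of freedom.
table["eta_sq"] = (table["F Value"] * table["Num DF"]) / (
    table["F Value"] * table["Num DF"] + table["Den DF"])
print(table)
```

With one observation per speaker per side-by-word cell, `AnovaRM` needs no aggregation function; the resulting table gives F, p, and the eta-squared effect sizes reported in the text.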

Discussion

Movement profiles were affected by the word being pronounced. The ‘b’ in ‘bat’ is a plosive bilabial stop, produced by obstructing the passage of air with both lips and then suddenly releasing the pressure (Walther, 1982). This means of articulation is reflected in Fig. 5, with a sudden opening of the mouth, which declines over time. The ‘f’ in ‘fat’ is a voiceless labiodental fricative produced by passing air through a narrow channel between the lower lip and upper teeth. Therefore, ‘fat’ is

References (28)

  • Cadalbert, A., Landis, T., Regard, M., & Graves, R. (1994). Singing with and without words: Hemispheric asymmetries in motor control. Journal of Clinical and Experimental Neuropsychology.

  • Campbell, R. (1982). Asymmetries in moving faces. British Journal of Psychology.

  • Driver, J. (1996). Enhancement of selective listening by illusory mislocation of speech sounds due to lip-reading. Nature.

  • Graves, R., & Landis, T. (1990). Asymmetry in mouth opening during different speech tasks. International Journal of Psychology.