Elsevier

Auris Nasus Larynx

Volume 36, Issue 5, October 2009, Pages 571-577

Long-term average spectral characteristics of Cantonese alaryngeal speech

https://doi.org/10.1016/j.anl.2008.12.005

Abstract

Objective

In Hong Kong, esophageal (SE), tracheoesophageal (TE), electrolaryngeal (EL), and pneumatic artificial laryngeal (PA) speech are commonly used by laryngectomees to regain verbal communication after total laryngectomy. While SE and TE speech have been studied to some extent, little is known about the sound quality of EL and PA speech. The present study examined the sound quality associated with SE, TE, EL, and PA speech and compared it with that of laryngeal (NL) speech by using long-term average speech spectra (LTAS).

Methods

Continuous speech samples of a 136-word passage read aloud were obtained from NL, SE, TE, EL, and PA speakers of Cantonese. The alaryngeal speakers were all superior speakers selected from the New Voice Club of Hong Kong, a self-help organization for laryngectomees in Hong Kong. TE speakers were fitted with a Provox valve, and EL speakers used a Servox-type electrolarynx. Speech samples were digitized at 20 kHz and 16 bits/sample by using Praat, and LTAS contours were developed from the digitized samples. First spectral peak (FSP), mean spectral energy (MSE), and spectral tilt (ST) derived from the LTAS contours of the different speaker groups were compared.

Results

The data revealed that all speaker groups generally exhibited similar LTAS contours. However, PA speakers exhibited the lowest average FSP value and the greatest average MSE value. NL phonation was associated with a significantly greater ST value than alaryngeal speech of Cantonese.

Conclusion

The differences in FSP, MSE, and ST values in different speaker groups may be related to the different sound sources being used by the laryngectomees, and the difference in the way the sound source is coupled with the vocal tract system.

Introduction

With the larynx severed, laryngectomized patients need to adopt an alternative phonation method after total laryngectomy. To date, esophageal (SE), tracheoesophageal (TE), and electrolaryngeal (EL) speech, and use of a pneumatic artificial larynx (PA) are the most common alaryngeal phonation methods used in Hong Kong [1], [2]. All of these alaryngeal phonation methods differ in how sound is created: both SE and TE speakers make use of the pharyngoesophageal (PE) segment as the new sound source (i.e., neoglottis) [3], [4], [5], [6], while EL and PA speakers rely on an external device for sound generation [2].

The use of an alternative voicing method by laryngectomees inevitably changes the voice quality. A large number of studies have investigated the sound characteristics associated with SE, TE, and EL phonation [2], [6], [7], [8], [9], [10], [11], [12], [13], [14], [15], [16], [17], [18], [19], [20], [21]. Yet few studies have focused on the sound characteristics of PA phonation, likely because of its limited use in North America and Europe [2], [22], [23]. These studies sought to describe the sound characteristics of different types of alaryngeal phonation by examining acoustic and aerodynamic parameters. Among the most frequently examined are the average fundamental frequency (F0), F0 range and perturbation, phonation intensity, vowel duration, and voice onset time. However, studying the voice source of laryngeal and alaryngeal speech has not been easy, as it is always “contaminated” by the effect of the vocal tract filter. According to the source-filter theory, the voices of laryngeal (NL) speakers are products of the laryngeal and supralaryngeal resonance systems, which are independent of each other [24]. Accordingly, alaryngeal phonation is the product of the neoglottal sound source and the supra-neoglottal resonance system, which is determined by the vocal tract configuration. Although acoustic and aerodynamic studies of alaryngeal phonation are wide-ranging, few have critically examined the vibratory behavior of the voicing source in the different types of alaryngeal phonation, even though such knowledge is of paramount importance in restoring post-laryngectomy verbal communication. To study the sound source of different types of alaryngeal phonation, the effect of the vocal tract filter needs to be eliminated. This can be done by inverse-filtering the aerodynamic signals or by long-term average spectral analysis of acoustic speech signals.

Analysis of long-term average spectra (LTAS) of speech offers a unique and reliable approach for estimating the vibratory characteristics of the alaryngeal sound source. Calculation of the LTAS involves excluding pauses and voiceless portions from a continuous sample of phonatory behavior and acoustically examining the remaining signal as discrete spectra derived from consecutive temporal intervals of phonatory activity [25]. By averaging these individual spectra, the LTAS levels out the short-time variations present in the human voice due to the filtering properties of the vocal tract [26]. Three common features extracted from the LTAS are first spectral peak (FSP), mean spectral energy (MSE), and spectral tilt (ST). The FSP is the frequency value associated with the first amplitude peak across the LTAS display. The FSP is assumed to provide a representation of the average F0 across a phonatory sample [27]. The MSE is the average amplitude value across the frequency range of 0–8000 Hz. Physiologically, MSE is thought to represent the constant properties of the vocal source, as the LTAS averaging process eliminates any dynamic features induced by articulatory movement during vocalization [25]. The ST is a representation of how quickly the amplitudes of the harmonics decline, with a low ST corresponding to a hyperadductional phonatory state [27].
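The LTAS procedure and the three measures described above can be illustrated with a minimal Python sketch. It is not the authors' Praat procedure. It assumes NumPy is available, that the input already has pauses and voiceless portions removed, and that levels are uncalibrated dB. The function names, the frame length, the 20 dB peak threshold, and the ST formula (energy below 1 kHz relative to energy from 1 to 5 kHz, so that a larger value means a steeper decline) are all assumptions made for illustration; the text above does not state the exact formulas used.

```python
import numpy as np


def ltas_db(signal, fs, frame_len=1024, hop=512, fmax=8000.0):
    """Return (freqs, ltas) where ltas is the average frame power spectrum in dB.

    Assumes `signal` already has pauses and voiceless stretches removed.
    """
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one analysis frame")
    window = np.hanning(frame_len)
    power_sum = np.zeros(frame_len // 2 + 1)
    n_frames = 0
    # Accumulate the power spectrum of each consecutive, overlapping frame.
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * window
        power_sum += np.abs(np.fft.rfft(frame)) ** 2
        n_frames += 1
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)
    keep = freqs <= fmax  # restrict to the 0-8000 Hz range named in the text
    # The small constant avoids log(0) in empty bins.
    return freqs[keep], 10.0 * np.log10(power_sum[keep] / n_frames + 1e-12)


def ltas_features(freqs, ltas, peak_floor_db=20.0, split_hz=1000.0, top_hz=5000.0):
    """Return FSP (Hz), MSE (dB) and ST (dB) under the assumptions stated above."""
    # FSP: first local maximum lying within peak_floor_db of the global maximum
    # (assumed rule, used to skip low-level ripples before the first real peak).
    threshold = ltas.max() - peak_floor_db
    fsp = float(freqs[int(np.argmax(ltas))])  # fallback if no interior peak is found
    for i in range(1, len(ltas) - 1):
        if ltas[i] >= threshold and ltas[i] > ltas[i - 1] and ltas[i] >= ltas[i + 1]:
            fsp = float(freqs[i])
            break
    # MSE: mean level over the whole retained frequency range.
    mse = float(np.mean(ltas))
    # ST (assumed definition): low-band energy relative to high-band energy, in dB.
    power = 10.0 ** (ltas / 10.0)
    low = power[freqs < split_hz].sum()
    high = power[(freqs >= split_hz) & (freqs <= top_hz)].sum()
    st = float(10.0 * np.log10(low / high))
    return {"FSP_Hz": fsp, "MSE_dB": mse, "ST_dB": st}


if __name__ == "__main__":
    fs = 20000  # sampling rate reported in the Methods
    t = np.arange(fs) / fs
    # Synthetic two-tone test signal, not speech data.
    tone = np.sin(2 * np.pi * 200 * t) + 0.1 * np.sin(2 * np.pi * 3000 * t)
    freqs, ltas = ltas_db(tone, fs)
    print(ltas_features(freqs, ltas))
```

With a 1024-sample frame at 20 kHz, the frequency resolution is about 19.5 Hz, so the FSP estimate is only accurate to within roughly one bin. A longer frame gives finer resolution at the cost of fewer frames to average.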

A number of previous studies have used LTAS to examine normal and disordered voice characteristics [27], [28], [29], [30], [31], [32], [33], [34], [35], [36]. Despite the many LTAS studies examining voice production in various speaker groups, application of LTAS to alaryngeal phonation has been rare. Globlek et al. [37] reported that the LTAS spectral timbres of SE and TE voices were similar. However, quantitative information concerning the specific spectral features of SE and TE phonation was not reported. Weinberg et al. [38] examined LTAS in SE speakers and found a considerably lower average spectral level (7–10 dB) compared to NL phonation. Weiss et al. [39] observed that the LTAS of EL phonation decreased in amplitude at approximately 500 Hz, while the LTAS of NL phonation began to decrease in amplitude at approximately 200 Hz. In addition, EL phonation in the frequency region of 2–4 kHz remained 5–10 dB higher compared to NL phonation.

Based on the above discussion, several drawbacks arise. (1) Results of past LTAS studies examining alaryngeal phonation are inconclusive and fail to provide a detailed depiction of the underlying source characteristics of different types of alaryngeal speech. Past studies did not quantify the LTAS of alaryngeal speech with parameters such as FSP, MSE, and ST. Information drawn from these studies has been based on isolated, idiosyncratic parameters derived from the source spectrum. (2) Perhaps due to the scarcity of PA speakers, a comprehensive LTAS study of all kinds of alaryngeal speech in comparison with NL phonation is lacking. Such data would provide valuable clinical information concerning the similarities and differences across the various alaryngeal phonation methods, as well as NL speech. (3) Previous LTAS research focused only on English speakers. Information on how the different types of alaryngeal speech behave in a tone language is not available. By examining the performance of alaryngeal speakers of a tone language, additional knowledge of alaryngeal speech characteristics with regard to the control of tonal variation will be obtained. Because their ability to manipulate pitch is relatively limited, alaryngeal speakers of a tone language may find tone production more problematic, especially those using EL and PA speech, in which pitch variation is reportedly lacking.

In response to the drawbacks from previous studies, the present study examined alaryngeal speakers of Cantonese. In a tonal language such as Cantonese, tone (i.e., F0 regulation) is primarily used to signal word-type [40]. The general research question for the present study was: Do LTAS measures (FSP, MSE, ST) differ significantly among NL, SE, TE, EL and PA modes of phonation? The LTAS spectra associated with NL, SE, TE, EL, and PA speech of Cantonese were examined, based on which LTAS parameters including FSP, MSE and ST were derived and compared.
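The research question above asks whether a measure differs among five speaker groups. This excerpt does not state which statistical test was used, so the sketch below assumes a one-way analysis of variance purely for illustration and computes only the F ratio and its degrees of freedom with the Python standard library. The function name is hypothetical, and the numbers in the demonstration are toy values, not data from the study.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a dict of group label -> list of values.

    Assumed analysis (one-way ANOVA); the excerpt does not name the test used.
    """
    samples = [list(values) for values in groups.values()]
    if len(samples) < 2 or any(len(s) < 2 for s in samples):
        raise ValueError("need at least two groups with two or more values each")
    all_values = [x for s in samples for x in s]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
    # Variation of individual values around their own group mean.
    ss_within = sum((x - mean(s)) ** 2 for s in samples for x in s)
    if ss_within == 0:
        raise ValueError("no within-group variation; F is undefined")
    df_between = len(samples) - 1
    df_within = len(all_values) - len(samples)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


if __name__ == "__main__":
    # Toy placeholder values only; in practice each key would be a speaker
    # group (NL, SE, TE, EL, PA) and each list one LTAS measure per speaker.
    toy = {"group_a": [1, 2, 3], "group_b": [2, 3, 4], "group_c": [5, 6, 7]}
    print(one_way_anova_f(toy))
```

The F ratio would then be compared against the F distribution with the returned degrees of freedom to obtain a p-value, a step omitted here to keep the sketch dependency-free.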

Participants

Sixty-three adult male native speakers of Cantonese, consisting of 10 laryngeal and 53 alaryngeal speakers, participated in the present study. The alaryngeal participants consisted of 15 SE, 12 TE, 15 EL, and 11 PA speakers. The TE speakers were all using a Provox-type valve, the EL speakers a Servox electrolarynx, and the PA speakers a custom-made pneumatic device. The average age of each speaker group was 63 years, with participants ranging in age from 48 to 80 years. None of the

Results

The overall average LTAS spectra associated with the NL, SE, TE, EL, and PA speaker groups are shown collectively in Fig. 1. The FSP, MSE, and ST values calculated from the LTAS spectra are presented in Table 1.

Discussion

In the study, LTAS spectra were used to average out the short-term dynamic features caused by articulatory movements, so that a depiction of the sound source characteristics could be obtained [25], [26]. As can be seen in Fig. 1, NL, SE, TE, EL, and PA speech demonstrated similar patterns of LTAS contours: all LTAS contours slope downward, with higher amplitude (energy) at lower frequencies and diminishing amplitude as frequency increases. This amplitude attenuation with frequency was found in the LTAS

Conclusion

A comparison among the average LTAS contours associated with NL, SE, TE, EL, and PA speech of Cantonese reveals that the average LTAS contour of NL speech showed a steeper attenuation rate in amplitude (energy) with frequency when compared with the other speaker groups. While average FSP values of NL, SE and TE speech were comparable, the FSP value of PA was only about one half of that of NL speech. The low FSP value in PA speech is likely to be due to the different rate of vibration of the

References (47)

  • J.B. Moon et al. Aerodynamic and myoelastic contributions to tracheoesophageal voice production. J Speech Hear Res (1987)
  • Ng M, Wong J. Voice onset time characteristics of esophageal, tracheoesophageal, and laryngeal speech of Cantonese. J...
  • J. Gandour et al. Perception of intonational contrasts in alaryngeal speech. J Speech Hear Res (1983)
  • J. Gandour et al. Perception in lexical stress in alaryngeal speech. J Speech Lang Hear Res (1983)
  • J. Gandour et al. Vowel length in Thai alaryngeal speech. Folia Phoniatr Logop (1987)
  • J. Gandour et al. Voice onset time in Thai alaryngeal speech. J Speech Hear Disord (1987)
  • J. Gandour et al. Tone in Thai alaryngeal speech. J Speech Lang Hear Res (1988)
  • S.C. Holly et al. A comparison of the intelligibility of esophageal, electrolarynx, and normal speech in quiet and in noise. J Commun Disord (1983)
  • H. Liu et al. Acoustic characteristics of Mandarin esophageal speech. J Acoust Soc Am (2005)
  • Liu H, Ng M, Wan M, Wang S. The effect of tonal changes on voice onset time in Mandarin esophageal speech. J Voice; in...
  • H. Liu et al. Effects of place of articulation and aspiration on voice onset time in Mandarin esophageal speech. Folia Phoniatr Logop (2007)
  • Ng M, Chu R. An acoustical and perceptual study of vowels produced by alaryngeal speakers of Cantonese. Folia Phoniatr...
  • M. Ng et al. Fundamental frequency, intensity, and vowel duration characteristics related to perception of Cantonese alaryngeal speech. Folia Phoniatr Logop (2001)