Long-term average spectral characteristics of Cantonese alaryngeal speech
Introduction
With the larynx removed, patients need to adopt an alternative phonation method after total laryngectomy. To date, esophageal (SE), tracheoesophageal (TE), and electrolaryngeal (EL) speech, and the use of a pneumatic artificial larynx (PA), are the most common alaryngeal phonation methods used in Hong Kong [1], [2]. These methods differ in how sound is created: SE and TE speakers use the pharyngoesophageal (PE) segment as the new sound source (i.e., neoglottis) [3], [4], [5], [6], while EL and PA speakers rely on an external device for sound generation [2].
The use of an alternative voicing method by laryngectomees inevitably changes voice quality. A large number of studies have investigated the sound characteristics associated with SE, TE, and EL phonation [2], [6], [7], [8], [9], [10], [11], [12], [13], [14], [15], [16], [17], [18], [19], [20], [21]. Yet few studies have focused on the sound characteristics of PA phonation, likely as a result of its limited use in North America and Europe [2], [22], [23]. These studies sought to describe the sound characteristics of different alaryngeal phonation methods through the examination of acoustic and aerodynamic parameters. Among the most frequently examined are the average fundamental frequency (F0), F0 range and perturbation, phonation intensity, vowel duration, and voice onset time. However, studying the voice source of laryngeal and alaryngeal speech has not been easy, as it is always “contaminated” by the effect of the vocal tract filter. According to the source-filter theory, the voices of laryngeal (NL) speakers are products of the laryngeal and supralaryngeal resonance systems, which are independent of each other [24]. Accordingly, alaryngeal phonation is the product of the neoglottal sound source and the supra-neoglottal resonance system, which is determined by the vocal tract configuration. Despite the wide range of acoustic and aerodynamic studies of alaryngeal phonation, few have critically examined the vibratory behavior of the voicing source in the different alaryngeal phonation methods, even though such knowledge is of paramount importance in restoring post-laryngectomy verbal communication. To study the sound source of different alaryngeal phonation methods, the effect of the vocal tract filter needs to be eliminated. This can be done by inverse-filtering the aerodynamic signals, or by means of long-term average spectral analysis of acoustic speech signals.
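In spectral terms, the source-filter relation described above is commonly written as a product that becomes a sum on a decibel scale. The notation below is assumed for illustration and does not come from the original article: S is the (neo)glottal source spectrum, T the vocal tract transfer function, P the resulting speech spectrum, and the hatted symbols are estimates. Radiation effects are omitted for simplicity.

```latex
\begin{align}
  P(f) &= S(f)\,T(f) \\
  20\log_{10}\lvert P(f)\rvert &= 20\log_{10}\lvert S(f)\rvert + 20\log_{10}\lvert T(f)\rvert \\
  \hat{S}(f) &= \frac{P(f)}{\hat{T}(f)} \qquad \text{(inverse filtering with an estimated filter } \hat{T}\text{)}
\end{align}
```

Under this reading, inverse filtering removes the filter term explicitly, whereas long-term averaging relies on the time-varying filter contribution smoothing out across many analysis frames.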
Analysis of long-term average spectra (LTAS) of speech offers a unique and reliable approach for estimating the vibratory characteristics of the alaryngeal sound source. Calculation of the LTAS involves excluding pauses and voiceless portions from a continuous sample of phonatory behavior and acoustically examining the remaining signal as discrete spectra derived from consecutive temporal intervals of phonatory activity [25]. By averaging these individual spectra, the LTAS levels out the short-time variations present in the human voice due to the filtering properties of the vocal tract [26]. Three common features extracted from the LTAS are first spectral peak (FSP), mean spectral energy (MSE), and spectral tilt (ST). The FSP is the frequency value associated with the first amplitude peak across the LTAS display. The FSP is assumed to provide a representation of the average F0 across a phonatory sample [27]. The MSE is the average amplitude value across the frequency range of 0–8000 Hz. Physiologically, MSE is thought to represent the constant properties of the vocal source, as the LTAS averaging process eliminates any dynamic features induced by articulatory movement during vocalization [25]. The ST is a representation of how quickly the amplitudes of the harmonics decline, with a low ST corresponding to a hyperadductional phonatory state [27].
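The LTAS procedure and the three measures described above can be sketched in code. The following is a minimal illustration under stated assumptions, not the analysis used in the study: the function names (`ltas`, `first_spectral_peak`, `mean_spectral_energy`, `spectral_tilt`), the energy gate used to skip pauses, the 20 dB criterion for accepting a peak, and the 0–1 kHz versus 1–5 kHz band ratio used for ST are all hypothetical choices made for this example. The energy gate removes pauses only and does not detect voiceless segments. The input is a synthetic harmonic tone, not speech data.

```python
import cmath
import math
import random


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out


def ltas(signal, fs, frame_len=512, energy_floor=1e-4):
    """Return (freqs_hz, levels_db, n_frames_kept) over consecutive frames."""
    # Periodic Hann window applied to each frame.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_len) for i in range(frame_len)]
    n_bins = frame_len // 2 + 1
    power = [0.0] * n_bins
    kept = 0
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        # Assumed pause detector: skip frames whose mean-square level is low.
        if sum(s * s for s in frame) / frame_len < energy_floor:
            continue
        spectrum = fft([s * w for s, w in zip(frame, window)])
        for k in range(n_bins):
            power[k] += abs(spectrum[k]) ** 2
        kept += 1
    if kept == 0:
        raise ValueError("no frame exceeded the energy floor")
    freqs = [k * fs / frame_len for k in range(n_bins)]
    # Average the power spectra of the kept frames, then convert to dB.
    levels = [10 * math.log10(p / kept + 1e-20) for p in power]
    return freqs, levels, kept


def first_spectral_peak(freqs, levels, within_db=20.0):
    """FSP: first local maximum within `within_db` of the strongest bin (assumed rule)."""
    floor = max(levels[1:]) - within_db
    for k in range(1, len(levels) - 1):
        if levels[k] >= floor and levels[k - 1] < levels[k] >= levels[k + 1]:
            return freqs[k]
    return None


def mean_spectral_energy(freqs, levels, f_max=8000.0):
    """MSE: mean dB level of all bins from 0 Hz up to f_max."""
    band = [lvl for f, lvl in zip(freqs, levels) if f <= f_max]
    return sum(band) / len(band)


def spectral_tilt(freqs, levels, split_hz=1000.0, f_max=5000.0):
    """ST as a low-band to high-band energy ratio in dB (assumed definition)."""
    low = sum(10 ** (lvl / 10) for f, lvl in zip(freqs, levels) if f < split_hz)
    high = sum(10 ** (lvl / 10) for f, lvl in zip(freqs, levels) if split_hz <= f <= f_max)
    return 10 * math.log10(low / high)


# Synthetic demo: a 125 Hz harmonic tone (amplitudes 1/k) followed by a pause.
rng = random.Random(0)
fs, f0 = 16000, 125.0
voiced = [sum(math.sin(2 * math.pi * f0 * k * n / fs) / k for k in range(1, 21))
          for n in range(16384)]
signal = [v + rng.gauss(0, 0.001) for v in voiced + [0.0] * 4096]

freqs, levels, n_frames = ltas(signal, fs)
fsp = first_spectral_peak(freqs, levels)
mse = mean_spectral_energy(freqs, levels)
st = spectral_tilt(freqs, levels)
print(n_frames, fsp, round(mse, 1), round(st, 1))
```

With the ratio definition assumed here, a smaller ST means relatively more energy above 1 kHz, which matches the interpretation of low ST given above. Other operationalizations of spectral tilt, such as a regression slope across harmonic amplitudes, would need a different function.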
A number of previous studies have used LTAS to examine normal and disordered voice characteristics [27], [28], [29], [30], [31], [32], [33], [34], [35], [36]. Despite the many LTAS studies examining voice production in various speaker groups, application of LTAS to alaryngeal phonation has been rare. Globlek et al. [37] reported that the LTAS spectral timbres of SE and TE voices were similar. However, quantitative information concerning the specific spectral features of SE and TE phonation was not reported. Weinberg et al. [38] examined LTAS in SE speakers and found a considerably lower average spectral level (7–10 dB) compared to NL phonation. Weiss et al. [39] observed that the LTAS of EL phonation decreased in amplitude at approximately 500 Hz, while the LTAS of NL phonation began to decrease in amplitude at approximately 200 Hz. In addition, EL phonation in the frequency region of 2–4 kHz remained 5–10 dB higher compared to NL phonation.
Based on the above discussion, several drawbacks arise. (1) Results of past LTAS studies examining alaryngeal phonation are inconclusive and fail to provide a detailed depiction of the underlying source characteristics of the different types of alaryngeal speech. Past studies did not quantify the LTAS spectra of alaryngeal speech using parameters such as FSP, MSE, and ST. Information drawn from these studies has been based on isolated and peculiar parameters derived from the source spectrum. (2) Perhaps due to the scarcity of PA speakers, a comprehensive LTAS study of all types of alaryngeal speech in comparison with NL phonation is lacking. Such data would provide valuable clinical information concerning the similarities and differences across the various alaryngeal phonation methods, as well as NL speech. (3) Previous LTAS research focused only on English speakers. Information on how the different types of alaryngeal speech behave in a tone language is not available. By examining the performance of alaryngeal speakers of a tone language, additional knowledge of alaryngeal speech characteristics with regard to the control of tonal variation will be obtained. Because of their relative inability to manipulate pitch, alaryngeal speakers of a tone language, especially those using EL and PA speech, may find tone production more problematic, as pitch variation in EL and PA phonation is reportedly lacking.
In response to the drawbacks of previous studies, the present study examined alaryngeal speakers of Cantonese. In a tonal language such as Cantonese, tone (i.e., F0 regulation) is primarily used to signal word-type [40]. The general research question for the present study was: Do LTAS measures (FSP, MSE, ST) differ significantly among the NL, SE, TE, EL, and PA modes of phonation? The LTAS spectra associated with NL, SE, TE, EL, and PA speech of Cantonese were examined, and the LTAS parameters FSP, MSE, and ST were derived from them and compared.
Participants
Sixty-three adult male native speakers of Cantonese, consisting of 10 laryngeal and 53 alaryngeal speakers, participated in the present study. The alaryngeal participants consisted of 15 SE, 12 TE, 15 EL, and 11 PA speakers. All TE speakers used a Provox-type valve, EL speakers used a Servox electrolarynx, and PA speakers used a custom-made pneumatic device. The average age of each speaker group was 63 years, with participants ranging in age from 48 to 80 years. None of the
Results
The overall average LTAS spectra associated with the NL, SE, TE, EL, and PA speaker groups are shown collectively in Fig. 1. The FSP, MSE, and ST values calculated from the LTAS spectra are presented in Table 1.
Discussion
In the present study, LTAS spectra were used to average out the short-term dynamic features caused by articulatory movements, yielding a depiction of the sound source characteristics [25], [26]. As can be seen in Fig. 1, NL, SE, TE, EL, and PA speech demonstrated similar LTAS contour patterns: all contours sloped downward, with higher amplitude (energy) at lower frequencies and diminishing amplitude as frequency increased. This amplitude attenuation with frequency was found in the LTAS
Conclusion
A comparison among the average LTAS contours associated with NL, SE, TE, EL, and PA speech of Cantonese reveals that the average LTAS contour of NL speech showed a steeper attenuation rate in amplitude (energy) with frequency when compared with the other speaker groups. While average FSP values of NL, SE and TE speech were comparable, the FSP value of PA was only about one half of that of NL speech. The low FSP value in PA speech is likely to be due to the different rate of vibration of the
References (47)
- et al.
Speech performance of adult Cantonese-speaking laryngectomees using different types of alaryngeal phonation
J Voice
(1997)
- et al.
Aerodynamic characteristics of laryngectomees breathing quietly and speaking with electrolarynx
J Voice
(2004)
- et al.
Voice reconstruction using the free ileocolon flap versus the pneumatic artificial larynx: a comparison of patients’ preference and experience following laryngectomy
J Plast Reconstr Aesthet Surg
(2006)
- LTAS criteria pertinent to the measurement of voice quality
J Phonet
(1986)
- et al.
Electrolarynx in voice rehabilitation
Auris Nasus Larynx
(2007)
- et al.
Spectral energy distributions in four types of infant vocalizations
J Commun Disord
(1988)
- et al.
Differences in voice quality between men and women: use of the long-term average spectrum (LTAS)
J Voice
(1996)
- Law KYI. Communication activity and participation after laryngectomy. MS thesis. Hong Kong: University of Hong Kong;...
- Some obstacles in learning esophageal speech
- et al.
Laryngectomy speech rehabilitation—a review of outcomes
- Aerodynamic and myoelastic contributions to tracheoesophageal voice production
J Speech Hear Res
- Perception of intonational contrasts in alaryngeal speech
J Speech Hear Res
- Perception in lexical stress in alaryngeal speech
J Speech Lang Hear Res
- Vowel length in Thai alaryngeal speech
Folia Phoniatr Logop
- Voice onset time in Thai alaryngeal speech
J Speech Hear Disord
- Tone in Thai alaryngeal speech
J Speech Lang Hear Res
- A comparison of the intelligibility of esophageal, electrolarynx, and normal speech in quiet and in noise
J Commun Disord
- Acoustic characteristics of Mandarin esophageal speech
J Acoust Soc Am
- Effects of place of articulation and aspiration on voice onset time in Mandarin esophageal speech
Folia Phoniatr Logop
- Fundamental frequency, intensity, and vowel duration characteristics related to perception of Cantonese alaryngeal speech
Folia Phoniatr Logop