Abstract
Low-bit-rate speech coding, at rates below 4 kb/s, is needed for both communication and voice storage applications. At such low rates, full encoding of the speech waveform is not possible; low-rate coders therefore rely on parametric models that represent only the most perceptually relevant aspects of speech. Although there are several approaches to this modeling, all can be related to the basic linear model of speech production, in which an excitation signal drives a vocal-tract filter.
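As a rough illustration of the linear model, the sketch below drives an all-pole filter with either a pulse train (voiced excitation) or noise (unvoiced excitation). It is a minimal sketch, not any coder described in the chapter: the function names, the second-order predictor values, the 80-sample pitch period, and the 160-sample frame length are all assumptions chosen for the example.

```python
import random


def pulse_train(n_samples, pitch_period):
    """Voiced excitation: one unit impulse every `pitch_period` samples."""
    return [1.0 if n % pitch_period == 0 else 0.0 for n in range(n_samples)]


def white_noise(n_samples, seed=0):
    """Unvoiced excitation: zero-mean uniform noise (seeded for repeatability)."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]


def all_pole_filter(excitation, predictor, gain=1.0):
    """Vocal-tract filter: s[n] = gain * e[n] + sum_k predictor[k-1] * s[n-k]."""
    out = []
    for n, e in enumerate(excitation):
        s = gain * e
        for k, a in enumerate(predictor, start=1):
            if n - k >= 0:
                s += a * out[n - k]
        out.append(s)
    return out


# Hypothetical predictor: one resonance with complex poles of radius 0.9
# (roots of z^2 - 1.2 z + 0.81), so the filter is stable.
predictor = [1.2, -0.81]

voiced = all_pole_filter(pulse_train(160, 80), predictor)
unvoiced = all_pole_filter(white_noise(160), predictor, gain=0.1)
```

Under this model a coder only needs to convey the filter coefficients, the gain, and a description of the excitation (for example a pitch period and a voicing decision) for each frame, rather than the waveform samples themselves.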
The basic properties of the speech signal and of human speech perception explain the principles of parametric speech coding as applied in early vocoders. Current speech modeling approaches, such as mixed excitation linear prediction, sinusoidal coding, and waveform interpolation, use more sophisticated versions of these same concepts. Modern techniques for encoding the model parameters, in particular those based on the theory of vector quantization, allow the model information to be encoded with very few bits per speech frame.
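The idea behind vector quantization can be sketched in a few lines: a whole parameter vector is replaced by the index of its nearest entry in a shared codebook, so only that index is transmitted. The example below is a toy sketch under stated assumptions; the two-dimensional codebook, the squared-error distance, and the function names are hypothetical and are not taken from any standardized coder.

```python
import math


def quantize(vector, codebook):
    """Return the index of the codeword closest to `vector` (squared error)."""
    best_index, best_dist = 0, float("inf")
    for i, codeword in enumerate(codebook):
        dist = sum((v - c) ** 2 for v, c in zip(vector, codeword))
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index


def bits_per_vector(codebook):
    """Bits needed to send one codebook index."""
    return math.ceil(math.log2(len(codebook)))


# Toy codebook with four 2-dimensional codewords (hypothetical values).
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

index = quantize([0.9, 0.2], codebook)  # encoder: transmit only this index
decoded = codebook[index]               # decoder: table lookup
```

A codebook with 2^b entries costs b bits per vector regardless of the vector dimension, which is why quantizing parameters jointly can spend far fewer bits per frame than quantizing each parameter separately.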
Successful standardization of low-rate coders has enabled their widespread use in both military and satellite communications, at rates from 4 kb/s down to 600 b/s. However, toll-quality low-rate coding remains a research challenge.
Abbreviations
- ACeS: Asia Cellular Satellite
- AMSC-TMI: American Mobile Satellite Corporation Telesat Mobile Incorporated
- APCO: Association of Public-Safety Communications Officials
- CELP: code-excited linear prediction
- DFT: discrete Fourier transform
- DoD: Department of Defense
- FIR: finite impulse response
- ITU: International Telecommunication Union
- LPC: linear prediction coefficients
- LPC: linear predictive coding
- LSF: line spectral frequency
- MBE: multiband excited
- MELP: mixed excitation linear prediction
- MSVQ: multistage VQ
- NATO: North Atlantic Treaty Organization
- RCELP: relaxed CELP
- REW: rapidly evolving waveform
- RMS: root mean square
- SD: spectral distortion
- SEW: slowly evolving waveform
- STC: sinusoidal transform coder
- TDMA: time-division multiple-access
- VQ: vector quantization
- VSELP: vector sum excited linear prediction
- WI: waveform interpolation
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
McCree, A.V. (2008). Low-Bit-Rate Speech Coding. In: Benesty, J., Sondhi, M.M., Huang, Y.A. (eds) Springer Handbook of Speech Processing. Springer Handbooks. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-49127-9_16
DOI: https://doi.org/10.1007/978-3-540-49127-9_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49125-5
Online ISBN: 978-3-540-49127-9
eBook Packages: Engineering, Engineering (R0)