Part of the book series: Springer Handbooks (SHB)

Abstract

Low-bit-rate speech coding, at rates below 4 kb/s, is needed for both communication and voice storage applications. At such low rates, full encoding of the speech waveform is not possible; therefore, low-rate coders rely instead on parametric models to represent only the most perceptually relevant aspects of speech. While there are a number of different approaches for this modeling, all can be related to the basic linear model of speech production, where an excitation signal drives a vocal-tract filter.
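The linear model mentioned above can be illustrated with a minimal sketch in which a periodic pulse train (a stand-in for voiced excitation) drives an all-pole filter (a stand-in for the vocal tract). The function names, the single 500 Hz resonance, its 100 Hz bandwidth, the 8 kHz sampling rate, and the 100 Hz pitch are illustrative assumptions, not values taken from the chapter.

```python
import math

def impulse_train(n_samples, pitch_period):
    """Voiced excitation: one unit pulse every pitch_period samples."""
    return [1.0 if n % pitch_period == 0 else 0.0 for n in range(n_samples)]

def all_pole_filter(excitation, a):
    """All-pole filter 1/A(z): s[n] = e[n] - sum_k a[k] * s[n-1-k]."""
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for k, ak in enumerate(a):
            if n - 1 - k >= 0:
                acc -= ak * out[n - 1 - k]
        out.append(acc)
    return out

def resonator(freq_hz, bandwidth_hz, fs):
    """Coefficients [a1, a2] of a two-pole resonance (one assumed formant)."""
    r = math.exp(-math.pi * bandwidth_hz / fs)
    theta = 2.0 * math.pi * freq_hz / fs
    return [-2.0 * r * math.cos(theta), r * r]

FS = 8000                                # assumed sampling rate in Hz
a = resonator(500.0, 100.0, FS)          # assumed formant at 500 Hz
excitation = impulse_train(400, 80)      # 80-sample period = 100 Hz pitch
speech = all_pole_filter(excitation, a)  # synthetic "voiced" output
```

A parametric coder built on this model would transmit only the filter coefficients, the pitch period, and a gain for each frame, rather than the waveform samples themselves.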

The basic properties of the speech signal and of human speech perception can explain the principles of parametric speech coding as applied in early vocoders. Current speech modeling approaches, such as mixed excitation linear prediction, sinusoidal coding, and waveform interpolation, use more sophisticated versions of these same concepts. Modern techniques for encoding the model parameters, in particular those based on the theory of vector quantization, allow the model information to be encoded with very few bits per speech frame.
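As a rough illustration of why vector quantization needs so few bits, the sketch below performs a full nearest-neighbor search over a small codebook and returns only the index of the best codeword. The four-entry, two-dimensional codebook and the function names are hypothetical; practical coders use trained codebooks with many more entries and often a perceptually weighted distance.

```python
def quantize(vector, codebook):
    """Return the index of the codeword closest to vector (squared error)."""
    best_index, best_dist = 0, float("inf")
    for i, codeword in enumerate(codebook):
        dist = sum((v - c) ** 2 for v, c in zip(vector, codeword))
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index

def index_bits(codebook_size):
    """Bits needed to transmit one index into a codebook of this size."""
    return (codebook_size - 1).bit_length()

# Toy codebook (an assumption for illustration only).
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
params = (0.9, 0.1)                  # parameter vector for one frame

index = quantize(params, codebook)   # encoder sends only this index
decoded = codebook[index]            # decoder looks the codeword up
bits = index_bits(len(codebook))     # cost of the index in bits
```

Here the whole parameter vector costs two bits regardless of its dimension, which is the property that makes very low per-frame bit budgets feasible.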

Successful standardization of low-rate coders has enabled their widespread use in both military and satellite communications, at rates from 4 kb/s down to 600 b/s. However, toll-quality low-rate coding remains an open research challenge.

Abbreviations

ACeS: Asia Cellular Satellite
AMSC-TMI: American Mobile Satellite Corporation Telesat Mobile Incorporated
APCO: Association of Public-Safety Communications Officials
CELP: code-excited linear prediction
DFT: discrete Fourier transform
DoD: Department of Defense
FIR: finite impulse response
ITU: International Telecommunication Union
LPC: linear prediction coefficients
LPC: linear predictive coding
LSF: line spectral frequency
MBE: multiband excited
MELP: mixed excitation linear prediction
MSVQ: multistage VQ
NATO: North Atlantic Treaty Organization
RCELP: relaxed CELP
REW: rapidly evolving waveform
RMS: root mean square
SD: spectral distortion
SEW: slowly evolving waveform
STC: sinusoidal transform coder
TDMA: time-division multiple-access
VQ: vector quantization
VSELP: vector sum excited linear prediction
WI: waveform interpolation

Corresponding author

Correspondence to Alan V. McCree.

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

McCree, A.V. (2008). Low-Bit-Rate Speech Coding. In: Benesty, J., Sondhi, M.M., Huang, Y.A. (eds) Springer Handbook of Speech Processing. Springer Handbooks. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-49127-9_16

  • DOI: https://doi.org/10.1007/978-3-540-49127-9_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-49125-5

  • Online ISBN: 978-3-540-49127-9

  • eBook Packages: Engineering, Engineering (R0)
