Low-Bit-Rate Speech Coding

McCree, Alan V.

doi:10.1007/978-3-540-49127-9_16

Part of the book series: Springer Handbooks (SHB)

Abstract

Low-bit-rate speech coding, at rates below 4 kb/s, is needed for both communication and voice storage applications. At such low rates, full encoding of the speech waveform is not possible; low-rate coders therefore rely on parametric models that represent only the most perceptually relevant aspects of speech. Although there are several different approaches to this modeling, all can be related to the basic linear model of speech production, in which an excitation signal drives a vocal-tract filter.
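The linear model of speech production mentioned above can be illustrated with a minimal sketch. It is not taken from the chapter: the function names, the periodic pulse excitation, and the single filter coefficient are all assumptions chosen for illustration, and a real coder would use a higher-order filter estimated from the speech itself.

```python
def impulse_train(num_samples, pitch_period):
    """Hypothetical voiced excitation: a unit pulse every `pitch_period` samples."""
    return [1.0 if n % pitch_period == 0 else 0.0 for n in range(num_samples)]


def all_pole_filter(excitation, coeffs, gain=1.0):
    """Drive an all-pole (vocal-tract-like) filter with an excitation signal.

    Implements s[n] = gain * e[n] - sum_k coeffs[k-1] * s[n-k],
    with samples before n = 0 taken to be zero.
    """
    out = []
    for n, e in enumerate(excitation):
        acc = gain * e
        for k, a in enumerate(coeffs, start=1):
            if n - k >= 0:
                acc -= a * out[n - k]
        out.append(acc)
    return out


# Illustrative values only: 160 samples, one pulse every 80 samples,
# and a first-order filter whose single coefficient is an assumption.
excitation = impulse_train(num_samples=160, pitch_period=80)
speech = all_pole_filter(excitation, coeffs=[-0.9])
```

In this picture a parametric coder transmits only the description of the excitation (here the pitch period) and the filter coefficients, rather than the waveform samples.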

The basic properties of the speech signal and of human speech perception explain the principles of parametric speech coding as applied in early vocoders. Current speech modeling approaches, such as mixed excitation linear prediction, sinusoidal coding, and waveform interpolation, use more sophisticated versions of these same concepts. Modern techniques for encoding the model parameters, in particular those based on the theory of vector quantization, allow the model information to be encoded with very few bits per speech frame.
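As a rough illustration of why vector quantization needs so few bits per frame, the sketch below encodes a parameter vector as the index of its nearest codeword and converts a per-frame bit budget into a bit rate. The tiny codebook, the function names, and the frame figures (54 bits per 22.5 ms frame) are illustrative assumptions, not values taken from the chapter.

```python
import math


def quantize(vector, codebook):
    """Return the index of the codeword nearest to `vector` (squared Euclidean distance)."""
    def distance(codeword):
        return sum((v - c) ** 2 for v, c in zip(vector, codeword))
    return min(range(len(codebook)), key=lambda i: distance(codebook[i]))


def bits_for(codebook):
    """Bits needed to transmit one codebook index."""
    return math.ceil(math.log2(len(codebook)))


def bit_rate(bits_per_frame, frame_ms):
    """Bit rate in b/s for a given per-frame budget and frame length in milliseconds."""
    return bits_per_frame * 1000.0 / frame_ms


# Hypothetical 4-entry codebook of 2-dimensional parameter vectors.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
index = quantize([0.9, 0.2], codebook)   # nearest codeword is [1.0, 0.0]
index_bits = bits_for(codebook)          # 4 codewords need 2 bits
rate = bit_rate(54, 22.5)                # assumed frame budget and frame length
```

Only the index is transmitted; the decoder looks up the same codebook, so the cost per frame grows with the logarithm of the codebook size rather than with the vector dimension.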

Successful standardization of low-rate coders has enabled their widespread use for both military and satellite communications, at rates from 4 kb/s all the way down to 600 b/s. However, toll-quality low-rate coding remains a research challenge.

Abbreviations

ACeS: Asia Cellular Satellite
AMSC-TMI: American Mobile Satellite Corporation Telesat Mobile Incorporated
APCO: Association of Public-Safety Communications Officials
CELP: code-excited linear prediction
DFT: discrete Fourier transform
DoD: Department of Defense
FIR: finite impulse response
ITU: International Telecommunication Union
LPC: linear prediction coefficients
LPC: linear predictive coding
LSF: line spectral frequency
MBE: multiband excited
MELP: mixed excitation linear prediction
MSVQ: multistage VQ
NATO: North Atlantic Treaty Organization
RCELP: relaxed CELP
REW: rapidly evolving waveform
RMS: root mean square
SD: spectral distortion
SEW: slowly evolving waveform
STC: sinusoidal transform coder
TDMA: time-division multiple-access
VQ: vector quantization
VSELP: vector sum excited linear prediction
WI: waveform interpolation

Author information

Authors and Affiliations

Department of Information Systems Technology, MIT Lincoln Laboratory, 244 Wood Street, 02420-9185, Lexington, MA, USA
Alan V. McCree Dr.

Editor information

Editors and Affiliations

INRS-EMT, University of Quebec, 800 de la Gauchetiere Ouest, H5A 1K6, Montreal, Quebec, Canada
Jacob Benesty Dr.
Avayalabs Research, 233 Mount Airy Road, 07920, Basking Ridge, NJ, USA
M. Mohan Sondhi Ph.D.
Alcatel-Lucent, Bell Laboratories, 600 Mountain Avenue, 07974, Murray Hill, NJ, USA
Yiteng Arden Huang Dr.

About this chapter

Cite this chapter

McCree, A.V. (2008). Low-Bit-Rate Speech Coding. In: Benesty, J., Sondhi, M.M., Huang, Y.A. (eds) Springer Handbook of Speech Processing. Springer Handbooks. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-49127-9_16

DOI: https://doi.org/10.1007/978-3-540-49127-9_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49125-5
Online ISBN: 978-3-540-49127-9
eBook Packages: Engineering, Engineering (R0)
