Wideband Speech and Audio Coding in the Perceptual Domain

Advanced Signal Processing for Communication Systems

Part of the book series: The International Series in Engineering and Computer Science (SECS, volume 703)

Abstract

A new critical-band auditory filterbank with superior auditory masking properties is proposed and applied to wideband speech and audio coding. Both analysis and synthesis are performed in the perceptual domain using this filterbank. The outputs of the analysis filters are processed to obtain a series of pulse trains that represent neural firing. Simultaneous and temporal masking models are then applied to reduce the number of pulses, giving a compact time-frequency parameterization. The amplitudes and positions of the remaining pulses are coded using a run-length coding algorithm. The resulting coder produces high-quality coded speech and audio with both temporal and spectral fidelity.
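The final coding step can be illustrated with a minimal sketch of run-length coding applied to one sparse pulse train. This is not the chapter's algorithm, whose details are not given in the abstract. It assumes integer (already quantized) pulse amplitudes and a hypothetical representation as (zero-run, amplitude) pairs, and the function names `rle_encode_pulses` and `rle_decode_pulses` are invented for illustration.

```python
from typing import List, Tuple

Pair = Tuple[int, int]  # (number of zeros before the pulse, pulse amplitude)


def rle_encode_pulses(pulses: List[int]) -> Tuple[List[Pair], int]:
    """Encode a sparse pulse train as (zero_run, amplitude) pairs.

    Assumption: amplitudes are integers and 0 means "no pulse".
    Returns the pairs and the number of zeros after the last pulse.
    """
    pairs: List[Pair] = []
    run = 0
    for amplitude in pulses:
        if amplitude == 0:
            run += 1  # extend the current run of empty positions
        else:
            pairs.append((run, amplitude))  # the run length fixes the position
            run = 0
    return pairs, run


def rle_decode_pulses(pairs: List[Pair], trailing_zeros: int) -> List[int]:
    """Rebuild the pulse train from the pairs and the trailing zero count."""
    out: List[int] = []
    for run, amplitude in pairs:
        out.extend([0] * run)
        out.append(amplitude)
    out.extend([0] * trailing_zeros)
    return out


# Hypothetical pulse train for one filterbank channel after masking.
train = [0, 0, 5, 0, 0, 0, 3, 0, 2, 0, 0]
pairs, tail = rle_encode_pulses(train)
restored = rle_decode_pulses(pairs, tail)
```

The fewer pulses survive masking, the longer the zero runs and the fewer pairs need to be stored, which is why pulse reduction and run-length coding fit together.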



Copyright information

© 2002 Kluwer Academic Publishers

About this chapter

Cite this chapter

Lin, L., Ambikairajah, E., Holmes, W. (2002). Wideband Speech and Audio Coding in the Perceptual Domain. In: Wysocki, T.A., Darnell, M., Honary, B. (eds) Advanced Signal Processing for Communication Systems. The International Series in Engineering and Computer Science, vol 703. Springer, Boston, MA. https://doi.org/10.1007/0-306-47791-2_2

  • DOI: https://doi.org/10.1007/0-306-47791-2_2

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4020-7202-4

  • Online ISBN: 978-0-306-47791-1

  • eBook Packages: Springer Book Archive
