Time Delay Estimation and Source Localization

  • Chapter
Springer Handbook of Speech Processing

Part of the book series: Springer Handbooks (SHB)

Abstract

A fundamental requirement of microphone arrays is the ability to locate a speech source instantaneously and to track it continuously. The problem is challenging in practice because speech is a nonstationary random process with a wideband spectrum, and because noise, room reverberation, and other interfering speech sources are present simultaneously. This chapter presents an overview of research and development on this technology over the last three decades. Focusing on a two-stage framework for speech source localization, we survey and analyze state-of-the-art time delay estimation (TDE) and source localization algorithms.

This chapter is organized into two sections. In Sect. 51.2, we study the TDE problem and review a number of recent TDE algorithms: the generalized cross-correlation (GCC) methods, blind multichannel-identification-based algorithms, the multichannel cross-correlation coefficient method, which relies on second-order statistics (SOS), and the entropy-minimization approach, which relies on higher-order statistics (HOS). In Sect. 51.3, we investigate the source localization problem from the perspective of estimation theory, with emphasis on least-squares estimators that have closed-form estimates. The spherical intersection, spherical interpolation, and linear-correction spherical interpolation algorithms are presented.
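The two-stage framework named in the abstract can be illustrated with a minimal sketch. It is not the chapter's implementation: the function names `gcc_phat` and `spherical_interpolation`, their parameters, and the choice of the phase transform (PHAT) weighting as the GCC variant are assumptions made for illustration only. Stage one estimates a delay between two microphone signals; stage two turns range differences (delay times the speed of sound, relative to a reference microphone placed at the origin) into a closed-form least-squares position estimate.

```python
import numpy as np

def gcc_phat(x1, x2, fs, max_tau=None):
    """Estimate the delay (in seconds) of x2 relative to x1 with GCC-PHAT.

    A positive result means x2 lags x1. Illustrative sketch only.
    """
    n = len(x1) + len(x2)                    # zero-pad to avoid circular wrap-around
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    cross = X2 * np.conj(X1)                 # cross-power spectrum
    cross /= np.abs(cross) + 1e-12           # PHAT weighting: keep phase, drop magnitude
    cc = np.fft.irfft(cross, n=n)            # generalized cross-correlation
    max_shift = n // 2
    if max_tau is not None:
        max_shift = max(1, min(int(fs * max_tau), max_shift))
    # Reorder so that index 0 corresponds to lag -max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(cc) - max_shift) / fs

def spherical_interpolation(sensors, range_diffs):
    """Closed-form least-squares source position from range differences.

    sensors: (N, 3) positions of the non-reference microphones, with the
             reference microphone at the origin (assumed convention).
    range_diffs: (N,) values d_i = |r_i - r_s| - |r_s|.
    """
    S = np.asarray(sensors, dtype=float)
    d = np.asarray(range_diffs, dtype=float)
    delta = np.sum(S**2, axis=1) - d**2      # delta_i = |r_i|^2 - d_i^2
    S_pinv = np.linalg.pinv(S)
    P_perp = np.eye(len(d)) - S @ S_pinv     # projector onto the complement of range(S)
    # Source range that minimizes the projected equation error.
    R_s = (d @ P_perp @ delta) / (2.0 * (d @ P_perp @ d))
    # Least-squares source position for that range.
    return 0.5 * S_pinv @ (delta - 2.0 * R_s * d)
```

In use, each delay returned by `gcc_phat` for a microphone paired with the reference would be multiplied by the speed of sound (roughly 343 m/s in air) to obtain one entry of `range_diffs`. The localization step needs more non-reference microphones than spatial dimensions; otherwise the projector `P_perp` vanishes and the range estimate is undefined.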

Abbreviations

CRLB: Cramér-Rao lower bound

FIR: finite impulse response

GCC: generalized cross-correlation

HOS: higher-order statistics

IDTFT: inverse discrete-time Fourier transform

LMS: least mean square

PDF: probability density function

SI: speech intelligibility

SIMO: single-input multiple-output

SNR: signal-to-noise ratio

SOS: second-order statistics

TDOA: time difference of arrival

Author information

Corresponding authors

Correspondence to Dr. Yiteng (Arden) Huang, Prof. Jacob Benesty, or Dr. Jingdong Chen.

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Huang, Y.A., Benesty, J., Chen, J. (2008). Time Delay Estimation and Source Localization. In: Benesty, J., Sondhi, M.M., Huang, Y.A. (eds) Springer Handbook of Speech Processing. Springer Handbooks. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-49127-9_51

  • DOI: https://doi.org/10.1007/978-3-540-49127-9_51

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-49125-5

  • Online ISBN: 978-3-540-49127-9

  • eBook Packages: Engineering, Engineering (R0)
