Abstract
A fundamental requirement of microphone arrays is the ability to locate a speech source instantaneously and to track it continuously. The problem is challenging in practice because speech is a nonstationary random process with a wideband spectrum, and because noise, room reverberation, and other interfering speech sources are present at the same time. This chapter presents an overview of the research and development on this technology over the last three decades. Focusing on a two-stage framework for speech source localization, we survey and analyze state-of-the-art time delay estimation (TDE) and source localization algorithms.
This chapter is organized into two sections. In Sect. 51.2, we study the TDE problem and review a number of cutting-edge TDE algorithms, ranging from generalized cross-correlation methods to blind multichannel-identification-based algorithms, and from the second-order-statistics-based multichannel cross-correlation coefficient method to the higher-order-statistics-based entropy-minimization approach. In Sect. 51.3, we investigate the source localization problem from the perspective of estimation theory. The emphasis is on least-squares estimators with closed-form estimates: the spherical intersection, spherical interpolation, and linear-correction spherical interpolation algorithms are presented.
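As a rough illustration of the first stage of the two-stage framework, the sketch below estimates the delay between two microphone signals with a generalized cross-correlation. It uses the phase transform (PHAT) weighting, which is only one member of the GCC family and is chosen here for brevity, not because the chapter singles it out. The function name `gcc_phat_delay` and its parameters are hypothetical, and the code assumes NumPy, equal sampling rates on both channels, and a delay that is a whole number of samples.

```python
import numpy as np

def gcc_phat_delay(x1, x2, max_lag):
    """Return the lag (in samples) by which x2 trails x1; negative if x2 leads.

    Illustrative sketch only: PHAT-weighted generalized cross-correlation.
    """
    n = len(x1) + len(x2)              # zero-pad so the circular correlation does not wrap
    X1 = np.fft.rfft(x1, n)
    X2 = np.fft.rfft(x2, n)
    cross = X2 * np.conj(X1)           # cross-spectrum of the two channels
    cross /= np.abs(cross) + 1e-12     # PHAT weighting: discard magnitude, keep phase
    cc = np.fft.irfft(cross, n)        # generalized cross-correlation function
    # Reorder so the searched lags run from -max_lag to +max_lag.
    cc = np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))
    return int(np.argmax(cc)) - max_lag
```

Dividing the returned lag by the sampling rate gives a time difference of arrival in seconds. A set of such estimates from several microphone pairs is what the second-stage least-squares localizers would take as input.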
Abbreviations
- CRLB: Cramér-Rao lower bound
- FIR: finite impulse response
- GCC: generalized cross-correlation
- HOS: higher-order statistics
- IDTFT: inverse discrete-time Fourier transform
- LMS: least mean square
- PDF: probability density function
- SI: speech intelligibility
- SIMO: single-input multiple-output
- SNR: signal-to-noise ratio
- SOS: second-order statistics
- TDOA: time difference of arrival
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Huang, Y., Benesty, J., Chen, J. (2008). Time Delay Estimation and Source Localization. In: Benesty, J., Sondhi, M.M., Huang, Y.A. (eds) Springer Handbook of Speech Processing. Springer Handbooks. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-49127-9_51
DOI: https://doi.org/10.1007/978-3-540-49127-9_51
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49125-5
Online ISBN: 978-3-540-49127-9
eBook Packages: Engineering, Engineering (R0)