Abstract
Multiple sound source localization is an important application in speech processing. In this paper, a cuboids nested microphone array (CuNMA) is proposed for sound acquisition; the array geometry also eliminates spatial aliasing. Subband processing is then performed with a GammaTone filter bank. Next, the generalized eigenvalue decomposition (GEVD) algorithm is applied to every microphone pair of the CuNMA in each subband of the GammaTone filter bank. In each subband, the standard deviation (SD) of the direction of arrival (DOA) estimates is calculated, and subbands carrying unreliable information are discarded. K-means clustering with the silhouette criterion is then applied to the remaining DOAs to estimate the number of speakers and to assign the DOAs to each cluster. The proposed method is compared with steered response power-phase transform (SRP-PHAT), Geometric Projection, and spectral source model-deep neural network (SSM-DNN) on simulated data in noisy and reverberant conditions, and the results show that it outperforms these earlier methods.
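The last two stages of the pipeline, SD-based subband rejection followed by K-means clustering with silhouette-based selection of the number of speakers, can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several points are assumptions made only for illustration: the DOA values are invented toy numbers, the SD threshold `max_sd`, the search limit `k_max`, and all function names are hypothetical, the DOAs are treated as plain angles in degrees without wrap-around, and the K-means seeding is a simple deterministic quantile rule. The GammaTone filtering and GEVD stages are assumed to have already produced the per-subband DOA lists.

```python
import statistics


def filter_subbands(doas_by_subband, max_sd):
    """Keep subbands whose DOA estimates have a standard deviation <= max_sd."""
    return [d for d in doas_by_subband if statistics.pstdev(d) <= max_sd]


def kmeans_1d(points, k, max_iter=100):
    """Lloyd's algorithm on scalar values with deterministic quantile seeding."""
    pts = sorted(points)
    n = len(pts)
    centers = [pts[int((i + 0.5) * n / k)] for i in range(k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: abs(p - centers[c])) for p in pts]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(pts, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = sum(members) / len(members)
    return pts, labels, centers


def silhouette(pts, labels, k):
    """Mean silhouette coefficient using absolute difference as the distance."""
    clusters = [[p for p, lab in zip(pts, labels) if lab == c] for c in range(k)]
    scores = []
    for p, lab in zip(pts, labels):
        own = clusters[lab]
        others = [c for c in range(k) if c != lab and clusters[c]]
        if len(own) == 1 or not others:
            scores.append(0.0)  # convention for singleton clusters
            continue
        a = sum(abs(p - q) for q in own) / (len(own) - 1)
        b = min(sum(abs(p - q) for q in clusters[c]) / len(clusters[c]) for c in others)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def estimate_speakers(doas, k_max):
    """Pick the k in [2, k_max] with the highest mean silhouette."""
    best = None
    for k in range(2, min(k_max, len(doas) - 1) + 1):
        pts, labels, centers = kmeans_1d(doas, k)
        score = silhouette(pts, labels, k)
        if best is None or score > best[0]:
            best = (score, k, centers)
    return best[1], best[2]


# Toy DOA estimates (degrees) per subband; the third subband is deliberately scattered.
doas_by_subband = [
    [29.0, 30.5, 31.0, 30.0],
    [88.5, 90.0, 91.5, 90.5],
    [10.0, 170.0, 60.0, 120.0],
    [149.0, 150.5, 151.0],
]
kept = filter_subbands(doas_by_subband, max_sd=5.0)  # threshold is an assumption
pooled = [d for band in kept for d in band]
num_speakers, centers = estimate_speakers(pooled, k_max=5)
print(num_speakers, centers)
```

Because the silhouette coefficient is undefined for a single cluster, this sketch cannot return one speaker; a separate rule would be needed for that case.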
Acknowledgment
The authors acknowledge financial support from FONDECYT No. 3190147 and FONDECYT No. 11180107.
Copyright information
© 2021 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Firoozabadi, A.D. et al. (2021). Simultaneous Sound Source Localization by Proposed Cuboids Nested Microphone Array Based on Subband Generalized Eigenvalue Decomposition. In: Hassanien, A.E., Slowik, A., Snášel, V., El-Deeb, H., Tolba, F.M. (eds) Proceedings of the International Conference on Advanced Intelligent Systems and Informatics 2020. AISI 2020. Advances in Intelligent Systems and Computing, vol 1261. Springer, Cham. https://doi.org/10.1007/978-3-030-58669-0_72
DOI: https://doi.org/10.1007/978-3-030-58669-0_72
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58668-3
Online ISBN: 978-3-030-58669-0
eBook Packages: Intelligent Technologies and Robotics (R0)