Multi-channel speech enhancement using early and late fusion convolutional neural networks

Priyanka, S. Siva; Kumar, T. Kishore

doi:10.1007/s11760-022-02301-4

Multi-channel speech enhancement using early and late fusion convolutional neural networks

Original Paper
Published: 05 August 2022

Volume 17, pages 973–979, (2023)
Cite this article

Signal, Image and Video Processing Aims and scope Submit manuscript

S. Siva Priyanka¹ &
T. Kishore Kumar¹

440 Accesses
8 Citations
Explore all metrics

Abstract

In this paper, novel early and late fusion convolutional neural networks (CNNs) are proposed for multi-channel speech enhancement. Two beamformers namely delay-and-sum and minimum variance distortionless response are used as prefilters to suppress the effect of noise in the input microphone array. Enhanced outputs of the two beamformers are to form two-channel input to the CNN and it is known as the early fusion CNN model. On the other hand, outputs of the beamformers are considered as inputs to the two individual CNNs separately. Further, outputs of CNNs are concatenated to form an input to the fully connected layers and it is known as the late fusion CNN model. Performance of the proposed early and late fusion CNNs is evaluated on the multi-conditional microphone array data simulated from the standard TIMIT dataset. Results demonstrate the superiority of our proposed models when compared to popular existing approaches.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Dual-Channel Speech Enhancement Using Neural Network Adaptive Beamforming

Enhancement of single channel speech quality and intelligibility in multiple noise conditions using wiener filter and deep CNN

Article 06 October 2021

A Complex Neural Network Adaptive Beamforming for Multi-channel Speech Enhancement in Time Domain

References

Allen, J.B., Berkley, D.A.: Image method for efficiently simulating small-room acoustics. J. Acoust. Soc. Am. 65(4), 943–950 (1979)
Article Google Scholar
Araki, S., Hayashi, T., Delcroix, M., Fujimoto, M., Takeda, K., Nakatani, T.: Exploring multi-channel features for denoising-autoencoder-based speech enhancement. In: 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 116–120. IEEE (2015)
Bai, M.R., Ih, J.G., Benesty, J.: Acoustic Array Systems: Theory, Implementation, and Application. John Wiley & Sons (2013)
Buckley, K.: Spatial/spectral filtering with linearly constrained minimum variance beamformers. IEEE Trans. Acoust. Speech Signal Process. 35(3), 249–266 (1987)
Article Google Scholar
Chen, J., Wang, D.: Long short-term memory for speaker generalization in supervised speech separation. J. Acoust. Soc. Am. 141(6), 4705–4714 (2017)
Article MathSciNet Google Scholar
Cherry, E.C.: Some experiments on the recognition of speech, with one and with two ears. J. Acoust. Soc. Am. 25(5), 975–979 (1953)
Article Google Scholar
Fu, S.W., Tsao, Y., Lu, X.: SNR-aware convolutional neural network modeling for speech enhancement. In: Interspeech, pp. 3768–3772 (2016)
Fu, S.W., Tsao, Y., Lu, X., Kawai, H.: Raw waveform-based speech enhancement by fully convolutional networks. In: 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), pp. 006–012. IEEE (2017)
Gannot, S., Cohen, I.: Speech enhancement based on the general transfer function GSC and postfiltering. IEEE Trans. Speech Audio Process. 12(6), 561–571 (2004)
Article Google Scholar
Garofolo, J.S., Lamel, L.F., Fisher, W.M., Fiscus, J.G., Pallett, D.S.: Getting started with the DARPA TIMIT CD-ROM: an acoustic phonetic continuous speech database. National Institute of Standards and Technology (NIST), Gaithersburgh, MD, 107, p. 16 (1988)
Habets, E.A.P., Benesty, J., Cohen, I., Gannot, S., Dmochowski, J.: New insights into the MVDR beamformer in room acoustics. IEEE Trans. Audio Speech Lang. Process. 18(1), 158–170 (2009)
Article Google Scholar
Hirsch, H.G., Pearce, D.: The Aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions. In: ASR2000-Automatic Speech Recognition: Challenges for the New Millenium ISCA Tutorial and Research Workshop (ITRW) (2000)
Hu, Y., Loizou, P.C.: Evaluation of objective quality measures for speech enhancement. IEEE Trans. Audio Speech Lang. Process. 16(1), 229–238 (2007)
Article Google Scholar
Lee, B.K., Jeong, J.: Deep neural network-based speech separation combining with MVDR beamformer for automatic speech recognition system. In: IEEE International Conference on Consumer Electronics (ICCE), pp. 1–4. IEEE (2019)
Lu, X., Tsao, Y., Matsuda, S., Hori, C.: Speech enhancement based on deep denoising autoencoder. In: Interspeech, vol. 2013, pp. 436–440 (2013)
Masuyama, Y., Togami, M., Komatsu, T.: Consistency-aware multi-channel speech enhancement using deep neural networks. In: ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 821–825. IEEE (2020)
Niwa, K., Hioka, Y., Kobayashi, K.: Post-filter design for speech enhancement in various noisy environments. In: 14th International Workshop on Acoustic Signal Enhancement (IWAENC), pp. 35–39. IEEE (2014)
Nossier, S.A., Rizk, M.R.M., Moussa, N.D., el Shehaby, S.: Enhanced smart hearing aid using deep neural networks. Alex. Eng. J. 58(2), 539–550 (2019)
Article Google Scholar
Nugraha, A.A., Liutkus, A., Vincent, E.: Multichannel audio source separation with deep neural networks. IEEE/ACM Trans. Audio Speech Lang. Process. 24(9), 1652–1664 (2016)
Article Google Scholar
Pandey, A., Wang, D.: A new framework for CNN-based speech enhancement in the time domain. IEEE/ACM Trans. Audio Speech Lang. Process. 27(7), 1179–1188 (2019)
Article Google Scholar
Perrot, V., Polichetti, M., Varray, F., Garcia, D.: So you think you can DAS? A viewpoint on delay-and-sum beamforming. Ultrasonics 111, 106309 (2021)
Article Google Scholar
Shankar, N., Bhat, G.S., Panahi, I.: Comparison and real-time implementation of fixed and adaptive beamformers for speech enhancement on smartphones for hearing study. In: Proceedings of Meetings on Acoustics 178ASA, vol. 39, p. 055009. Acoustical Society of America (2019)
Taal, C.H., Hendriks, R.C., Heusdens, R., Jensen, J.: An algorithm for intelligibility prediction of time-frequency weighted noisy speech. IEEE Trans. Audio Speech Lang. Process. 19(7), 2125–2136 (2011)
Tan, K., Chen, J., Wang, D.: Gated residual networks with dilated convolutions for monaural speech enhancement. IEEE/ACM Trans. Audio Speech Lang. Process. 27(1), 189–198 (2018)
Article Google Scholar
Tan, K., Zhang, X., Wang, D.: Real-time speech enhancement using an efficient convolutional recurrent network for dual-microphone mobile phones in close-talk scenarios. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5751–5755. IEEE (2019)
Tu, Y.H., Du, J., Lee, C.H.: Speech enhancement based on teacher-student deep learning using improved speech presence probability for noise-robust speech recognition. IEEE/ACM Trans. Audio Speech Lang. Process. 27(12), 2080–2091 (2019)
Article Google Scholar
Tu, Y., Du, J., Xu, Y., Dai, L., Lee, C.H.: Speech separation based on improved deep neural networks with dual outputs of speech features for both target and interfering speakers. In: The 9th International Symposium on Chinese Spoken Language Processing, pp. 250–254. IEEE (2014)
Vincent, E., Gribonval, R., Févotte, C.: Performance measurement in blind audio source separation. IEEE Trans. Audio Speech Lang. Process. 14(4), 1462–1469 (2006)
Article Google Scholar
Wang, D., Bao, C.: Multi-channel speech enhancement based on the MVDR beamformer and postfilter. In: 2020 IEEE International Conference on Signal Processing, Communications and Computing (ICSPCC), pp. 1–5. IEEE (2020)
Wang, Y., Narayanan, A., Wang, D.: On training targets for supervised speech separation. IEEE/ACM Trans. Audio Speech Lang. Process. 22(12), 1849–1858 (2014)
Article Google Scholar
Wang, Z.Q., Wang, D.: Multi-microphone complex spectral mapping for speech dereverberation. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 486–490. IEEE (2020)
Weninger, F., Erdogan, H., Watanabe, S., Vincent, E., Roux, J.L., Hershey, J.R., Schuller, B.: Speech enhancement with LSTM recurrent neural networks and its application to noise-robust ASR. In: International Conference on Latent Variable Analysis and Signal Separation, pp. 91–99. Springer (2015)
Williamson, D.S., Wang, Y., Wang, D.: Complex ratio masking for monaural speech separation. IEEE/ACM Trans. Audio Speech Lang. Process. 24(3), 483–492 (2015)
Article Google Scholar
Xu, Y., Du, J., Dai, L.R., Lee, C.H.: An experimental study on speech enhancement based on deep neural networks. IEEE Signal Process. Lett. 21(1), 65–68 (2013)
Article Google Scholar
Xu, Y., Du, J., Dai, L.R., Lee, C.H.: A regression approach to speech enhancement based on deep neural networks. IEEE/ACM Trans. Audio Speech Lang. Process. 23(1), 7–19 (2014)
Article Google Scholar

Download references

Author information

Authors and Affiliations

Department of ECE, National Institute of Technology Warangal, Warangal, Telangana, 506 004, India
S. Siva Priyanka & T. Kishore Kumar

Authors

S. Siva Priyanka
View author publications
You can also search for this author in PubMed Google Scholar
T. Kishore Kumar
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to S. Siva Priyanka.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Cite this article

Priyanka, S.S., Kumar, T.K. Multi-channel speech enhancement using early and late fusion convolutional neural networks. SIViP 17, 973–979 (2023). https://doi.org/10.1007/s11760-022-02301-4

Download citation

Received: 17 December 2021
Revised: 22 June 2022
Accepted: 26 June 2022
Published: 05 August 2022
Issue Date: June 2023
DOI: https://doi.org/10.1007/s11760-022-02301-4

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Multi-channel speech enhancement using early and late fusion convolutional neural networks

Abstract

Access this article

Similar content being viewed by others

Dual-Channel Speech Enhancement Using Neural Network Adaptive Beamforming

Enhancement of single channel speech quality and intelligibility in multiple noise conditions using wiener filter and deep CNN

A Complex Neural Network Adaptive Beamforming for Multi-channel Speech Enhancement in Time Domain

References

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interest

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Multi-channel speech enhancement using early and late fusion convolutional neural networks

Abstract

Access this article

Similar content being viewed by others

Dual-Channel Speech Enhancement Using Neural Network Adaptive Beamforming

Enhancement of single channel speech quality and intelligibility in multiple noise conditions using wiener filter and deep CNN

A Complex Neural Network Adaptive Beamforming for Multi-channel Speech Enhancement in Time Domain

References

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interest

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation