
An investigation towards speaker identification using a single-sound-frame

Published in: Multimedia Tools and Applications

Abstract

Traditional neural-network-based speaker identification (SI) studies represent a speaker's voice biometrics with acoustic features extracted from sequential sounds: several sound segments before and after the current segment are stacked and fed to the network. Although this stacking is particularly important for speech recognition, where words are built from sequential sound segments and successful recognition depends on the preceding phonetic sequence, an SI system should be able to operate on the distinctive speaker features available in an individual sound segment and identify the speaker regardless of the previously uttered sounds. This paper investigates this hypothesis by proposing a novel text-independent SI model trained at the sound level. The investigation proceeded in three steps: first, finding the most distinguishable configuration of coefficients within a single acoustic segment; second, identifying the best frame-length-to-overlap ratio; and finally, measuring the reliability of performing SI with only a single sound segment. Overall, more than one hundred SI systems were trained and evaluated. The results indicate that performing SI on a single acoustic frame reduces the complexity of the task, since the classifier must learn far fewer acoustic features than in traditional stacking-based approaches.
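The framing and input-size trade-off described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's code: the function names (`frame_signal`, `input_size`), the 13-coefficient example, and the specific frame length and overlap values are assumptions chosen for clarity.

```python
def frame_signal(samples, frame_len, overlap_ratio):
    """Split a 1-D sample sequence into overlapping frames.

    overlap_ratio is the fraction of each frame shared with the next,
    e.g. 0.5 means the hop between frames is half a frame.
    """
    hop = max(1, int(frame_len * (1.0 - overlap_ratio)))
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += hop
    return frames


def input_size(n_coeff, context_frames):
    """Length of the feature vector fed to the classifier.

    context_frames = 0 models the single-frame approach studied here;
    a traditional stacked approach with c frames of context on each
    side feeds 2*c + 1 frames to the network.
    """
    return n_coeff * (2 * context_frames + 1)


if __name__ == "__main__":
    signal = list(range(1000))  # stand-in for raw audio samples
    frames = frame_signal(signal, frame_len=400, overlap_ratio=0.5)
    print(len(frames))          # 4 frames of 400 samples, hop of 200
    print(input_size(13, 0))    # single frame, 13 coefficients -> 13
    print(input_size(13, 4))    # 4 context frames per side -> 117
```

The single-frame setting (`context_frames = 0`) shrinks the classifier's input dimensionality by a factor of `2*c + 1` relative to a stack with `c` context frames on each side, which is the complexity reduction the abstract refers to.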


[Figures 1–6 appear in the full article; no captions were recovered.]




Author information

Corresponding author: Seyed Reza Shahamiri

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Shahamiri, S.R., Thabtah, F. An investigation towards speaker identification using a single-sound-frame. Multimed Tools Appl 79, 31265–31281 (2020). https://doi.org/10.1007/s11042-020-09580-4

