Abstract
Spoken Language Identification (SLID) is a well-researched field and an established first step in multilingual speech recognition systems. With the rise of ASR technologies in recent years, the importance of SLID has become undeniable. In this work, we propose a model for the identification of Indian and foreign languages. To make our model robust to noise from everyday life, we augment our data with noise of varying loudness taken from diverse environments. From the MFCC time series of this augmented data, we extract aggregated macro-level features and perform feature selection using the FRESH (FeatuRe Extraction based on Scalable Hypothesis tests) algorithm, which yields a set of features relevant to this problem. This filtered set is used to train an artificial neural network. The model is then tested on three standard datasets. First, six languages are selected from the IIT-M IndicTTS speech database, and an accuracy of 99.93% is obtained. Second, the IIIT-H Indic speech database, consisting of seven languages, is used, and an accuracy of 99.94% is recorded. Last, eight languages from the VoxForge dataset are used, and we achieve an accuracy of 98.43%. These promising results lead us to believe that the features are suitable for capturing language-specific characteristics of speech, and we therefore propose them as standard features for the task of SLID. The source code of our work is available at https://github.com/rahamansaif/LID-using-time-series-MFCC.
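The pipeline described above (MFCC time series → aggregated macro-level features → hypothesis-test-based selection → neural network) can be sketched in miniature. The snippet below is not the paper's implementation: it collapses an MFCC matrix into a few per-coefficient statistics using plain NumPy, standing in for the much larger feature set that tsfresh's FRESH algorithm would extract; the function name and the random toy matrix are illustrative assumptions.

```python
import numpy as np

def aggregate_mfcc_features(mfcc):
    """Collapse an MFCC time series (n_coeff x n_frames) into
    macro-level statistics per coefficient. A small stand-in for
    the larger tsfresh feature set used with FRESH."""
    feats = {}
    for i, series in enumerate(mfcc):
        feats[f"mfcc{i}_mean"] = float(series.mean())
        feats[f"mfcc{i}_std"] = float(series.std())
        feats[f"mfcc{i}_min"] = float(series.min())
        feats[f"mfcc{i}_max"] = float(series.max())
        # absolute energy and mean absolute change are typical
        # time-series aggregates of the kind FRESH evaluates
        feats[f"mfcc{i}_abs_energy"] = float(np.dot(series, series))
        feats[f"mfcc{i}_mean_abs_change"] = float(np.abs(np.diff(series)).mean())
    return feats

# toy example: 13 MFCC coefficients over 100 frames
rng = np.random.default_rng(0)
mfcc = rng.normal(size=(13, 100))
feats = aggregate_mfcc_features(mfcc)
print(len(feats))  # 13 coefficients x 6 statistics = 78 features
```

In the full method, vectors like this would be passed through FRESH's per-feature relevance tests (with false-discovery-rate control) to obtain the filtered set that trains the network.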
Biswas, M., Rahaman, S., Ahmadian, A. et al. Automatic spoken language identification using MFCC based time series features. Multimed Tools Appl 82, 9565–9595 (2023). https://doi.org/10.1007/s11042-021-11439-1