Deep learning for spoken language identification: Can we visualize speech signal patterns?

Mukherjee, Himadri; Ghosh, Subhankar; Sen, Shibaprasad; Sk Md, Obaidullah; Santosh, K. C.; Phadikar, Santanu; Roy, Kaushik

doi:10.1007/s00521-019-04468-3

Original Article
Published: 05 September 2019

Neural Computing and Applications, volume 31, pages 8483–8501 (2019)

Himadri Mukherjee¹,
Subhankar Ghosh⁵,
Shibaprasad Sen⁶,
Obaidullah Sk Md²,
K. C. Santosh ORCID: orcid.org/0000-0003-4176-0236³,
Santanu Phadikar⁴ &
…
Kaushik Roy¹

849 Accesses
23 Citations

Abstract

Speech recognition-based applications are widely used in Western countries, but they have not been adopted on a similar scale in East Asia. Language complexity could be one of the primary reasons for this lag. Moreover, multilingual countries such as India need to be considered, so that language identification (of words and phrases) is possible from speech signals. Unlike previous works, in this paper we propose to use speech signal patterns for spoken language identification, where image-based features are used. The concept is inspired primarily by the fact that a speech signal can be read or visualized. In our experiments, we use spectrograms (as image data) and deep learning for spoken language classification. On the IIIT-H Indic speech database for Indic languages, we achieve a highest accuracy of 99.96%, which outperforms the reported state-of-the-art results. Furthermore, a relative decrease of 4018.60% in the signal-to-noise ratio leads to a decrease of only 0.50% in accuracy, which indicates that our concept is fairly robust.
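To make the idea of treating speech as image data concrete, the following is a minimal sketch of how a one-dimensional signal can be turned into a grayscale spectrogram image. It is an illustration only, not the pipeline used in the article: the frame length, hop size, Hann window, 8 kHz sampling rate, and the synthetic 1000 Hz tone standing in for speech are all assumptions, and the function names (`hann`, `spectrogram`, `to_grayscale`) are hypothetical. It uses only the Python standard library; in practice a numerical library would normally compute the transform, and the resulting image would then be passed to a deep learning classifier, which is not shown here.

```python
import cmath
import math


def hann(n):
    """Symmetric Hann window of length n (n > 1)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def spectrogram(signal, frame_len=64, hop=32):
    """Log-magnitude spectrogram as a 2-D list.

    Rows are frequency bins (0 .. frame_len // 2), columns are time frames.
    frame_len and hop are illustrative values, not taken from the article.
    """
    window = hann(frame_len)
    n_bins = frame_len // 2 + 1
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        column = []
        for k in range(n_bins):
            # Naive DFT of the windowed frame at non-negative frequency bin k.
            acc = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            # Magnitude in decibels; the small offset avoids log10(0).
            column.append(20 * math.log10(abs(acc) + 1e-10))
        frames.append(column)
    # Transpose so that rows index frequency and columns index time.
    return [list(row) for row in zip(*frames)]


def to_grayscale(image):
    """Linearly rescale a 2-D list of floats to integers in 0..255."""
    lo = min(min(row) for row in image)
    hi = max(max(row) for row in image)
    span = (hi - lo) or 1.0
    return [[int(round(255 * (v - lo) / span)) for v in row] for row in image]


# Synthetic stand-in for a speech recording (assumption): a pure 1000 Hz tone.
sample_rate = 8000
tone_hz = 1000
signal = [math.sin(2 * math.pi * tone_hz * t / sample_rate) for t in range(1024)]

image = to_grayscale(spectrogram(signal))
```

With these assumed settings, `image` has 33 frequency rows and 31 time columns. Each bin spans 8000 / 64 = 125 Hz, so the tone's energy concentrates in row 8. Real speech would produce time-varying patterns across many rows, and those patterns are what an image classifier would learn from.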

Author information

Authors and Affiliations

Department of Computer Science, West Bengal State University, Kolkata, India
Himadri Mukherjee & Kaushik Roy
Department of Computer Science and Engineering, Aliah University, Kolkata, India
Obaidullah Sk Md
Department of Computer Science, The University of South Dakota, Vermillion, SD, USA
K. C. Santosh
Department of Computer Science and Engineering, Maulana Abul Kalam Azad University of Technology, Kolkata, India
Santanu Phadikar
CVPR Unit, Indian Statistical Institute, Kolkata, India
Subhankar Ghosh
Department of Computer Science and Engineering, Future Institute of Engineering and Management, Kolkata, India
Shibaprasad Sen

Corresponding author

Correspondence to K. C. Santosh.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Mukherjee, H., Ghosh, S., Sen, S. et al. Deep learning for spoken language identification: Can we visualize speech signal patterns?. Neural Comput & Applic 31, 8483–8501 (2019). https://doi.org/10.1007/s00521-019-04468-3

Received: 26 May 2019
Accepted: 26 August 2019
Published: 05 September 2019
Issue Date: December 2019
DOI: https://doi.org/10.1007/s00521-019-04468-3
