Hybrid continuous speech recognition systems by HMM, MLP and SVM: a comparative study

Zarrouk, Elyes; Ben Ayed, Yassine; Gargouri, Faiez

doi:10.1007/s10772-013-9221-5

Published: 24 January 2014

Volume 17, pages 223–233 (2014)

International Journal of Speech Technology

Elyes Zarrouk¹,
Yassine Ben Ayed¹ &
Faiez Gargouri²

Abstract

This paper presents a new hybrid method for continuous Arabic speech recognition based on triphone modelling. To this end, we use a Support Vector Machine (SVM) as an estimator of posterior probabilities within the standard Hidden Markov Model (HMM) framework. We also describe a new approach that categorises Arabic vowels into long and short vowels, applied in the labeling phase of the speech signals. Using this new labeling method, we find that the SVM/HMM hybrid model is more efficient than standard HMMs and than the hybrid Multi-Layer Perceptron (MLP)/HMM system. The results obtained for the triphone-based Arabic speech recognition system are 64.68 % with HMMs, 72.39 % with MLP/HMM and 74.01 % with the SVM/HMM hybrid model. The word error rate (WER) obtained for continuous speech recognition by the three systems confirms the performance of SVM/HMM, which achieves the lowest average over the 4 tested speakers, 11.42 %.
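
For readers unfamiliar with the WER figure quoted above, the following is a minimal sketch of how a word error rate is conventionally computed: the word-level edit distance between a reference transcript and a recognised hypothesis, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenisation are assumptions made for illustration; the abstract does not state which scoring tool or tokenisation the authors used.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference words.

    Illustrative only: tokens are split on whitespace, which is an assumption
    and not necessarily how the paper scored its transcripts.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions are counted, this measure can exceed 1.0 when the hypothesis is much longer than the reference. An average over speakers, such as the one reported in the abstract, would be taken over per-speaker values computed this way.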

Author information

Authors and Affiliations

National School of Engineers of Sfax, El Habib City, Sfax, Tunisia
Elyes Zarrouk & Yassine Ben Ayed
Higher Institute of Computer Science and Multimedia at Sfax University, Sfax University, Sfax, Tunisia
Faiez Gargouri

Corresponding author

Correspondence to Elyes Zarrouk.

About this article

Cite this article

Zarrouk, E., Ben Ayed, Y. & Gargouri, F. Hybrid continuous speech recognition systems by HMM, MLP and SVM: a comparative study. Int J Speech Technol 17, 223–233 (2014). https://doi.org/10.1007/s10772-013-9221-5

Received: 16 June 2013
Accepted: 20 December 2013
Published: 24 January 2014
Issue Date: September 2014
DOI: https://doi.org/10.1007/s10772-013-9221-5
