Hybrid continuous speech recognition systems by HMM, MLP and SVM: a comparative study

International Journal of Speech Technology

Abstract

This paper presents a new hybrid method for continuous Arabic speech recognition based on triphone modelling. We use Support Vector Machines (SVM) as estimators of posterior probabilities within the standard Hidden Markov Model (HMM) framework. We also describe a new approach that categorises Arabic vowels into long and short vowels, applied during the labelling phase of the speech signals. Using this new labelling method, we find that the SVM/HMM hybrid model is more efficient than both standard HMMs and the hybrid Multi-Layer Perceptron (MLP)/HMM system. The results obtained for the triphone-based Arabic speech recognition system are 64.68 % with HMMs, 72.39 % with MLP/HMM and 74.01 % with the SVM/HMM hybrid model. The word error rates (WER) obtained by the three systems on continuous speech confirm the performance of SVM/HMM, which achieves the lowest average over the 4 tested speakers, 11.42 %.
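
The abstract does not say how the SVM posteriors are fed into the HMM. One common approach in hybrid systems is to divide each state posterior P(q|x) by the state prior P(q), which by Bayes' rule gives a score proportional to the emission likelihood p(x|q), and to decode with Viterbi as usual. The sketch below illustrates that idea only. It is not the authors' implementation: the posterior values, priors, transition probabilities and the two-state topology are all hypothetical, and the step that turns raw SVM outputs into probabilities is assumed to have happened already.

```python
import math

FLOOR = 1e-10  # assumed floor to avoid log(0); not taken from the paper


def scaled_log_likelihoods(posteriors, priors):
    """Turn per-frame state posteriors P(q|x) into log P(q|x) - log P(q).

    By Bayes' rule this equals log p(x|q) up to a per-frame constant,
    so it can stand in for the HMM emission score.
    """
    return [
        [math.log(max(p, FLOOR)) - math.log(max(pr, FLOOR))
         for p, pr in zip(frame, priors)]
        for frame in posteriors
    ]


def viterbi(log_emis, log_trans, log_init):
    """Return the most likely state sequence for the given log scores."""
    n_states = len(log_init)
    delta = [log_init[s] + log_emis[0][s] for s in range(n_states)]
    backptrs = []
    for t in range(1, len(log_emis)):
        new_delta, ptr = [], []
        for s in range(n_states):
            # Best predecessor of state s at time t.
            best = max(range(n_states),
                       key=lambda r: delta[r] + log_trans[r][s])
            ptr.append(best)
            new_delta.append(delta[best] + log_trans[best][s] + log_emis[t][s])
        delta = new_delta
        backptrs.append(ptr)
    # Backtrack from the best final state.
    state = max(range(n_states), key=lambda s: delta[s])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    return path[::-1]


# Hypothetical classifier posteriors for 4 frames over 2 HMM states.
posteriors = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.1, 0.9]]
priors = [0.5, 0.5]  # assumed state priors (e.g. relative training frequencies)

# Hypothetical left-to-right topology: state 1 cannot return to state 0.
log_trans = [[math.log(0.6), math.log(0.4)],
             [math.log(FLOOR), math.log(1.0)]]
log_init = [math.log(1.0), math.log(FLOOR)]

log_emis = scaled_log_likelihoods(posteriors, priors)
path = viterbi(log_emis, log_trans, log_init)
print(path)
```

The same decoder works whether the posteriors come from an MLP or an SVM, which is why the two hybrid systems can be compared against the standard HMM on equal terms: only the emission estimator changes.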



Corresponding author

Correspondence to Elyes Zarrouk.

Cite this article

Zarrouk, E., Ben Ayed, Y. & Gargouri, F. Hybrid continuous speech recognition systems by HMM, MLP and SVM: a comparative study. Int J Speech Technol 17, 223–233 (2014). https://doi.org/10.1007/s10772-013-9221-5

  • DOI: https://doi.org/10.1007/s10772-013-9221-5
