Abstract
This paper presents a new hybrid method for continuous Arabic speech recognition based on triphone modelling. We use a Support Vector Machine (SVM) as an estimator of posterior probabilities within the standard Hidden Markov Model (HMM) framework. We also describe a new approach that categorises Arabic vowels into long and short vowels, applied in the labelling phase of the speech signals. Using this labelling method, we find that the SVM/HMM hybrid model is more efficient than standard HMMs and than the hybrid system combining a Multi-Layer Perceptron (MLP) with HMMs. The results obtained for the triphone-based Arabic speech recognition system are 64.68 % with HMMs, 72.39 % with MLP/HMM and 74.01 % with the SVM/HMM hybrid model. The word error rate (WER) obtained for continuous speech recognition by the three systems confirms the performance of SVM/HMM, which achieves the lowest average over the 4 tested speakers, 11.42 %.
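As a rough illustration of how a classifier's posterior estimates can stand in for HMM emission scores, the sketch below divides each state posterior by the state prior to obtain a scaled likelihood, a convention common in hybrid connectionist/HMM systems. Whether the authors use exactly this scaling is an assumption; the function name, the flooring constant and the example values are hypothetical and not taken from the paper.

```python
import math

def scaled_log_likelihoods(posteriors, priors, floor=1e-10):
    """Turn state posteriors P(q|x) into scaled log-likelihoods log(P(q|x) / P(q)).

    Assumption: the posteriors come from some classifier (e.g. an SVM with
    probabilistic outputs) and the priors from state frequencies in the
    training labels. Values are floored to avoid log(0).
    """
    if len(posteriors) != len(priors):
        raise ValueError("posteriors and priors must have the same length")
    return [
        math.log(max(p, floor)) - math.log(max(pr, floor))
        for p, pr in zip(posteriors, priors)
    ]

# Hypothetical values for one acoustic frame and three HMM states.
posteriors = [0.7, 0.2, 0.1]
priors = [0.5, 0.3, 0.2]

scores = scaled_log_likelihoods(posteriors, priors)
best_state = max(range(len(scores)), key=scores.__getitem__)
```

In a decoder, scores of this kind would replace the Gaussian-mixture emission log-likelihoods during Viterbi search, while the HMM transition structure stays unchanged.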
Cite this article
Zarrouk, E., Ben Ayed, Y. & Gargouri, F. Hybrid continuous speech recognition systems by HMM, MLP and SVM: a comparative study. Int J Speech Technol 17, 223–233 (2014). https://doi.org/10.1007/s10772-013-9221-5
DOI: https://doi.org/10.1007/s10772-013-9221-5