Automatic Assessment of Pathological Voice Quality Using Multidimensional Acoustic Analysis Based on the GRBAS Scale

Journal of Signal Processing Systems

Abstract

Although perceptual evaluation is considered the gold standard for assessing pathological voice quality, the considerable inter- and intra-listener variability associated with perceptual ratings cannot be ignored. This variability is probably due to confounding factors such as listeners' perceptual bias, listeners' experience, and the type of rating scale used. Automatic objective assessment can serve as a useful tool for diagnosing pathological voices, and acoustic analysis can help determine the severity of dysphonia. The present study aimed to develop a complementary automatic voice assessment system using multidimensional acoustic measures based on the well-known GRBAS perceptual rating scale. A total of 65 measures, including traditional acoustic measures, MFCCs, Glottal-to-Noise Excitation measures, and nonlinear dynamical analysis, were used to compose a feature matrix. To reduce redundancy among the features, four different feature extraction techniques were applied. Multiclass classification was carried out with an RBF-kernel SVM and an Extreme Learning Machine. The classification results were moderately correlated with GRBAS ratings of severity, with best accuracies of about 77.55 % and 80.58 % for the RBF-kernel SVM and the Extreme Learning Machine, respectively. This suggests that such multidimensional acoustic analysis can be an appropriate tool for determining the presence and severity of voice disorders.
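To make the classification step more concrete, the following is a minimal sketch of a multiclass Extreme Learning Machine, not the authors' implementation. It assumes a sigmoid hidden layer with random weights and solves for the output weights with ridge-regularized normal equations (a common stand-in for the Moore-Penrose pseudo-inverse). The two-feature toy data, the four grade labels 0 to 3, and all function names and parameter values (`train_elm`, `predict_elm`, `n_hidden`, `ridge`) are hypothetical and chosen only for illustration.

```python
import math
import random


def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting.

    A is n x n and B is n x m, both given as lists of lists.
    """
    n = len(A)
    M = [row_a[:] + row_b[:] for row_a, row_b in zip(A, B)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0.0:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]


def hidden(X, W, b):
    """Sigmoid hidden-layer activations, one row per sample."""
    return [[1.0 / (1.0 + math.exp(-(sum(w * x for w, x in zip(wj, xi)) + bj)))
             for wj, bj in zip(W, b)] for xi in X]


def train_elm(X, y, n_classes, n_hidden=20, ridge=1e-3, seed=0):
    """Fit an ELM: random hidden weights, least-squares output weights."""
    rng = random.Random(seed)
    d = len(X[0])
    W = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(n_hidden)]
    b = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    H = hidden(X, W, b)
    # One-hot targets, one column per class.
    T = [[1.0 if k == yi else 0.0 for k in range(n_classes)] for yi in y]
    # Normal equations: (H^T H + ridge * I) beta = H^T T
    A = [[sum(h[i] * h[j] for h in H) + (ridge if i == j else 0.0)
          for j in range(n_hidden)] for i in range(n_hidden)]
    B = [[sum(h[i] * t[k] for h, t in zip(H, T)) for k in range(n_classes)]
         for i in range(n_hidden)]
    return W, b, solve(A, B)


def predict_elm(model, X):
    """Return the class with the largest output score for each sample."""
    W, b, beta = model
    preds = []
    for h in hidden(X, W, b):
        scores = [sum(hi * bi[k] for hi, bi in zip(h, beta))
                  for k in range(len(beta[0]))]
        preds.append(max(range(len(scores)), key=scores.__getitem__))
    return preds


# Hypothetical toy data: four severity grades (0-3), two synthetic features.
rng = random.Random(1)
X, y = [], []
for grade in range(4):
    for _ in range(15):
        X.append([grade + rng.gauss(0, 0.1), -grade + rng.gauss(0, 0.1)])
        y.append(grade)

model = train_elm(X, y, n_classes=4)
pred = predict_elm(model, X)
accuracy = sum(p == t for p, t in zip(pred, y)) / len(y)
print(f"training accuracy: {accuracy:.2f}")
```

In a real study the feature matrix would hold the extracted acoustic measures, and accuracy would be estimated on held-out recordings rather than on the training data as in this toy run.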

Acknowledgments

The research was partially supported by grants from the National Natural Science Foundation of China (NSFC 61135003 and NSFC 61401452), the Shenzhen Speech Rehabilitation Technology Laboratory, and the Guangdong Innovative Research Team Program (No. 201001D0104648280).

Author information

Corresponding author

Correspondence to Nan Yan.

Electronic supplementary material

Below is the link to the electronic supplementary material.

ESM 1 (DOC 36 kb)

ESM 2 (DOC 36 kb)

ESM 3 (DOC 37 kb)

About this article

Cite this article

Wang, Z., Yu, P., Yan, N. et al. Automatic Assessment of Pathological Voice Quality Using Multidimensional Acoustic Analysis Based on the GRBAS Scale. J Sign Process Syst 82, 241–251 (2016). https://doi.org/10.1007/s11265-015-1016-2

  • DOI: https://doi.org/10.1007/s11265-015-1016-2
