ABSTRACT
Audio-visual cues of emotion and mood disorders have recently been explored as a basis for tools that assist psychologists and psychiatrists in evaluating a patient's level of depression. In this paper, we present several multimodal depression level predictors built with a model fusion approach, in the context of the AVEC14 challenge. We show that an i-vector based representation of short-term audio features carries useful information for both depression classification and depression level prediction. We also add a classification step before regression, so that different regression models can be used depending on whether depression is present or absent. Our experiments show that combining our audio-based model with two other models based on LGBP-TOP video features leads to a 4% improvement over the baseline model proposed by the challenge organizers.
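The classification-then-regression idea, followed by fusion of several models' outputs, can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the classifier, regressor, decision threshold, or fusion rule. The sketch assumes labels on the BDI-II scale used in the AVEC14 depression sub-challenge, a hypothetical cutoff of 14 (the conventional boundary between minimal and mild depression on the BDI-II), a nearest-centroid classifier, ridge regression, and a weighted average for fusion. The names `TwoStageRegressor`, `fit_ridge`, and `fuse` are hypothetical. Feature vectors stand in for i-vectors or video descriptors and are plain lists of floats.

```python
from typing import Dict, List, Optional, Sequence

Vector = List[float]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [rv - f * cv for rv, cv in zip(m[r], m[col])]
    return [row[n] for row in m]


def fit_ridge(xs: Sequence[Vector], ys: Sequence[float], lam: float) -> Vector:
    """Ridge regression with a bias term; returns [bias, w1, ..., wd].

    For simplicity, lam is added to every diagonal entry, including the bias.
    """
    rows = [[1.0] + list(x) for x in xs]
    d = len(rows[0])
    # Normal equations: (X^T X + lam I) w = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) + (lam if i == j else 0.0)
            for j in range(d)] for i in range(d)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(d)]
    return solve(xtx, xty)


def centroid(xs: Sequence[Vector]) -> Vector:
    return [sum(x[i] for x in xs) / len(xs) for i in range(len(xs[0]))]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((u - v) ** 2 for u, v in zip(a, b))


class TwoStageRegressor:
    """Hypothetical sketch: classify depressed vs. not, then apply that group's regressor."""

    def __init__(self, threshold: float = 14.0, lam: float = 1e-3):
        self.threshold = threshold  # assumed BDI-II cutoff, not from the paper
        self.lam = lam
        self.centroids: Dict[bool, Vector] = {}
        self.models: Dict[bool, Vector] = {}

    def fit(self, xs: Sequence[Vector], ys: Sequence[float]) -> "TwoStageRegressor":
        # Split training data by label; both groups must be non-empty.
        groups: Dict[bool, list] = {False: [], True: []}
        for x, y in zip(xs, ys):
            groups[y >= self.threshold].append((x, y))
        for label, pairs in groups.items():
            gx = [x for x, _ in pairs]
            gy = [y for _, y in pairs]
            self.centroids[label] = centroid(gx)
            self.models[label] = fit_ridge(gx, gy, self.lam)
        return self

    def predict(self, x: Vector) -> float:
        # Stage 1: nearest-centroid classification picks the group.
        label = min(self.centroids, key=lambda k: sq_dist(x, self.centroids[k]))
        # Stage 2: apply the regressor trained on that group only.
        w = self.models[label]
        return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


def fuse(predictions: Sequence[float], weights: Optional[Sequence[float]] = None) -> float:
    """Weighted average of per-model predictions (e.g. one audio and two video models)."""
    if weights is None:
        weights = [1.0] * len(predictions)
    return sum(w * p for w, p in zip(weights, predictions)) / sum(weights)
```

Splitting the training set by label lets each regressor fit only the score range of its own group. At test time the classifier's decision selects which regressor is applied, so a misclassified sample is scored by the wrong regressor; the fusion step then combines the outputs of independently trained models.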
Model Fusion for Multimodal Depression Classification and Level Detection