research-article

Model Fusion for Multimodal Depression Classification and Level Detection

Authors:
Mohammed Senoussaoui, Milton Sarria-Paja, João F. Santos, Tiago H. Falk

Institut National de la Recherche Scientifique, Centre EMT, Montreal, PQ, Canada

AVEC '14: Proceedings of the 4th International Workshop on Audio/Visual Emotion Challenge, November 2014, Pages 57–63
https://doi.org/10.1145/2661806.2661819

Published: 7 November 2014

ABSTRACT

Audio-visual cues of emotion and mood disorders have recently been explored to develop tools that assist psychologists and psychiatrists in evaluating a patient's level of depression. In this paper, we present several multimodal depression level predictors that use a model fusion approach, in the context of the AVEC14 challenge. We show that an i-vector based representation of short-term audio features contains useful information for depression classification and prediction. We also employ a classification step prior to regression, so that different regression models can be used depending on the presence or absence of depression. Our experiments show that combining our audio-based model with two other models based on the LGBP-TOP video features leads to an improvement of 4% over the baseline model proposed by the challenge organizers.
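The idea of classifying before regressing, and then fusing several predictors, can be illustrated with a minimal sketch. This is not the authors' system: the abstract does not give implementation details, so the sketch swaps in a toy threshold classifier and 1-D least-squares regressors, and fuses predictions by simple averaging. The function names, the `cutoff` parameter, and the toy data are all assumptions made for illustration.

```python
from statistics import mean


def fit_line(xs, ys):
    """Least-squares fit of y = a * x + b for 1-D inputs."""
    mx, my = mean(xs), mean(ys)
    var = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var if var else 0.0
    return a, my - a * mx


def fit_gated(xs, ys, cutoff):
    """Fit a toy classifier plus one regressor per class.

    `cutoff` is a hypothetical score threshold that splits the training
    targets into a "depressed" and a "not depressed" group. Both groups
    must contain at least one sample.
    """
    pos_x = [x for x, y in zip(xs, ys) if y >= cutoff]
    pos_y = [y for y in ys if y >= cutoff]
    neg_x = [x for x, y in zip(xs, ys) if y < cutoff]
    neg_y = [y for y in ys if y < cutoff]
    pos_mean, neg_mean = mean(pos_x), mean(neg_x)
    return {
        # Toy classifier: midpoint between the two class means of the feature.
        "boundary": (pos_mean + neg_mean) / 2,
        "pos_is_high": pos_mean >= neg_mean,
        "pos_line": fit_line(pos_x, pos_y),
        "neg_line": fit_line(neg_x, neg_y),
    }


def predict_gated(model, x):
    """Classify first, then apply the regressor trained for that class."""
    is_pos = (x >= model["boundary"]) == model["pos_is_high"]
    a, b = model["pos_line"] if is_pos else model["neg_line"]
    return a * x + b


def fuse(predictions):
    """Fuse the outputs of several models by averaging (an assumed rule)."""
    return mean(predictions)


if __name__ == "__main__":
    # Made-up 1-D features and scores, for illustration only.
    xs = [0, 1, 2, 3, 10, 11, 12, 13]
    ys = [1, 2, 3, 4, 30, 32, 34, 36]
    model = fit_gated(xs, ys, cutoff=14)
    audio_pred = predict_gated(model, 12)
    video_preds = [33.0, 35.5]  # placeholders for two other models' outputs
    print(fuse([audio_pred] + video_preds))
```

The gating step lets each regressor specialize on one score range, which is the motivation the abstract gives for classifying first. A real system would replace the toy parts with stronger models, such as the support vector machines and support vector regression listed in the author tags.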

Index Terms

1. Computing methodologies
  1. Machine learning

Published in
AVEC '14: Proceedings of the 4th International Workshop on Audio/Visual Emotion Challenge
November 2014, 110 pages
ISBN: 9781450331197
DOI: 10.1145/2661806
General Chairs:
Michel Valstar, University of Nottingham, UK
Björn Schuller, Technische Universität München/Imperial College London, DE/UK
Jarek Krajewski, University of Wuppertal, Germany
Roddy Cowie, Queen's University Belfast, UK
Maja Pantic, Imperial College London/Twente University, UK/The Netherlands
Copyright © 2014 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
depression
generalized linear models
i-vectors
support vector machine
support vector regression
Acceptance Rates
AVEC '14 Paper Acceptance Rate: 8 of 22 submissions, 36%
Overall Acceptance Rate: 52 of 98 submissions, 53%