ABSTRACT
This paper presents our contribution to the Audio/Visual+ Emotion Challenge (AV+EC 2015), whose goal is to predict continuous values of the emotion dimensions arousal and valence from audio, visual, and physiological modalities. We employ the long short-term memory recurrent neural network (LSTM-RNN), the state-of-the-art classifier for dimensional recognition. Beyond the standard LSTM-RNN prediction architecture, we investigate two techniques for the dimensional emotion recognition problem. The first is the use of ε-insensitive loss as the objective to optimize. Compared to the squared loss, the most widely used loss function for dimensional emotion recognition, ε-insensitive loss is more robust to label noise: it ignores small errors, which yields a stronger correlation between predictions and labels. The second is temporal pooling, which enables temporal modeling of the input features and increases the diversity of the features fed into the forward prediction architecture. Experimental results demonstrate the effectiveness of the key components of the proposed method, and competitive results are obtained.
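The two techniques named above can be sketched briefly. The snippet below is a minimal NumPy illustration, not the authors' implementation: the ε threshold and the pooling window size are illustrative assumptions, and the pooling shown is simple non-overlapping max-pooling over frames.

```python
import numpy as np

def eps_insensitive_loss(pred, target, eps=0.05):
    """epsilon-insensitive loss (as in SVR): absolute errors below eps
    incur no penalty; larger errors are penalized linearly."""
    err = np.abs(pred - target)
    return np.maximum(err - eps, 0.0)

def temporal_max_pool(features, window=4):
    """Max-pool frame-level features over non-overlapping temporal
    windows of `window` frames; trailing frames are dropped."""
    t, d = features.shape
    n = t // window
    return features[: n * window].reshape(n, window, d).max(axis=1)

# Small errors (within eps) contribute nothing to the objective,
# which is what makes the loss tolerant to annotation noise.
labels = np.array([0.10, 0.20, 0.30])
preds = np.array([0.12, 0.50, 0.30])
print(eps_insensitive_loss(preds, labels, eps=0.05))

# Pooling 6 frames of 2-D features with window=3 yields 2 pooled frames.
frames = np.arange(12, dtype=float).reshape(6, 2)
print(temporal_max_pool(frames, window=3).shape)
```

In an actual pipeline, the pooled features would replace (or augment) the frame-level features fed to the LSTM-RNN, and the ε-insensitive loss would be minimized in place of the squared loss during training.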