Long Short Term Memory Recurrent Neural Network based Multimodal Dimensional Emotion Recognition

Published: 26 October 2015

ABSTRACT

This paper presents our contribution to the Audio/Visual+Emotion Challenge (AV+EC 2015), whose goal is to predict continuous values of the emotion dimensions arousal and valence from audio, visual and physiological modalities. We employ the state-of-the-art classifier for dimensional emotion recognition, the long short-term memory recurrent neural network (LSTM-RNN). Beyond the standard LSTM-RNN prediction architecture, two techniques are investigated for the dimensional emotion recognition problem. The first is the use of ε-insensitive loss as the objective function. Compared with the squared loss, the most widely used loss function for dimensional emotion recognition, ε-insensitive loss is more robust to label noise: it ignores small errors, yielding a stronger correlation between predictions and labels. The second is temporal pooling, which enables temporal modeling in the input features and increases the diversity of the features fed into the forward prediction architecture. Experimental results demonstrate the effectiveness of the key components of the proposed method, and competitive performance is obtained.
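To make the two techniques concrete, the following is a minimal NumPy sketch, an illustration rather than the authors' implementation: an ε-insensitive loss that zeroes out prediction errors smaller than ε, and a sliding-window temporal pooling of the frame-level input features. The value of ε, the window length, and the choice of max-pooling over the window are assumptions made here for illustration.

    import numpy as np

    def eps_insensitive_loss(pred, target, eps=0.1):
        # Errors inside the eps tube contribute nothing; larger errors
        # grow linearly, which is less sensitive to noisy labels than
        # the squared loss. eps=0.1 is an illustrative choice.
        err = np.abs(pred - target)
        return np.maximum(0.0, err - eps).mean()

    def temporal_pooling(features, window=5):
        # Max-pool each feature dimension over a trailing window of
        # frames, injecting temporal context into the per-frame inputs
        # before they reach the prediction network. The window length
        # and the use of max (rather than mean) pooling are assumptions.
        T, _ = features.shape
        pooled = np.empty_like(features)
        for t in range(T):
            lo = max(0, t - window + 1)
            pooled[t] = features[lo:t + 1].max(axis=0)
        return pooled

    # Example: errors of 0.02 fall inside the eps=0.05 tube and are
    # ignored entirely; only the last error (0.18) contributes.
    preds = np.array([0.10, 0.32, 0.58])
    labels = np.array([0.12, 0.30, 0.40])
    print(eps_insensitive_loss(preds, labels, eps=0.05))

In a training setup of the kind the abstract describes, such a loss would stand in for the squared error when optimizing the LSTM-RNN, and the pooled features would augment or replace the raw frame-level descriptors fed to the network.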


Published in

AVEC '15: Proceedings of the 5th International Workshop on Audio/Visual Emotion Challenge
October 2015
90 pages
ISBN: 9781450337434
DOI: 10.1145/2808196

Copyright © 2015 ACM


      Publisher

      Association for Computing Machinery

      New York, NY, United States


Acceptance Rates

AVEC '15 Paper Acceptance Rate: 9 of 15 submissions, 60%
Overall Acceptance Rate: 52 of 98 submissions, 53%
