ABSTRACT
We present a novel method for classifying emotions from static facial images. Our approach leverages the recent success of Convolutional Neural Networks (CNNs) on face recognition problems. Unlike the settings often assumed there, far less labeled data is typically available for training emotion classification systems. Our method is therefore designed to simplify the problem domain by removing confounding factors from the input images, with an emphasis on image illumination variations, in an effort to reduce the amount of data required to effectively train deep CNN models. To this end, we propose novel transformations of image intensities to 3D spaces, designed to be invariant to monotonic photometric transformations. These are applied to CASIA WebFace images, which are then used to train an ensemble of CNNs of multiple architectures on multiple representations. Each model is then fine-tuned with the limited emotion-labeled training data to obtain the final classification models. Our method was tested on the Static Facial Expression Recognition (SFEW) sub-challenge of the Emotion Recognition in the Wild Challenge (EmotiW 2015) and shown to provide a substantial 15.36% improvement over the baseline results (a 40% gain in performance).
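The invariance to monotonic photometric transformations mentioned above is the defining property of Local Binary Pattern (LBP) style codes, which depend only on the ordering of pixel intensities, not their absolute values. The sketch below (a minimal illustration, not the paper's implementation, which further maps the binary codes into a metric 3D space) shows a basic 8-neighbor LBP encoding and verifies that a monotonic transform such as gamma correction leaves the codes unchanged:

```python
import numpy as np

def lbp_codes(img):
    """Compute 8-neighbor Local Binary Pattern codes for interior pixels.

    Each pixel is encoded by thresholding its 8 neighbors against the
    center value. The code depends only on the *order* of intensities,
    so any monotonic photometric transform leaves it unchanged.
    """
    c = img[1:-1, 1:-1]  # center pixels (borders excluded)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(c.shape, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        # Neighbor plane shifted by (dy, dx), aligned with the centers.
        nb = img[1 + dy: img.shape[0] - 1 + dy,
                 1 + dx: img.shape[1] - 1 + dx]
        codes |= (nb >= c).astype(np.uint8) << bit
    return codes

# Invariance check: gamma correction is monotonic, so codes are preserved.
img = np.random.rand(16, 16)
assert np.array_equal(lbp_codes(img), lbp_codes(img ** 0.5))
```

Because the raw codes are arbitrary bit patterns rather than points in a metric space, feeding them directly to a CNN is problematic; the paper addresses this by mapping codes to 3D coordinates (e.g., via multidimensional scaling over a code-to-code distance) before training.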
Index Terms
- Emotion Recognition in the Wild via Convolutional Neural Networks and Mapped Binary Patterns