ABSTRACT
We present a novel method for classifying emotions from static facial images. Our approach leverages the recent success of Convolutional Neural Networks (CNNs) on face recognition problems. Unlike the settings often assumed there, far less labeled data is typically available for training emotion classification systems. Our method is therefore designed to simplify the problem domain by removing confounding factors from the input images, with an emphasis on image illumination variations, in an effort to reduce the amount of data required to effectively train deep CNN models. To this end, we propose novel transformations of image intensities to 3D spaces, designed to be invariant to monotonic photometric transformations. These are applied to CASIA WebFace images, which are then used to train an ensemble of CNNs of multiple architectures on multiple representations. Each model is then fine-tuned with the limited emotion-labeled training data to obtain the final classification models. Our method was tested on the Static Facial Expression Recognition (SFEW) sub-challenge of the Emotion Recognition in the Wild Challenge (EmotiW 2015) and shown to provide a substantial 15.36% improvement over the baseline results (a 40% gain in performance).
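The invariance to monotonic photometric transformations mentioned above is the defining property of Local Binary Pattern (LBP) style codes, which depend only on the ordering of pixel intensities, not their absolute values. The sketch below (a minimal illustration, not the paper's implementation, which further maps the binary codes into a metric 3D space) shows a basic 8-neighbor LBP encoding and verifies that a monotonic transform such as gamma correction leaves the codes unchanged:

```python
import numpy as np

def lbp_codes(img):
    """Compute 8-neighbor Local Binary Pattern codes for interior pixels.

    Each pixel is encoded by thresholding its 8 neighbors against the
    center value. The code depends only on the *order* of intensities,
    so any monotonic photometric transform leaves it unchanged.
    """
    c = img[1:-1, 1:-1]  # center pixels (borders excluded)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(c.shape, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        # Neighbor plane shifted by (dy, dx), aligned with the centers.
        nb = img[1 + dy: img.shape[0] - 1 + dy,
                 1 + dx: img.shape[1] - 1 + dx]
        codes |= (nb >= c).astype(np.uint8) << bit
    return codes

# Invariance check: gamma correction is monotonic, so codes are preserved.
img = np.random.rand(16, 16)
assert np.array_equal(lbp_codes(img), lbp_codes(img ** 0.5))
```

Because the raw codes are arbitrary bit patterns rather than points in a metric space, feeding them directly to a CNN is problematic; the paper addresses this by mapping codes to 3D coordinates (e.g., via multidimensional scaling over a code-to-code distance) before training.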
Index Terms
- Emotion Recognition in the Wild via Convolutional Neural Networks and Mapped Binary Patterns