research-article
DOI: 10.1145/2818346.2830587

Emotion Recognition in the Wild via Convolutional Neural Networks and Mapped Binary Patterns

Authors: Gil Levi, Tal Hassner
Published: 09 November 2015

ABSTRACT

We present a novel method for classifying emotions from static facial images. Our approach leverages the recent success of Convolutional Neural Networks (CNNs) on face recognition problems. Unlike the settings often assumed there, far less labeled data is typically available for training emotion classification systems. Our method is therefore designed to simplify the problem domain by removing confounding factors from the input images, with an emphasis on image illumination variations, in an effort to reduce the amount of data required to effectively train deep CNN models. To this end, we propose novel transformations of image intensities to 3D spaces, designed to be invariant to monotonic photometric transformations. These are applied to CASIA WebFace images, which are then used to train an ensemble of CNNs of multiple architectures on multiple representations. Each model is then fine-tuned with the limited emotion-labeled training data to obtain the final classification models. Our method was tested on the Static Facial Expression Recognition (SFEW) sub-challenge of the Emotion Recognition in the Wild Challenge (EmotiW 2015) and shown to provide a substantial 15.36% absolute improvement over baseline results (a 40% relative gain in performance).
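
To make the mapped-binary-patterns idea concrete, the following is a minimal sketch (not the authors' implementation) of one way to embed Local Binary Pattern (LBP) codes into a 3D metric space via multidimensional scaling (MDS) and re-code a grayscale image as a 3-channel image. The Hamming code-to-code dissimilarity and the helper names `build_code_embedding` and `map_image` are assumptions made for illustration; the abstract specifies only that the transform maps image intensities to a 3D space invariant to monotonic photometric transformations.

```python
# Minimal sketch (illustrative, not the paper's code): embed 8-bit LBP
# codes into a 3D metric space with MDS, then replace every pixel's code
# with its 3D coordinates. The result is a 3-channel image that inherits
# LBP's invariance to monotonic photometric transformations.
# ASSUMPTION: Hamming distance serves as the code-to-code dissimilarity.
import numpy as np
from sklearn.manifold import MDS
from skimage.feature import local_binary_pattern

def build_code_embedding(n_bits=8, dim=3, seed=0):
    """Embed all 2**n_bits LBP codes into `dim` dimensions via metric MDS."""
    codes = np.arange(2 ** n_bits)
    # Unpack each code into its bit vector, e.g. 5 -> [1, 0, 1, 0, 0, 0, 0, 0].
    bits = ((codes[:, None] >> np.arange(n_bits)) & 1).astype(float)
    # Pairwise Hamming distances between all 256 codes.
    dists = np.abs(bits[:, None, :] - bits[None, :, :]).sum(axis=-1)
    mds = MDS(n_components=dim, dissimilarity="precomputed", random_state=seed)
    return mds.fit_transform(dists)  # shape: (2**n_bits, dim)

def map_image(gray, embedding):
    """Re-code a grayscale image as an H x W x 3 'mapped binary pattern' image."""
    codes = local_binary_pattern(gray, P=8, R=1, method="default").astype(int)
    return embedding[codes]  # look up each pixel's 3D point
```

A face image mapped this way is a drop-in replacement for RGB input to a standard CNN; per the abstract, such representations are used to pretrain the ensemble on CASIA WebFace, and each model is then fine-tuned on the limited emotion-labeled data.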

Published in

      ICMI '15: Proceedings of the 2015 ACM on International Conference on Multimodal Interaction
      November 2015
      678 pages
      ISBN: 9781450339124
      DOI: 10.1145/2818346

      Copyright © 2015 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States




      Acceptance Rates

      ICMI '15 paper acceptance rate: 52 of 127 submissions (41%). Overall acceptance rate: 453 of 1,080 submissions (42%).
