ABSTRACT
This paper presents the hybrid deep learning network submitted to the group-level emotion recognition category of the 6th Emotion Recognition in the Wild (EmotiW 2018) Grand Challenge [9]. Deep models trained individually on faces, scenes, skeletons, and salient regions extracted with visual attention mechanisms are fused to classify the emotion of a group of people in an image as positive, neutral, or negative. Experimental results show that the proposed hybrid network achieves 78.98% and 68.08% classification accuracy on the validation and testing sets, respectively. These results outperform the baselines of 64% and 61% and earned first place in the challenge.
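To illustrate the kind of fusion the abstract describes, the sketch below combines the class probabilities of four independently trained streams (faces, scene, skeleton, salient regions) by a weighted average and picks the arg-max class. This is a minimal illustration only: the stream names, uniform weights, and score-level averaging are assumptions for exposition, not the authors' exact fusion scheme.

```python
import numpy as np

CLASSES = ["positive", "neutral", "negative"]

def fuse_streams(stream_probs, weights=None):
    """Weighted average of per-stream class probabilities.

    stream_probs: dict mapping stream name -> length-3 probability vector
                  (e.g. softmax outputs of the face, scene, skeleton,
                  and salient-region models).
    weights:      optional dict of per-stream weights; defaults to uniform.
    Returns the predicted group-level emotion label.
    """
    names = sorted(stream_probs)
    w = np.array([1.0 if weights is None else weights[n] for n in names])
    w = w / w.sum()  # normalize so the fused scores stay a convex combination
    probs = np.stack([np.asarray(stream_probs[n], dtype=float) for n in names])
    fused = w @ probs  # (num_streams,) @ (num_streams, 3) -> (3,)
    return CLASSES[int(np.argmax(fused))]

# Hypothetical per-stream outputs for one image: three streams favour
# "positive", one favours "neutral"; the fused prediction is "positive".
example = {
    "face":     [0.7, 0.2, 0.1],
    "scene":    [0.5, 0.4, 0.1],
    "skeleton": [0.3, 0.5, 0.2],
    "salient":  [0.6, 0.3, 0.1],
}
print(fuse_streams(example))  # -> positive
```

Score-level averaging is only one option; per-stream weights can also be tuned on the validation set, or the streams' features can be concatenated and fed to a trained fusion classifier.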
REFERENCES
- Peter Anderson, Xiaodong He, Chris Buehler, Damien Teney, Mark Johnson, Stephen Gould, and Lei Zhang. 2017. Bottom-Up and Top-Down Attention for Image Captioning and VQA. CoRR abs/1707.07998 (2017). http://arxiv.org/abs/1707.07998
- J. Bullington. 2005. Affective computing and emotion recognition systems: the future of biometric surveillance? In Proceedings of the 2nd Annual Conference on Information Security Curriculum Development. ACM, 95--99.
- Qiong Cao, Li Shen, Weidi Xie, Omkar M. Parkhi, and Andrew Zisserman. 2017. VGGFace2: A dataset for recognising faces across pose and age. CoRR abs/1710.08092 (2017). http://arxiv.org/abs/1710.08092
- Z. Cao, T. Simon, S. Wei, and Y. Sheikh. 2016. Realtime multi-person 2D pose estimation using part affinity fields. arXiv preprint arXiv:1611.08050 (2016).
- J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. 2009. ImageNet: A Large-Scale Hierarchical Image Database. In CVPR.
- A. Dhall, A. Asthana, and R. Goecke. 2010. Facial expression based automatic album creation. In International Conference on Neural Information Processing. Springer, 485--492.
- A. Dhall, R. Goecke, and T. Gedeon. 2015. Automatic group happiness intensity analysis. IEEE Transactions on Affective Computing 6, 1 (2015), 13--26.
- A. Dhall, J. Joshi, K. Sikka, R. Goecke, and N. Sebe. 2015. The more the merrier: Analysing the affect of a group of people in images. In IEEE International Conference and Workshops on Automatic Face and Gesture Recognition, Vol. 1. IEEE, 1--8.
- Abhinav Dhall, Amanjot Kaur, Roland Goecke, and Tom Gedeon. 2018. EmotiW 2018: Audio-Video, Student Engagement and Group-Level Affect Prediction. In ACM International Conference on Multimodal Interaction 2018 (in press). ACM.
- I. J. Goodfellow et al. 2013. Challenges in representation learning: A report on three machine learning contests. In International Conference on Neural Information Processing. Springer, 117--124.
- X. Guo, L. F. Polanía, and K. E. Barner. 2017. Group-level emotion recognition using deep models on image scene, faces, and skeletons. In Proceedings of the 19th ACM International Conference on Multimodal Interaction. ACM, 603--608.
- Xin Guo, Luisa F. Polanía, and Kenneth E. Barner. 2018. Smile detection in the wild based on transfer learning. (2018).
- Y. Guo, L. Zhang, Y. Hu, X. He, and J. Gao. 2016. MS-Celeb-1M: A Dataset and Benchmark for Large Scale Face Recognition. In European Conference on Computer Vision. Springer.
- K. He, X. Zhang, S. Ren, and J. Sun. 2016. Deep residual learning for image recognition. In CVPR. 770--778.
- S. Hochreiter and J. Schmidhuber. 1997. Long Short-Term Memory. Neural Computation 9, 8 (Nov. 1997), 1735--1780.
- Jie Hu, Li Shen, and Gang Sun. 2017. Squeeze-and-Excitation Networks. CoRR abs/1709.01507 (2017). http://arxiv.org/abs/1709.01507
- Xiaohua Huang, Abhinav Dhall, Guoying Zhao, Roland Goecke, and Matti Pietikäinen. 2015. Riesz-based Volume Local Binary Pattern and A Novel Group Expression Model for Group Happiness Intensity Analysis. In BMVC. 1--9.
- A. Krizhevsky, I. Sutskever, and G. E. Hinton. 2012. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems. 1097--1105.
- J. Li, S. Roy, J. Feng, and T. Sim. 2016. Happiness level prediction with sequential inputs via multiple regressions. In Proceedings of the 18th ACM International Conference on Multimodal Interaction. ACM, 487--493.
- Weiyang Liu, Yandong Wen, Zhiding Yu, and Meng Yang. 2016. Large-Margin Softmax Loss for Convolutional Neural Networks. In Proceedings of the 33rd International Conference on Machine Learning. 507--516.
- Jiasen Lu, Caiming Xiong, Devi Parikh, and Richard Socher. 2017. Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning.
- Volodymyr Mnih, Nicolas Heess, Alex Graves, and Koray Kavukcuoglu. 2014. Recurrent Models of Visual Attention. In Advances in Neural Information Processing Systems 27, Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger (Eds.). Curran Associates, Inc., 2204--2212. http://papers.nips.cc/paper/5542-recurrent-models-of-visual-attention.pdf
- W. Mou, O. Celiktutan, and H. Gunes. 2015. Group-level arousal and valence recognition in static images: Face, body and context. In IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG), Vol. 5. IEEE, 1--6.
- P. M. Niedenthal and M. Brauer. 2012. Social functionality of human emotion. Annual Review of Psychology 63 (2012), 259--285.
- O. M. Parkhi, A. Vedaldi, and A. Zisserman. 2015. Deep Face Recognition. In British Machine Vision Conference.
- F. E. Pollick, H. M. Paterson, A. Bruderlin, and A. J. Sanford. 2001. Perceiving affect from arm movement. Cognition 82, 2 (2001), B51--B61.
- Steven J. Rennie, Etienne Marcheret, Youssef Mroueh, Jarret Ross, and Vaibhava Goel. 2016. Self-critical Sequence Training for Image Captioning. CoRR abs/1612.00563 (2016). http://arxiv.org/abs/1612.00563
- K. Simonyan and A. Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014).
- C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna. 2016. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2818--2826.
- L. Tan, K. Zhang, K. Wang, X. Zeng, X. Peng, and Y. Qiao. 2017. Group emotion recognition with individual facial emotion CNNs and global image based CNNs. In Proceedings of the 19th ACM International Conference on Multimodal Interaction. ACM, 549--552.
- T. Simon, H. Joo, I. Matthews, and Y. Sheikh. 2017. Hand Keypoint Detection in Single Images using Multiview Bootstrapping. In CVPR.
- T. Vandal, D. McDuff, and R. El Kaliouby. 2015. Event detection: Ultra large-scale clustering of facial expressions. In IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG), Vol. 1. IEEE, 1--8.
- Fei Wang, Mengqing Jiang, Chen Qian, Shuo Yang, Cheng Li, Honggang Zhang, Xiaogang Wang, and Xiaoou Tang. 2017. Residual Attention Network for Image Classification. CoRR abs/1704.06904 (2017). http://arxiv.org/abs/1704.06904
- S. Wei, V. Ramakrishna, T. Kanade, and Y. Sheikh. 2016. Convolutional pose machines. In CVPR.
- J. Whitehill, G. Littlewort, I. Fasel, M. Bartlett, and J. Movellan. 2009. Toward practical smile detection. IEEE Transactions on Pattern Analysis and Machine Intelligence 31, 11 (2009), 2106--2111.
- J. Wu and J. M. Rehg. 2011. CENTRIST: A Visual Descriptor for Scene Categorization. IEEE Trans. Pattern Anal. Mach. Intell. 33, 8 (2011), 1489--1501.
- Huijuan Xu and Kate Saenko. 2016. Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering. In Computer Vision - ECCV 2016 - 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part VII. 451--466.
- Zichao Yang, Xiaodong He, Jianfeng Gao, Li Deng, and Alexander J. Smola. 2015. Stacked Attention Networks for Image Question Answering. CoRR abs/1511.02274 (2015). http://arxiv.org/abs/1511.02274
- K. Zhang, Z. Zhang, Z. Li, and Y. Qiao. 2016. Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks. IEEE Signal Processing Letters 23, 10 (Oct. 2016), 1499--1503.
Index Terms
- Group-Level Emotion Recognition Using Hybrid Deep Models Based on Faces, Scenes, Skeletons and Visual Attentions