research-article

Multi-rate Attention Based GRU Model for Engagement Prediction

Authors:
Bin Zhu (University of Delaware, Newark, DE, USA)
Xinjie Lan (University of Delaware, Newark, DE, USA)
Xin Guo (University of Delaware, Newark, DE, USA)
Kenneth E. Barner (University of Delaware, Newark, DE, USA)
Charles Boncelet (University of Delaware, Newark, DE, USA)

ICMI '20: Proceedings of the 2020 International Conference on Multimodal Interaction, October 2020, Pages 841–848. https://doi.org/10.1145/3382507.3417965

Published: 22 October 2020

ABSTRACT

Engagement detection is essential in many areas, such as driver attention tracking, employee engagement monitoring, and student engagement evaluation. In this paper, we propose a novel approach using attention-based hybrid deep models for the engagement prediction in the wild category of the 8th Emotion Recognition in the Wild (EmotiW 2020) Grand Challenge (Dhall et al. 2020). The task is to predict the engagement intensity of subjects in videos; the subjects are students watching educational videos from Massive Open Online Courses (MOOCs). To this end, we propose a hybrid deep model based on multi-rate and multi-instance attention. The novelty of the proposed model lies in three aspects: (a) an attention-based Gated Recurrent Unit (GRU) deep network, (b) heuristic multi-rate processing of video-based data, and (c) a rigorous and accurate ensemble model. Experimental results on the validation and test sets show that our method achieves a competitively low mean squared error (MSE) of 0.0541 on the test set, a 64% improvement over the baseline result. The proposed model won first place in the engagement prediction in the wild challenge.
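The abstract names three ingredients without giving their details: a GRU with attention, multi-rate processing of the video, and an ensemble. The following is a minimal NumPy sketch of how such pieces could fit together, not the authors' implementation. Everything specific here is an assumption made for illustration: the function names, the random untrained weights, the dot-product attention scoring, the sampling rates (1, 2, 4), the use of a plain average as the ensemble, and the [0, 1] output range.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_states(x, params):
    """Run a single-layer GRU (biases omitted) over x of shape (T, D).

    Returns every hidden state, shape (T, H), so attention can weight them.
    """
    Wz, Uz, Wr, Ur, Wh, Uh = params
    h = np.zeros(Uz.shape[0])
    states = []
    for x_t in x:
        z = sigmoid(Wz @ x_t + Uz @ h)              # update gate
        r = sigmoid(Wr @ x_t + Ur @ h)              # reset gate
        h_tilde = np.tanh(Wh @ x_t + Uh @ (r * h))  # candidate state
        h = (1.0 - z) * h + z * h_tilde
        states.append(h)
    return np.stack(states)

def attention_pool(states, v):
    """Softmax-weighted sum of hidden states; v is a hypothetical scoring vector."""
    scores = states @ v
    weights = np.exp(scores - scores.max())  # subtract max for numerical stability
    weights /= weights.sum()
    return weights @ states, weights

def predict_multirate(frames, rates, params, v, w_out):
    """Subsample the sequence at each rate, predict per rate, then average."""
    preds = []
    for rate in rates:
        sub = frames[::rate]                       # keep every rate-th frame
        pooled, _ = attention_pool(gru_states(sub, params), v)
        preds.append(sigmoid(w_out @ pooled))      # assumed [0, 1] intensity range
    return float(np.mean(preds))                   # simple average as the ensemble

def mse(y_true, y_pred):
    """Mean squared error, the metric reported in the abstract."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean((y_true - y_pred) ** 2))

# Toy run with random, untrained weights and synthetic per-frame features.
rng = np.random.default_rng(0)
D, H, T = 8, 4, 60                                 # feature dim, hidden dim, frames
params = tuple(rng.normal(scale=0.3, size=s) for s in [(H, D), (H, H)] * 3)
v = rng.normal(size=H)
w_out = rng.normal(size=H)
frames = rng.normal(size=(T, D))
score = predict_multirate(frames, (1, 2, 4), params, v, w_out)
```

In this sketch each rate sees the same video at a different temporal resolution, so the averaged prediction combines coarse and fine views of the sequence; a trained model would learn `params`, `v`, and `w_out` by minimizing `mse` against the labeled engagement intensities.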

References

Brandon Amos, Bartosz Ludwiczuk, and Mahadev Satyanarayanan. 2016. OpenFace: A general-purpose face recognition library with mobile applications. Technical Report CMU-CS-16-118, Carnegie Mellon University School of Computer Science.
Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural Machine Translation by Jointly Learning to Align and Translate. arXiv:1409.0473 [cs.CL]
Joseph E. Beck. 2005. Engagement Tracing: Using Response Times to Model Student Disengagement. In Proceedings of the 2005 Conference on Artificial Intelligence in Education: Supporting Learning Through Intelligent and Socially Informed Technology. IOS Press, Amsterdam, The Netherlands, 88–95. http://dl.acm.org/citation.cfm?id=1562524.1562542
Zhe Cao, Gines Hidalgo, Tomas Simon, Shih-En Wei, and Yaser Sheikh. 2018. OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields. CoRR abs/1812.08008 (2018). arXiv:1812.08008 http://arxiv.org/abs/1812.08008
Cheng Chang, Cheng Zhang, Lei Chen, and Yang Liu. 2018. An Ensemble Model Using Face and Body Tracking for Engagement Detection. In Proceedings of the 20th ACM International Conference on Multimodal Interaction (ICMI '18). ACM, New York, NY, USA, 616–622. https://doi.org/10.1145/3242969.3264986
Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014a. Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation. arXiv:1406.1078 [cs.CL]
Kyunghyun Cho, Bart van Merrienboer, Çağlar Gülçehre, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014b. Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation. CoRR abs/1406.1078 (2014). arXiv:1406.1078 http://arxiv.org/abs/1406.1078
Junyoung Chung, Caglar Gulcehre, KyungHyun Cho, and Yoshua Bengio. 2014. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. arXiv:1412.3555 [cs.NE]
Abhinav Dhall, Garima Sharma, Roland Goecke, and Tom Gedeon. 2020. EmotiW 2020: Driver Gaze, Group Emotion, Student Engagement and Physiological Signal based Challenges. In ACM International Conference on Multimodal Interaction 2020.
Matthias Feurer, A. Klein, K. Eggensperger, J. Springenberg, M. Blum, and F. Hutter. 2015. Efficient and robust automated machine learning. Advances in Neural Information Processing Systems, Vol. 28 (2015), 2944–2952.
V. García, J. S. Sánchez, and R. A. Mollineda. 2012. On the Effectiveness of Preprocessing Methods When Dealing with Different Levels of Class Imbalance. Know.-Based Syst., Vol. 25, 1 (Feb. 2012), 13–21. https://doi.org/10.1016/j.knosys.2011.06.013
Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long Short-Term Memory. Neural Comput., Vol. 9, 8 (Nov. 1997), 1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735
Michael I. Jordan. 1990. Attractor Dynamics and Parallelism in a Connectionist Sequential Machine. IEEE Press, 112–127.
Andrej Karpathy, George Toderici, Sanketh Shetty, Thomas Leung, Rahul Sukthankar, and Li Fei-Fei. 2014. Large-Scale Video Classification with Convolutional Neural Networks. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR '14). IEEE Computer Society, Washington, DC, USA, 1725–1732. https://doi.org/10.1109/CVPR.2014.223
Kenneth R. Koedinger, John R. Anderson, William H. Hadley, and Mary A. Mark. 1997. Intelligent Tutoring Goes To School in the Big City. International Journal of Artificial Intelligence in Education (IJAIED), Vol. 8 (1997), 30–43. https://telearn.archives-ouvertes.fr/hal-00197383
Reed W. Larson and Maryse H. Richards. 1991. Boredom in the Middle School Years: Blaming Schools versus Blaming Students. American Journal of Education, Vol. 99, 4 (1991), 418–443. https://doi.org/10.1086/443992

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision representations
        Image representations
      2. Computer vision tasks
        Activity recognition and understanding
  2. Machine learning
    1. Learning paradigms
      1. Multi-task learning
        Transfer learning
      2. Supervised learning
        Supervised learning by classification
        Supervised learning by regression

Published in
ICMI '20: Proceedings of the 2020 International Conference on Multimodal Interaction
October 2020
920 pages
ISBN: 9781450375818
DOI: 10.1145/3382507
General Chairs:
Khiet Truong (University of Twente, the Netherlands)
Dirk Heylen (University of Twente, the Netherlands)
Mary Czerwinski (Microsoft Research, USA)
Program Chairs:
Nadia Berthouze (University College London, United Kingdom)
Mohamed Chetouani (Sorbonne University, France)
Mikio Nakano (C4A Research Institute, Japan)
Copyright © 2020 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
attention mechanism
engagement intensity prediction
multi-feature engineering
multi-instance learning
multi-rate processing
transfer learning
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 453 of 1,080 submissions, 42%