DOI: 10.1145/3382507.3417965
research-article

Multi-rate Attention Based GRU Model for Engagement Prediction

Published: 22 October 2020

ABSTRACT

Engagement detection is essential in many areas, such as driver attention tracking, employee engagement monitoring, and student engagement evaluation. In this paper, we propose a novel approach built on attention-based hybrid deep models for the engagement prediction in the wild category of the 8th Emotion Recognition in the Wild (EmotiW 2020) Grand Challenge [9]. The task is to predict the engagement intensity of subjects in videos; the subjects are students watching educational videos from Massive Open Online Courses (MOOCs). To address the task, we propose a hybrid deep model based on multi-rate and multi-instance attention. The novelty of the proposed model lies in three aspects: (a) an attention-based Gated Recurrent Unit (GRU) deep network, (b) heuristic multi-rate processing of video-based data, and (c) a rigorous and accurate ensemble model. Experimental results on the validation and test sets show that our method achieves a mean squared error (MSE) of 0.0541 on the test set, a 64% improvement over the baseline. The proposed model won first place in the engagement prediction in the wild challenge.
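The abstract names the model's components without giving their details. The following is a minimal, untrained sketch in plain Python of how an attention-based GRU regressor could be combined with multi-rate frame sampling and a prediction-averaging ensemble. It is not the authors' implementation: the sampling rates, the additive attention-pooling form (in the spirit of Bahdanau et al. [2]), the omission of biases, the sigmoid output for an engagement score assumed to lie in [0, 1], the averaging ensemble, and all names (`multi_rate_views`, `AttentionGRURegressor`, `ensemble_predict`) are illustrative assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def multi_rate_views(frames: Sequence[Vector], rates: Sequence[int]) -> List[List[Vector]]:
    """Assumed multi-rate scheme: for each rate r, keep every r-th frame."""
    return [list(frames[::r]) for r in rates]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m: Sequence[Vector], v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(scores: Vector) -> Vector:
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class AttentionGRURegressor:
    """Hypothetical GRU encoder + additive attention pooling + sigmoid regression head.

    Weights are random (no training); biases are omitted for brevity.
    """

    def __init__(self, input_size: int, hidden_size: int, seed: int = 0):
        rng = random.Random(seed)

        def mat(rows: int, cols: int) -> List[Vector]:
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.hidden_size = hidden_size
        # Input (W) and recurrent (U) weights for the update (z), reset (r) and candidate (n) paths.
        self.W = {g: mat(hidden_size, input_size) for g in "zrn"}
        self.U = {g: mat(hidden_size, hidden_size) for g in "zrn"}
        # Attention parameters: score_t = v . tanh(W_a h_t)
        self.W_a = mat(hidden_size, hidden_size)
        self.v = mat(1, hidden_size)[0]
        # Output weights applied to the attention-pooled context vector.
        self.w_out = mat(1, hidden_size)[0]

    def gru_step(self, x: Vector, h: Vector) -> Vector:
        z = [sigmoid(a + b) for a, b in zip(matvec(self.W["z"], x), matvec(self.U["z"], h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.W["r"], x), matvec(self.U["r"], h))]
        rh = [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(a + b) for a, b in zip(matvec(self.W["n"], x), matvec(self.U["n"], rh))]
        # h_t = z * h_{t-1} + (1 - z) * n_t, the gating convention of Cho et al. [6]
        return [zi * hi + (1.0 - zi) * ni for zi, hi, ni in zip(z, h, n)]

    def attend(self, states: List[Vector]) -> Tuple[Vector, Vector]:
        scores = [sum(vi * math.tanh(a) for vi, a in zip(self.v, matvec(self.W_a, h))) for h in states]
        alphas = softmax(scores)
        # Context vector: attention-weighted sum of the GRU hidden states.
        context = [sum(a * h[j] for a, h in zip(alphas, states)) for j in range(self.hidden_size)]
        return context, alphas

    def predict(self, frames: Sequence[Vector]) -> float:
        h = [0.0] * self.hidden_size
        states = []
        for x in frames:
            h = self.gru_step(x, h)
            states.append(h)
        context, _ = self.attend(states)
        # Engagement intensity is assumed to lie in [0, 1], hence the sigmoid.
        return sigmoid(sum(w * c for w, c in zip(self.w_out, context)))


def ensemble_predict(model: AttentionGRURegressor, frames: Sequence[Vector], rates: Sequence[int]) -> float:
    """Assumed ensemble: average the model's predictions over the multi-rate views."""
    views = multi_rate_views(frames, rates)
    return sum(model.predict(view) for view in views) / len(views)


def mse(preds: Sequence[float], targets: Sequence[float]) -> float:
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)


if __name__ == "__main__":
    rng = random.Random(1)
    # Toy "video": 32 frames of 6-dimensional per-frame features.
    video = [[rng.gauss(0.0, 1.0) for _ in range(6)] for _ in range(32)]
    model = AttentionGRURegressor(input_size=6, hidden_size=8)
    pred = ensemble_predict(model, video, rates=[1, 2, 4])
    # 0.66 is a hypothetical target value used only to exercise mse().
    print(f"predicted engagement: {pred:.3f}, squared error vs. 0.66: {mse([pred], [0.66]):.4f}")
```

Because the weights are random, the printed prediction carries no meaning until the parameters are trained, for example by minimising `mse` against labelled engagement scores; the sketch only shows how sampling at several rates, attention pooling over recurrent states, and averaging across views fit together.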


Supplemental Material

3382507.3417965.mp4 (MP4, 56.2 MB)

References

  1. Brandon Amos, Bartosz Ludwiczuk, and Mahadev Satyanarayanan. 2016. OpenFace: A general-purpose face recognition library with mobile applications. Technical Report CMU-CS-16-118. Carnegie Mellon University School of Computer Science.
  2. Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural Machine Translation by Jointly Learning to Align and Translate. arXiv:1409.0473 [cs.CL].
  3. Joseph E. Beck. 2005. Engagement Tracing: Using Response Times to Model Student Disengagement. In Proceedings of the 2005 Conference on Artificial Intelligence in Education: Supporting Learning Through Intelligent and Socially Informed Technology. IOS Press, Amsterdam, The Netherlands, 88–95. http://dl.acm.org/citation.cfm?id=1562524.1562542
  4. Zhe Cao, Gines Hidalgo, Tomas Simon, Shih-En Wei, and Yaser Sheikh. 2018. OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields. CoRR abs/1812.08008 (2018). http://arxiv.org/abs/1812.08008
  5. Cheng Chang, Cheng Zhang, Lei Chen, and Yang Liu. 2018. An Ensemble Model Using Face and Body Tracking for Engagement Detection. In Proceedings of the 20th ACM International Conference on Multimodal Interaction (ICMI '18). ACM, New York, NY, USA, 616–622. https://doi.org/10.1145/3242969.3264986
  6. Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014a. Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation. arXiv:1406.1078 [cs.CL].
  7. Kyunghyun Cho, Bart van Merrienboer, Çağlar Gülçehre, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014b. Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation. CoRR abs/1406.1078 (2014). http://arxiv.org/abs/1406.1078
  8. Junyoung Chung, Caglar Gulcehre, KyungHyun Cho, and Yoshua Bengio. 2014. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. arXiv:1412.3555 [cs.NE].
  9. Abhinav Dhall, Garima Sharma, Roland Goecke, and Tom Gedeon. 2020. EmotiW 2020: Driver Gaze, Group Emotion, Student Engagement and Physiological Signal based Challenges. In ACM International Conference on Multimodal Interaction 2020.
  10. Matthias Feurer, A. Klein, K. Eggensperger, J. Springenberg, M. Blum, and F. Hutter. 2015. Efficient and Robust Automated Machine Learning. Advances in Neural Information Processing Systems, Vol. 28 (2015), 2944–2952.
  11. V. García, J. S. Sánchez, and R. A. Mollineda. 2012. On the Effectiveness of Preprocessing Methods When Dealing with Different Levels of Class Imbalance. Knowledge-Based Systems, Vol. 25, 1 (Feb. 2012), 13–21. https://doi.org/10.1016/j.knosys.2011.06.013
  12. Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long Short-Term Memory. Neural Computation, Vol. 9, 8 (Nov. 1997), 1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735
  13. Michael I. Jordan. 1990. Attractor Dynamics and Parallelism in a Connectionist Sequential Machine. IEEE Press, 112–127.
  14. Andrej Karpathy, George Toderici, Sanketh Shetty, Thomas Leung, Rahul Sukthankar, and Li Fei-Fei. 2014. Large-Scale Video Classification with Convolutional Neural Networks. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR '14). IEEE Computer Society, Washington, DC, USA, 1725–1732. https://doi.org/10.1109/CVPR.2014.223
  15. Kenneth R. Koedinger, John R. Anderson, William H. Hadley, and Mary A. Mark. 1997. Intelligent Tutoring Goes To School in the Big City. International Journal of Artificial Intelligence in Education (IJAIED), Vol. 8 (1997), 30–43. https://telearn.archives-ouvertes.fr/hal-00197383
  16. Reed W. Larson and Maryse H. Richards. 1991. Boredom in the Middle School Years: Blaming Schools versus Blaming Students. American Journal of Education, Vol. 99, 4 (1991), 418–443. https://doi.org/10.1086/443992

Published in

ICMI '20: Proceedings of the 2020 International Conference on Multimodal Interaction
October 2020, 920 pages
ISBN: 9781450375818
DOI: 10.1145/3382507

Copyright © 2020 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall acceptance rate: 453 of 1,080 submissions (42%)
