On the Effectiveness of Self-Training in MOOC Dropout Prediction

Yamini Goel; Rinkaj Goyal

doi:10.1515/comp-2020-0153

Open Access. Published by De Gruyter Open Access, July 22, 2020

From the journal Open Computer Science

Abstract

Massive open online courses (MOOCs) have gained enormous popularity in recent years and have attracted learners worldwide. However, MOOCs face a crucial challenge: a high dropout rate, which ranges from 91% to 93%. The interplay between learning analytics strategies and MOOCs has emerged as a research area aimed at reducing the dropout rate. Most existing studies use click-stream features as engagement patterns to predict at-risk students. In contrast, this study combines click-stream features with the influence of the learner’s friends, based on their demographics, to identify potential dropouts. Existing predictive models are based on supervised learning techniques that require a large amount of hand-labelled data for training. In practice, however, the scarcity of large labelled data sets makes training difficult. Therefore, this study uses self-training, a semi-supervised learning model, to develop predictive models. Experimental results on a public data set demonstrate that semi-supervised models attain results comparable to state-of-the-art approaches, while also having the flexibility of utilizing a small quantity of labelled data. This study deploys seven well-known optimizers to train the self-training classifiers, of which Stochastic Gradient Descent (SGD) outperformed the others with an F1 score of 94.29%, affirming the relevance of this exposition.

Keywords: Semi-Supervised Learning; Deep Learning; Self-Training; MOOCs; Dropout Prediction
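The self-training idea named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract indicates classifiers trained with optimizers such as SGD, whereas this example swaps in a simple nearest-centroid classifier so that it runs on the Python standard library alone. The function names, the 0.8 confidence threshold, the softmax-over-distance confidence, and the toy two-feature data are all assumptions made for illustration.

```python
import math


def fit_centroids(X, y):
    """Stand-in base classifier (an assumption): one mean vector per class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict_with_confidence(model, x):
    """Return (label, confidence); confidence is a softmax over negative distances."""
    weights = {}
    for label, centre in model.items():
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, centre)))
        weights[label] = math.exp(-dist)
    best = max(weights, key=weights.get)
    return best, weights[best] / sum(weights.values())


def self_train(X_lab, y_lab, X_unlab, threshold=0.8, max_rounds=10):
    """Generic self-training loop.

    1. Fit the classifier on the labelled set.
    2. Pseudo-label the unlabelled samples it is confident about.
    3. Move those samples into the labelled set and refit.
    4. Stop when no sample clears the threshold or max_rounds is reached.
    """
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    model = fit_centroids(X_lab, y_lab)
    for _ in range(max_rounds):
        remaining, added = [], 0
        for x in pool:
            label, confidence = predict_with_confidence(model, x)
            if confidence >= threshold:
                X_lab.append(x)
                y_lab.append(label)
                added += 1
            else:
                remaining.append(x)
        pool = remaining
        if added == 0:
            break
        model = fit_centroids(X_lab, y_lab)
    return model, y_lab, pool


# Toy data (hypothetical): two features per learner, 0 = retained, 1 = dropout.
labelled_X = [[0.0, 0.0], [4.0, 4.0]]
labelled_y = [0, 1]
unlabelled_X = [[0.5, 0.2], [3.8, 4.1], [1.0, 0.8], [3.0, 3.5], [2.0, 2.0]]

model, final_labels, still_unlabelled = self_train(labelled_X, labelled_y, unlabelled_X)
print(final_labels)
print(still_unlabelled)
```

In this toy run, only two samples start with labels; the loop pseudo-labels the unlabelled points that lie clearly near one class and leaves the ambiguous midpoint unlabelled. The threshold controls the trade-off: a lower value absorbs more unlabelled data but risks reinforcing wrong pseudo-labels.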

Received: 2020-04-23

Accepted: 2020-06-03

Published Online: 2020-07-22

This work is licensed under the Creative Commons Attribution 4.0 International License.
