Using machine learning to predict student difficulties from learning session data

Artificial Intelligence Review

Abstract

Predicting student performance is an important research topic because it can help teachers prevent students from dropping out before final exams and identify students who need additional assistance. The objective of this study is to predict the difficulties that students will encounter in a subsequent digital design course session. We analyzed the data logged by a technology-enhanced learning (TEL) system called the digital electronics education and design suite (DEEDS) using machine learning algorithms. The algorithms included artificial neural networks (ANNs), support vector machines (SVMs), logistic regression, naïve Bayes classifiers and decision trees. The DEEDS system allows students to solve digital design exercises with different levels of difficulty while logging input data. The input variables of the current study were average time, total number of activities, average idle time, average number of keystrokes and total related activity for each exercise during individual sessions in the digital design course; the output variables were the students’ grades for each session. We then trained the machine learning algorithms on the data from the previous session and tested them on the data from the upcoming session. We performed k-fold cross-validation and computed the receiver operating characteristic and root mean square error metrics to evaluate the models’ performance. The results show that ANNs and SVMs achieve higher accuracy than the other algorithms. ANNs and SVMs can easily be integrated into the TEL system; thus, we would expect instructors to report improved student performance during the subsequent session.
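The evaluation protocol described in the abstract (train on one session, test on the next, then score with ROC and RMSE) can be illustrated with a minimal sketch. Everything concrete below is an assumption made for illustration: the data are synthetic rather than DEEDS logs, the binary label "1 = student had difficulty" and the rule that generates it are hypothetical, and a small hand-written logistic regression stands in for the five model families named above. The k-fold cross-validation step is omitted for brevity.

```python
import math
import random

# Feature names follow the abstract; the values generated below are synthetic.
FEATURES = ["avg_time", "total_activities", "avg_idle_time",
            "avg_keystrokes", "total_related_activity"]

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(weights, x):
    """Probability of the positive class; weights[0] is the bias term."""
    return sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], x)))

def train_logistic(X, y, lr=0.5, epochs=300):
    """Fit logistic regression with plain batch gradient descent."""
    weights = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, target in zip(X, y):
            residual = predict_proba(weights, x) - target
            grad[0] += residual
            for j, v in enumerate(x):
                grad[j + 1] += residual * v
        weights = [w - lr * g / n for w, g in zip(weights, grad)]
    return weights

def rmse(y_true, y_prob):
    """Root mean square error between labels and predicted probabilities."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true))

def roc_auc(y_true, scores):
    """Area under the ROC curve: the fraction of (positive, negative)
    pairs ranked correctly, counting ties as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def make_session(n, rng):
    """Generate one synthetic session (hypothetical data, not DEEDS logs)."""
    X, y = [], []
    for _ in range(n):
        x = [rng.random() for _ in FEATURES]
        # Hypothetical rule: more idle time than working time signals difficulty.
        signal = x[2] - x[0] + rng.gauss(0.0, 0.1)
        X.append(x)
        y.append(1 if signal > 0 else 0)
    return X, y

rng = random.Random(0)
X_prev, y_prev = make_session(100, rng)   # "previous" session: training data
X_next, y_next = make_session(100, rng)   # "upcoming" session: test data

weights = train_logistic(X_prev, y_prev)
probs = [predict_proba(weights, x) for x in X_next]
auc_next = roc_auc(y_next, probs)
rmse_next = rmse(y_next, probs)
print(f"AUC={auc_next:.3f} RMSE={rmse_next:.3f}")
```

Splitting by session rather than shuffling all records together keeps the test data strictly later than the training data, which mirrors the intended use: a model fitted on completed sessions is asked about a session that has not yet happened.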

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Nos. 61572434, 91630206 and 61303097) and the National Key R&D Program of China (No. 2017YFB0701501).

Author information

Corresponding author

Correspondence to Wenhao Zhu.

Ethics declarations

Conflicts of interest

The authors have no conflicts of interest.

About this article

Cite this article

Hussain, M., Zhu, W., Zhang, W. et al. Using machine learning to predict student difficulties from learning session data. Artif Intell Rev 52, 381–407 (2019). https://doi.org/10.1007/s10462-018-9620-8

  • DOI: https://doi.org/10.1007/s10462-018-9620-8
