Comparison of Predictive Models with Balanced Classes for the Forecast of Student Dropout in Higher Education

  • Conference paper
Highlights in Practical Applications of Agents, Multi-Agent Systems, and Social Good. The PAAMS Collection (PAAMS 2021)

Abstract

Student dropout is a social problem in the university ecosystem of any country, and technology offers a way to address this poorly met need in university education systems. In this context, the study presents and analyzes eight predictive models for forecasting university dropout. The models are based on data mining methods and techniques, are implemented in WEKA, and use a dataset of 4365 academic records of students from the National University of Moquegua (UNAM) in Peru. The objective is to determine which model has the best performance indicators for forecasting, and hence helping to prevent, student dropout. To that end, the study proposes the eight models and compares their accuracy with balanced classes, using the SMOTE method to generate synthetic data. The results confirm that the predictive model based on Random Forest has the highest accuracy and robustness.
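As a rough illustration of how SMOTE balances classes, the sketch below creates synthetic minority-class records by interpolating between a minority sample and one of its nearest minority neighbours. It is a minimal standard-library sketch, not the WEKA filter used in the study: the function name `smote`, its parameters, and the toy feature pairs are assumptions made for illustration, and the base sample is drawn at random rather than iterated over as in the original SMOTE formulation.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic points interpolated between minority samples."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base sample, excluding itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Hypothetical toy data: (grade average, credits passed) for dropout records
dropouts = [(9.5, 10.0), (10.0, 12.0), (8.0, 8.0), (11.0, 14.0)]
retained_count = 10  # assumed size of the majority class

# Oversample the minority class until both classes have the same size
new_points = smote(dropouts, retained_count - len(dropouts), k=2)
balanced_dropouts = dropouts + new_points
```

Because every synthetic record lies on a segment between two real minority records, the new points stay inside the region already occupied by the minority class instead of duplicating existing rows.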

Notes

  1. https://www.the-modeling-agency.com/crisp-dm.pdf.

  2. Attribute selection algorithms used: OneR Attribute Evaluation, Relief Factor Attribute Evaluation, Info Gain Attribute Evaluation, Gain Ratio Attribute Evaluation, and Symmetrical Uncertainty Attribute Evaluation.
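To make one of the listed evaluators concrete, the sketch below ranks nominal attributes by information gain: the entropy of the class label minus its expected entropy after splitting on the attribute. It is a minimal standard-library sketch under stated assumptions, not WEKA's implementation; the attribute names and the four toy records are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def info_gain(values, labels):
    """Class entropy minus the expected class entropy after splitting on the attribute."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy records: class label plus two nominal attributes
labels = ["dropout", "dropout", "stay", "stay"]
attributes = {
    "failed_courses": ["yes", "yes", "no", "no"],  # separates the classes perfectly
    "campus": ["A", "B", "A", "B"],                # carries no class information
}

# Rank attributes from most to least informative
ranking = sorted(attributes, key=lambda a: info_gain(attributes[a], labels), reverse=True)
```

Attributes near the top of such a ranking are the ones a selection step would keep before training the classifiers.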

Acknowledgements

This work is partially supported by the Spanish Government project TIN2017-89156-R, and the Valencian Government project PROMETEO/2018/002. The research was developed thanks to the support of the National University of Moquegua, which provided the information for the creation of the dataset.

Author information

Corresponding author

Correspondence to Stella Heras.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Flores, V., Heras, S., Julián, V. (2021). Comparison of Predictive Models with Balanced Classes for the Forecast of Student Dropout in Higher Education. In: De La Prieta, F., El Bolock, A., Durães, D., Carneiro, J., Lopes, F., Julian, V. (eds) Highlights in Practical Applications of Agents, Multi-Agent Systems, and Social Good. The PAAMS Collection. PAAMS 2021. Communications in Computer and Information Science, vol 1472. Springer, Cham. https://doi.org/10.1007/978-3-030-85710-3_12

  • DOI: https://doi.org/10.1007/978-3-030-85710-3_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-85709-7

  • Online ISBN: 978-3-030-85710-3

  • eBook Packages: Computer Science, Computer Science (R0)
