Abstract
University student dropout is a social problem in the higher-education ecosystem of every country, and technology offers a way to build proposals that address this poorly met need in university education systems. In this context, the study presents and analyzes eight predictive models for forecasting university dropout. The models are built with data mining methods and techniques, implemented in WEKA, and trained on a dataset of 4365 academic records of students from the National University of Moquegua (UNAM), Peru. The objective is to determine which model achieves the best performance indicators for forecasting, and hence preventing, student dropout. To this end, the study proposes and compares the accuracy of the eight models on balanced classes, using the SMOTE method to generate synthetic minority-class data. The results confirm that the model based on Random Forest achieves the highest accuracy and robustness.
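SMOTE (Chawla et al., 2002) balances the classes by creating new minority-class samples. Each new sample is interpolated between an existing minority sample and one of its k nearest minority-class neighbours. The following is a minimal, standard-library sketch of that interpolation step. It assumes purely numeric feature vectors. The function names `smote`, `k_nearest` and `balance` are hypothetical. This is not the WEKA SMOTE filter used in the study, and it simplifies the original algorithm by drawing the base sample at random.

```python
import math
import random


def k_nearest(points, i, k):
    """Indices of the k points closest to points[i] (Euclidean distance), excluding i."""
    others = [(math.dist(points[i], p), j) for j, p in enumerate(points) if j != i]
    others.sort()
    return [j for _, j in others[:k]]


def smote(minority, n_synthetic, k=5, seed=None):
    """Create n_synthetic samples by interpolating between minority-class neighbours.

    Simplified SMOTE (assumption): the base sample is drawn at random rather than
    cycling through every minority sample. Numeric features only.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority-class samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    # Precompute each minority sample's k nearest minority-class neighbours.
    neighbours = [k_nearest(minority, i, k) for i in range(len(minority))]
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))   # base sample
        j = rng.choice(neighbours[i])      # one of its k nearest neighbours
        gap = rng.random()                 # position along the segment, in [0, 1)
        base, other = minority[i], minority[j]
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def balance(majority, minority, k=5, seed=None):
    """Oversample the minority class until it matches the majority class size."""
    needed = max(0, len(majority) - len(minority))
    return list(minority) + smote(minority, needed, k=k, seed=seed)
```

For example, `balance(continuing, dropped_out, k=5, seed=42)` would return the dropout samples plus enough synthetic ones to match the continuing class. Here `continuing` and `dropped_out` are hypothetical lists of numeric feature vectors. Every synthetic point lies on a segment between two real minority samples. When models are compared with cross-validation, oversampling is usually applied to the training folds only, so that synthetic points do not end up in the test data.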
Notes
- 1.
- 2.
Attribute selection algorithms used: OneR Attribute Evaluation, ReliefF Attribute Evaluation, Info Gain Attribute Evaluation, Gain Ratio Attribute Evaluation, and Symmetrical Uncertainty Attribute Evaluation.
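Three of the attribute evaluators listed above score an attribute A against the class C using entropy:

- Information gain is IG = H(C) - H(C|A).
- Gain ratio is IG / H(A).
- Symmetrical uncertainty is 2·IG / (H(A) + H(C)).

The following is a minimal standard-library sketch of these three scores. It assumes nominal (already discretized) attributes. The function names are hypothetical, and this is not WEKA's implementation. OneR and ReliefF work differently and are not covered.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy, in bits, of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def conditional_entropy(attr, cls):
    """H(C | A): class entropy within each attribute value, weighted by that value's frequency."""
    n = len(attr)
    groups = {}
    for a, c in zip(attr, cls):
        groups.setdefault(a, []).append(c)
    return sum(len(g) / n * entropy(g) for g in groups.values())


def info_gain(attr, cls):
    """Reduction in class entropy obtained by knowing the attribute value."""
    return entropy(cls) - conditional_entropy(attr, cls)


def gain_ratio(attr, cls):
    """Information gain normalised by the attribute's own entropy."""
    h_a = entropy(attr)
    return info_gain(attr, cls) / h_a if h_a > 0 else 0.0


def symmetrical_uncertainty(attr, cls):
    """Information gain normalised by the mean of H(A) and H(C); ranges over [0, 1]."""
    denom = entropy(attr) + entropy(cls)
    return 2.0 * info_gain(attr, cls) / denom if denom > 0 else 0.0
```

To rank attributes, each column of a hypothetical table of student records could be scored against the dropout label, for example `sorted(columns, key=lambda name: info_gain(columns[name], label), reverse=True)`. An attribute that perfectly separates the classes scores 1.0 on all three measures when the class is binary and balanced. An attribute independent of the class scores 0.0.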
Acknowledgements
This work is partially supported by the Spanish Government project TIN2017-89156-R and the Valencian Government project PROMETEO/2018/002. The research was made possible by the support of the National University of Moquegua, which provided the data used to build the dataset.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Flores, V., Heras, S., Julián, V. (2021). Comparison of Predictive Models with Balanced Classes for the Forecast of Student Dropout in Higher Education. In: De La Prieta, F., El Bolock, A., Durães, D., Carneiro, J., Lopes, F., Julian, V. (eds) Highlights in Practical Applications of Agents, Multi-Agent Systems, and Social Good. The PAAMS Collection. PAAMS 2021. Communications in Computer and Information Science, vol 1472. Springer, Cham. https://doi.org/10.1007/978-3-030-85710-3_12
DOI: https://doi.org/10.1007/978-3-030-85710-3_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-85709-7
Online ISBN: 978-3-030-85710-3
eBook Packages: Computer Science, Computer Science (R0)