Educational data mining: A survey from 1995 to 2005

https://doi.org/10.1016/j.eswa.2006.04.005

Abstract

There is currently increasing interest in data mining and educational systems, making educational data mining a new and growing research community. This paper surveys the application of data mining to traditional educational systems, particular web-based courses, well-known learning content management systems, and adaptive and intelligent web-based educational systems. Each of these systems has different data sources and objectives for knowledge discovery. After preprocessing the available data in each case, data mining techniques can be applied: statistics and visualization; clustering, classification and outlier detection; association rule mining and pattern mining; and text mining. Despite the success of the plentiful work to date, much more specialized work is needed for educational data mining to become a mature area.

Introduction

During the past decades, the most important innovations in educational systems have been related to the introduction of new technologies (Ha, Bae, & Park, 2000) such as web-based education. This is a form of computer-aided instruction that is virtually independent of any specific location or hardware platform (Brusilovsky & Peylo, 2003). It has gained considerably in importance, and thousands of web courses have been deployed in the past few years. However, many current web-based courses are based on static learning materials, which do not take into account the diversity of students. Adaptive and intelligent web-based educational systems have been seen as a way to provide individually richer learning environments. These systems try to offer learners personalized education by building a model of the individual’s goals, preferences, and knowledge. Data mining, or knowledge discovery in databases (KDD), is the automatic extraction of implicit and interesting patterns from large data collections (Klosgen & Zytkow, 2002). KDD can be used not only to learn the model for the learning process (Hamalainen, Suhonen, Sutinen, & Toivonen, 2004) or for student modeling (Tang & McCalla, 2002) but also to evaluate and improve e-learning systems (Zaïane & Luo, 2001) by discovering useful learning information from learning portfolios (Hwang, Chang, & Chen, 2004).

In conventional teaching environments, educators are able to obtain feedback on student learning experiences in face-to-face interactions with students, enabling a continual evaluation of their teaching programs (Sheard, Ceddia, Hurst, & Tuovinen, 2003). Decision making about classroom processes involves observing a student’s behavior, analyzing historical data, and estimating the effectiveness of pedagogical strategies. However, when students work in electronic environments, this informal monitoring is not possible; educators must look for other ways to obtain this information. Organizations that run distance education sites collect large volumes of data, automatically generated by web servers and stored in server access logs. Web-based learning environments are able to record most of the students’ learning behaviors, and are hence able to provide a huge amount of learning profile data. Recently, there has been growing interest in the automatic analysis of learner interaction data from web-based learning environments (Muehlenbrock, 2005). In order to provide a more effective learning environment, data mining techniques can be applied (Ingram, 1999). Data mining is one step in the overall KDD process, which consists of preprocessing, data mining and postprocessing. Data mining has already been successfully applied in e-commerce (Srivastava, Cooley, Deshpande, & Tan, 2000), and it has begun to be used in e-learning with promising results. Although the discovery methods used in both areas (e-commerce and e-learning) are similar (Hanna, 2004), there are some important differences between them:

  • Domain. In e-commerce the purpose is to guide clients in purchasing, while in e-learning the purpose is to guide students in learning (Romero, Ventura, & Bra, 2004).

  • Data. In e-commerce the data used are normally simple web server access logs, but in e-learning there is more information about a student’s interaction (Pahl & Donnellan, 2003). The user model is also different in the two kinds of system.

  • Objective. The objective of data mining in e-commerce is to increase profit, which is tangible and can be measured in terms of amounts of money, number of customers and customer loyalty. The objective of data mining in e-learning is to improve learning. This goal is more subjective and more subtle to measure.

  • Techniques. Educational systems have special characteristics that require a different treatment of the mining problem. As a consequence, some specific data mining techniques are needed to address the process of learning in particular (Li & Zaïane, 2004; Pahl & Donnellan, 2003). Some traditional techniques can be adapted; some cannot.

The application of knowledge extraction techniques to educational systems in order to improve learning can be viewed as a formative evaluation technique. Formative evaluation (Arruabarrena, Pérez, López-Cuadrado, & Vadillo, 2002) is the evaluation of an educational program while it is still in development, with the purpose of continually improving the program. Examining how students use the system is one way to evaluate the instructional design in a formative manner, and it may help the educator to improve the instructional materials (Ingram, 1999). Data mining techniques can discover useful information that can be used in formative evaluation to assist educators in establishing a pedagogical basis for decisions when designing or modifying an environment or teaching approach. The application of data mining in educational systems is an iterative cycle of hypothesis formation, testing, and refinement (see Fig. 1). Mined knowledge should enter the loop of the system and guide, facilitate and enhance learning as a whole. The cycle involves not only turning data into knowledge, but also filtering the mined knowledge for decision making.

As we can see in Fig. 1, educators and academics responsible are in charge of designing, planning, building and maintaining the educational systems. Students use and interact with them. Starting from all the available information about courses, students, usage and interaction, different data mining techniques can be applied in order to discover useful knowledge that helps to improve the e-learning process. The discovered knowledge can be used not only by providers (educators) but also by the users themselves (students). So, the application of data mining in educational systems can be oriented to different actors, each with their own point of view (Zorrilla, Menasalvas, Marin, Mora, & Segovia, 2005):

  • Oriented towards students (Heraud et al., 2004; Farzan, 2004; Lu, 2004; Tang & McCalla, 2005; Zaïane, 2002). The objective is to recommend to learners activities, resources and learning tasks that would favour and improve their learning, suggest good learning experiences for the students, suggest path pruning and shortening or simply links to follow, based on the tasks already done by the learner and their successes, and on tasks performed by other similar learners, etc.

  • Oriented towards educators (Ha et al., 2000; Hamalainen et al., 2004; Merceron & Yacef, 2004; Minaei-Bidgoli & Punch, 2003; Mor & Minguillon, 2004; Muehlenbrock, 2005; Pahl & Donnellan, 2003; Romero et al., 2004; Silva & Vieira, 2002; Talavera & Gaudioso, 2004; Tang et al., 2000; Ueno, 2004b; Zaïane & Luo, 2001). The objective is to get more objective feedback for instruction, evaluate the structure of the course content and its effectiveness on the learning process, classify learners into groups based on their needs in guidance and monitoring, find learners’ regular as well as irregular patterns, find the most frequently made mistakes, find activities that are more effective, discover information to improve the adaptation and customization of the courses, restructure sites to better personalize courseware, organize the contents efficiently according to the progress of the learner, adaptively construct instructional plans, etc.

  • Oriented towards academics responsible and administrators (Becker et al., 2000; Grob et al., 2004; Luan, 2002; Ma et al., 2000; Peled & Rashty, 1999; Sanjeev & Zytkow, 1995; Urbancic et al., 2002). The objective is to obtain parameters on how to improve site efficiency and adapt it to the behavior of its users (optimal server size, network traffic distribution, etc.), obtain measures on how to better organize institutional resources (human and material) and their educational offer, enhance the educational programs on offer, and determine the effectiveness of the new computer-mediated distance learning approach.

There are many general data mining tools that provide mining algorithms, filtering and visualization techniques. Some examples of commercial and academic tools are DBMiner, Clementine, Intelligent Miner, Weka, etc. (Klosgen & Zytkow, 2002). However, these tools are not specifically designed and maintained for pedagogical purposes, and they are cumbersome to use for an educator who does not have extensive knowledge of data mining (Zaïane, Xin, & Han, 1998). In order to solve this problem, some specific educational data mining, statistical and visualization tools have been developed to help educators analyze the different aspects of the learning process (see Table 1).

We have divided this paper into the following sections. We first review some different types of educational systems and how data mining can be applied in each of them. We then describe the data mining techniques that have been applied in educational systems, grouping them by task. Finally, we summarize the main conclusions and outline some directions for future research.

Section snippets

Educational systems: data and objectives

Data mining can be applied to data coming from two types of educational systems: the traditional classroom and distance education. The application of data mining techniques must be treated separately for each type because they have different data sources and objectives.

Data preprocessing

Data preprocessing transforms the original data into a form suitable for use by a particular mining algorithm. So, before applying the data mining algorithm, a number of general data preprocessing tasks have to be addressed (Koutri et al., 2004; Zorrilla et al., 2005):

  • Data cleaning. This is one of the major preprocessing tasks: removing irrelevant items and log entries that are not needed for the mining process, such as graphics and scripts.

  • User identification. Process of associating
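
As an illustration of the data cleaning task listed above, the following is a minimal sketch in Python, not the procedure of any surveyed system. It assumes that the server writes its access log in the Common Log Format; the name clean_log, the sample log lines and the list of file extensions treated as irrelevant are all hypothetical choices made for this example.

```python
import re

# Assumption: Common Log Format. The survey does not prescribe a log format.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)

# Illustrative (hypothetical) list of extensions considered irrelevant for mining.
IRRELEVANT_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")


def clean_log(lines):
    """Parse log lines, dropping malformed lines and requests for graphics or scripts."""
    entries = []
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that do not parse
        # Ignore any query string and letter case when checking the extension.
        path = match.group("path").split("?", 1)[0].lower()
        if path.endswith(IRRELEVANT_EXTENSIONS):
            continue  # request for a graphic or script, not a learning action
        entries.append(match.groupdict())
    return entries


# Hypothetical sample data.
sample = [
    '10.0.0.1 - alice [10/Oct/2005:13:55:36 +0200] "GET /course/unit1.html HTTP/1.0" 200 2326',
    '10.0.0.1 - alice [10/Oct/2005:13:55:37 +0200] "GET /img/logo.gif HTTP/1.0" 200 512',
    '10.0.0.2 - bob [10/Oct/2005:14:01:02 +0200] "GET /js/menu.js HTTP/1.0" 200 1024',
    '10.0.0.2 - bob [10/Oct/2005:14:01:05 +0200] "GET /course/quiz1.html HTTP/1.0" 200 4100',
]

cleaned = clean_log(sample)
for entry in cleaned:
    print(entry["user"], entry["path"])
```

In this sketch only the two requests for course pages survive the cleaning step. A real deployment would likely need a filter list tuned to the particular site.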

Data mining techniques in educational systems

Data mining is a multidisciplinary area in which several computing paradigms converge: decision tree construction, rule induction, artificial neural networks, instance-based learning, Bayesian learning, logic programming, statistical algorithms, etc. (Klosgen & Zytkow, 2002). Next, we describe some specific applications of data mining techniques in web-based educational systems, grouped by task (see Table 2).
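
To make one of these tasks concrete, the sketch below illustrates association rule mining, one of the techniques the survey covers, on a toy data set. It is a naive pairwise enumeration written for this example, not the Apriori algorithm and not the method of any surveyed paper. The session data, the function names and the thresholds are hypothetical.

```python
from itertools import combinations

# Hypothetical sessions: the set of course resources each student visited.
sessions = [
    {"lecture1", "quiz1", "forum"},
    {"lecture1", "quiz1"},
    {"lecture1", "forum"},
    {"lecture2", "quiz1"},
    {"lecture1", "quiz1", "lecture2"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def pair_rules(transactions, min_support=0.3, min_confidence=0.7):
    """Return rules lhs -> rhs between single items that meet both thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b}, transactions)
        if pair_support < min_support:
            continue  # the pair is too rare to yield an interesting rule
        for lhs, rhs in ((a, b), (b, a)):
            # confidence(lhs -> rhs) = support(lhs and rhs) / support(lhs)
            confidence = pair_support / support({lhs}, transactions)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, pair_support, confidence))
    return rules


rules = pair_rules(sessions)
for lhs, rhs, sup, conf in rules:
    print(f"{lhs} -> {rhs}  support={sup:.2f}  confidence={conf:.2f}")
```

A rule such as "lecture2 -> quiz1" would tell an educator that, in this toy data, every student who visited the second lecture also attempted the first quiz. Whether such a rule is pedagogically meaningful remains a matter of interpretation.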

Conclusions and future research

Educational data mining is an upcoming field related to several well-established areas of research including e-learning, adaptive hypermedia, intelligent tutoring systems, web mining, data mining, etc. The application of data mining in educational systems has specific requirements not present in other domains, mainly the need to take into account pedagogical aspects of the learner and the system. Although educational data mining is a very recent research area, there is an important number of

Acknowledgement

The authors gratefully acknowledge the financial support provided by the Spanish Department of Research of the Ministry of Science and Technology under TIN2005-08386-C05-02 Project.

References (81)

  • Dringus, L., et al. (2005). Using data mining as a strategy for assessing asynchronous discussion forums. Computers & Education Journal.
  • Hwang, W., et al. (2004). The relationship of learning traits, motivation and performance-learning response dynamics. Computers & Education Journal.
  • Agrawal, R., Imielinski, T., & Swami, A. (1993). Mining association rules between sets of items in large databases. In...
  • Agrawal, R., et al. Mining sequential patterns.
  • Arroyo, I., Murray, T., Woolf, B., & Beal, C. (2004). Inferring unobservable learning variables from students’ help...
  • Arruabarrena, R., Pérez, T. A., López-Cuadrado, J., & Vadillo, J. G. J. (2002). On evaluating adaptive systems for...
  • Avouris, N., Komis, V., Fiotakis, G., Margaritis, M., & Voyiatzaki, E. (2005). Why logging of fingertip actions is not...
  • Baker, R., Corbett, A., & Koedinger, K. (2004). Detecting student misuse of intelligent tutoring systems. In...
  • Bari, M., & Benzater, B. (2005). Retrieving data from pdf interactive multimedia productions. In International...
  • Beck, J., & Woolf, B. (2000). High-level student modeling with machine learning. In Intelligent tutoring systems (pp....
  • Becker, K., Ghedini, C., & Terra, E. (2000). Using kdd to analyze the impact of curriculum revisions in a Brazilian...
  • Becker, K., Vanzin, M., & Ruiz, D. D. A. (2005). Ontology-based filtering mechanisms for web usage patterns retrieval....
  • Brusilovsky, P., et al. (2003). Adaptive and intelligent web-based educational systems. International Journal of Artificial Intelligence in Education.
  • Castro, F., Vellido, A., Nebot, A., & Minguillon, J. (2005). Detecting atypical student behaviour on an e-learning...
  • Chen, G., et al. (2000). Discovering decision knowledge from web log portfolio for managing classroom processes by applying decision tree and data cube technology. Journal of Educational Computing Research.
  • Chen, J., Li, Q., Wang, L., & Jia, W. (2004). Automatically generating an e-textbook on the web. In International...
  • Damez, M., Marsala, C., Dang, T., & Bouchon-Meunier, B. (2005). Fuzzy decision tree for user modeling from...
  • Farzan, R. (2004). Adaptive socio-recommender system for open-corpus e-learning. In Doctoral consortium of the third...
  • Feng, M., Heffernan, N., & Koedinger, K. (2005). Looking for sources of error in predicting student’s knowledge. In...
  • Freyberger, J., Heffernan, N., & Ruiz, C. (2004). Using association rules to guide a search for best fitting transfer...
  • Grob, H., Bensberg, F., & Kaderali, F. (2004). Controlling open source intermediaries – a web log mining approach. In...
  • Grobelnik, M., Mladenic, D., & Jermol, M. (2002). Exploiting text mining in publishing and education. In Proceedings of...
  • Ha, S., Bae, S., & Park, S. (2000). Web mining for distance education. In IEEE international conference on management...
  • Hamalainen, W., Suhonen, J., Sutinen, E., & Toivonen, H. (2004). Data mining in personalizing distance education...
  • Hammouda, K., & Kamel, M. (2005). Ch. Data mining in...
  • Hanna, M. (2004). Data mining in the e-learning domain. Computers & Education Journal.
  • Heiner, C., Beck, J., & Mostow, J. (2004). Lessons on using its data to answer educational research questions. In...
  • Heraud, J., France, L., & Mille, A. (2004). Pixed: an its that guides students with the help of learners’ interaction...
  • Iksal, S., & Choquet, C. (2005). Usage analysis driven by models in a pedagogical...
  • Ingram, A. (1999). Using web server logs in evaluating instructional web sites. Journal of Educational Technology Systems.
  • Johnson, S., et al. (2000). Comparative analysis of learner satisfaction and learning outcomes in online and face-to-face learning environments. Journal of Interactive Learning Research.
  • Klosgen, W., et al. (2002). Handbook of data mining and knowledge discovery.
  • Koutri, M., Avouris, N., & Daskalaki, S. (2004). Ch. A survey on web usage mining techniques for web-based adaptive...
  • Li, J., & Zaïane, O. (2004). Combining usage, content, and structure data to improve web site recommendation. In...
  • Lu, J. (2004). Personalized e-learning material recommender system. In International conference on information...
  • Luan, J. (2002). Data mining, knowledge management in higher education, potential applications. In Workshop associate...
  • Ma, Y., Liu, B., Wong, C., Yu, P., & Lee, S. (2000). Targeting the right students using data mining. In KDD ’00:...
  • Markellou, P., Mousourouli, I., Spiros, S., & Tsakalidis, A. (2005). Using semantic web mining technologies for...
  • Markham, S., Ceddia, J., Sheard, J., Burvill, C., Weir, J., Field, B., et al. (2003). Applying agent technology to...
  • Mazza, R., & Milani, C. (2005). Exploring usage analysis in learning systems: Gaining insights from visualisations. In...