Article Info

Ablation Study On Feature Group Importance For Automated Essay Scoring

Jih Soong Tan, Ian K.T. Tan
dx.doi.org/10.17576/apjitm-2022-1101-08

Abstract

Grading written academic essays by hand requires significant effort. It is time-consuming and vulnerable to human biases. Since the introduction of modern computing, essay grading has been one of the many tasks explored for automation. Research in automated essay scoring is ongoing, and most of the work in recent years extracts multiple linguistic features and uses them to build a classification model for scoring. The three main types of features used are lexical, grammatical, and semantic. In our work, we conducted an ablation study to identify the engineered features that have the weakest influence. We did this using a generic feature engineering and classification approach that was used by the winners of the Automated Student Assessment Prize (ASAP). This is to mitigate biases that may arise from specialized feature engineering or models. Our results show that a semantic feature called the prompt is the weakest feature in influencing the models. Further investigation showed that this was due to the feature being over-fitted in the classification model.

Keywords

Automated Essay Scoring, Ablation Study, Feature Engineering, Semantic, ASAP

Area

Knowledge Technology