Predicting CEFR levels in learners of English: The use of microsystem criterial features in a machine learning approach

Thomas Gaillat; Andrew Simpkin; Nicolas Ballier; Bernardo Stearns; Annanda Sousa; Manon Bouyé; Manel Zarrouk

doi:10.1017/S095834402100029X

Published online by Cambridge University Press: 10 November 2021

Affiliations

Thomas Gaillat: Université Rennes 2, France (thomas.gaillat@univ-rennes2.fr)
Andrew Simpkin: School of Mathematics, Statistics and Applied Mathematics, National University of Ireland, Galway (andrew.simpkin@insight-centre.org)
Nicolas Ballier: Université de Paris, France (nicolas.ballier@univ-paris.fr)
Bernardo Stearns: Data Science Institute (DSI), National University of Ireland, Galway (bernardo.stearns@insight-centre.org)
Annanda Sousa: Data Science Institute (DSI), National University of Ireland, Galway (annanda.sousa@insight-centre.org)
Manon Bouyé: Université de Paris, France (manon.bouye@etu.u-paris.fr)
Manel Zarrouk: Université Sorbonne Paris Nord, France (zarrouk@lipn.univ-paris13.fr)

Abstract

This paper focuses on automatically assessing language proficiency levels according to linguistic complexity in learner English. We implement a supervised learning approach as part of an automatic essay scoring system. The objective is to uncover Common European Framework of Reference for Languages (CEFR) criterial features in writings by learners of English as a foreign language. Our method relies on the concept of microsystems with features related to learner-specific linguistic systems in which several forms operate paradigmatically. Results on internal data show that different microsystems help classify writings from A1 to C2 levels (82% balanced accuracy). Overall results on external data show that a combination of lexical, syntactic, cohesive and accuracy features yields the most efficient classification across several corpora (59.2% balanced accuracy).
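The balanced accuracy figures quoted above can be read with the help of a small sketch. It assumes one common definition of the metric, the mean of per-class recall, which may differ from the computation actually used in the paper; the function name, the label list and the toy essays below are illustrative assumptions, not data or code from the study.

```python
from collections import Counter

# The six CEFR levels named in the abstract (A1 to C2).
CEFR_LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]


def balanced_accuracy(gold, predicted, labels=CEFR_LEVELS):
    """Return the mean per-class recall over the labels present in `gold`.

    Assumption: balanced accuracy is taken as macro-averaged recall.
    Other definitions exist, so treat this as illustrative only.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    totals = Counter(gold)  # number of essays per true level
    hits = Counter(g for g, p in zip(gold, predicted) if g == p)  # correct ones
    recalls = [hits[lab] / totals[lab] for lab in labels if totals[lab] > 0]
    if not recalls:
        raise ValueError("no known labels found in gold")
    return sum(recalls) / len(recalls)


# Hypothetical toy data: seven essays with invented gold and predicted levels.
gold = ["A1", "A1", "A1", "A1", "B1", "B1", "C2"]
pred = ["A1", "A1", "A1", "A2", "B1", "A2", "C2"]

plain = sum(g == p for g, p in zip(gold, pred)) / len(gold)
print(f"plain accuracy:    {plain:.3f}")                          # 5/7, about 0.714
print(f"balanced accuracy: {balanced_accuracy(gold, pred):.3f}")  # 0.750
```

Because each level contributes equally regardless of how many essays it contains, this kind of metric is less flattering than plain accuracy when a classifier does well only on the most frequent levels.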

Keywords

microsystem; criterial features; supervised learning; language functions; automatic essay scoring; linguistic complexity

Type: Research Article
Information: ReCALL, Volume 34, Issue 2, May 2022, pp. 130–146

DOI: https://doi.org/10.1017/S095834402100029X
Copyright: © The Author(s), 2021. Published by Cambridge University Press on behalf of European Association for Computer Assisted Language Learning
