Re-training writing raters online: How does it compare with face-to-face training?

https://doi.org/10.1016/j.asw.2007.04.001

Abstract

The training of raters for writing assessment through web-based programmes is emerging as an attractive and flexible alternative to the conventional method of face-to-face training sessions. Although some online training programmes have been developed, there is little published research on them. The current study aims to compare the effectiveness of online and face-to-face training in the context of a large-scale academic writing assessment for students entering a major English-medium university. A team of 16 raters, divided into two groups of 8, all initially rated a set of 70 scripts. In the training phase, the online group rated 15 benchmark scripts online and received immediate feedback, whereas the face-to-face group received individual feedback on their pre-training performance, rated the 15 scripts at home and then met for a face-to-face session. After the training, both groups re-rated the initial 70 scripts and then reported their attitudes towards the different forms of training by means of questionnaires and interviews. The statistical results, obtained using multi-faceted Rasch measurement, showed that both types of training were effective overall, but the self-report data revealed varied responses, with some raters favouring one type of training and others the other. The findings are discussed in terms of the factors influencing rater responsiveness and the refinements needed for future rater training programmes.

Section snippets

Background

Although there have been substantial advances in automated rating of writing in recent years (Jamieson, 2005), it is still the norm in writing assessment to use human raters. Unfortunately, their judgements are prone to various sources of bias and error which can ultimately compromise the quality of the ratings. A number of studies using a range of psychometric methods have identified various rater effects (Myford & Wolfe, 2003, 2004) which need to be addressed if an

The assessment instrument

The Diagnostic English Language Needs Assessment (DELNA) is an initiative funded by the university to identify the academic English needs of undergraduate students following their admission to a degree programme. Those who are found to be at risk are offered suitable English language support. DELNA consists of a screening and a diagnostic component. The main purpose of the screening, which is made up of vocabulary and text editing tasks, is to identify students who are highly proficient users of

Results

We will present the results of the study by working systematically through the research questions, covering first the outcomes of the FACETS analysis and then the qualitative data from the questionnaires and interviews.
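For reference, the many-facet Rasch model estimated by the FACETS program is conventionally written as follows. This is the standard formulation of the model rather than an equation reproduced from the article, and the facets named here (examinee, criterion, rater, rating-scale step) are assumed to correspond to the study's design.

\log\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = B_n - D_i - C_j - F_k

Here P_{nijk} is the probability that examinee n is awarded category k rather than k−1 by rater j on criterion i; B_n is the examinee's ability, D_i the difficulty of the criterion, C_j the severity of the rater, and F_k the difficulty of the step from category k−1 to k. Because all parameters are placed on a common logit scale, rater severity estimates obtained before and after training can be compared directly.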

Discussion and conclusion

The findings indicate that, in terms of severity, both forms of training were successful in bringing the raters closer together in their ratings. There was an indication that the online training might have been slightly more successful. Both groups rated consistently before and after the training. Afterwards, the online group might have become slightly more consistent, whilst the face-to-face group rated with slightly more variation. On the individual level, only one rater moved outside the
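Rater consistency in analyses of this kind is typically indexed by the infit mean-square statistic reported by FACETS; the definition below is the standard information-weighted form and is given for orientation only, not taken from the article's results.

\text{Infit MnSq}_j = \frac{\sum_{n,i}\left(X_{nij}-E_{nij}\right)^{2}}{\sum_{n,i}\operatorname{Var}\!\left(X_{nij}\right)}

where X_{nij} is the observed score awarded by rater j to examinee n on criterion i, E_{nij} is the score expected under the model, and the sums run over all observations involving that rater. Values close to 1 indicate ratings about as variable as the model predicts, values well above 1 indicate inconsistent (misfitting) ratings, and values well below 1 indicate overly predictable ratings.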

References (19)

  • J. Hamilton et al. (2001). Teachers’ perceptions of on-line rater training and monitoring. System.
  • P.J. Congdon et al. (2000). The stability of rater severity in large-scale assessment programs. Journal of Educational Measurement.
  • C. Elder et al. (2007). Evaluating rater responses to an online rater training program. Language Testing.
  • W.P. Fisher (1992). Reliability statistics. Rasch Measurement: Transactions of the Rasch Measurement SIG.
  • J. Jamieson (2005). Trends in computer-based second language assessment. Annual Review of Applied Linguistics.
  • D. Kenyon et al. (1993). Evaluating the efficacy of rater self-training.
  • F.J. Landy et al. (1983). The measurement of work performance: Methods, theory, and application.
  • J.M. Linacre (2006). Facets Rasch measurement computer program.
  • T. Lumley et al. (1995). Rater characteristics and rater bias: Implications for training. Language Testing.
There are more references available in the full text version of this article.

Cited by (82)

  • Halo effects in rating data: Assessing speech fluency

    2023, Research Methods in Applied Linguistics
  • Individualized feedback to raters in language assessment: Impacts on rater effects

    2022, Assessing Writing
    Citation Excerpt:

    Likewise, they used another term, “differential rater leniency,” to describe the case that a rater tends to assign higher scores to one or more particular groups of examinees than model expectations. Knoch et al. (2007) combined these two terms and extended their application from one particular group of examinees to an aspect of one facet. They used “bias effect” to describe the case that a rater tends to assign scores that are relatively low or high in terms of an aspect of one facet.

  • Validating a rubric for assessing integrated writing in an EAP context

    2022, Assessing Writing
    Citation Excerpt:

    One of the key elements in an evidence-based approach is the elicitation and interpretation of rater perceptions about evaluation criteria. Studies drawing on qualitative methods, such as interviews and think-aloud protocols, have shown that rater perceptions play an important role in the standardization of scoring rubrics (Knoch et al., 2007). For example, Cumming, Kantor, and Powers (2001) found that while raters attended to rhetoric and content when scoring integrated writing tasks, they focused more on language use when scoring independent tasks.

  • Introduction to many-facet Rasch measurement: Analyzing and evaluating rater-mediated assessments

    2023, Introduction to Many-Facet Rasch Measurement: Analyzing and Evaluating Rater-Mediated Assessments
