
English for Specific Purposes

Volume 33, January 2014, Pages 77-86

Using subject specialists to validate an ESP rating scale: The case of the International Civil Aviation Organization (ICAO) rating scale

https://doi.org/10.1016/j.esp.2013.08.002

Highlights

  • The study presents a validation of the ICAO language proficiency requirements.

  • The judgments of pilots and language-trained raters are compared.

  • Implications for rating scale design for LSP assessment are discussed.

Abstract

As part of its English-language proficiency requirements for pilots and air traffic controllers, the International Civil Aviation Organization (ICAO) published a rating scale designed to assess these professionals’ aviation English proficiency. However, it is not clear how this scale was developed. As a step towards addressing the need for validation, this paper presents a study involving focus group interviews with pilots. Ten pilots listened to performances of test takers on a variety of aviation English tests. The pilots were asked to rate the acceptability of each speaker’s language for (a) communicating with other pilots and (b) radiotelephony communications with air traffic control. The focus groups had two aims: (1) to establish the ‘indigenous’ assessment criteria pilots use when assessing the language ability of peers and (2) to establish what level of proficiency is sufficient as the operational level. The results showed that the pilots focused on some, but not all, of the criteria on the ICAO scale. Whilst listening to the performances, they also often focused on the speakers’ technical knowledge. The paper proposes a model of how industry professionals can be involved in the validation of LSP rating scales.

Introduction

Having recognized that inadequate English proficiency on the part of pilots or air traffic controllers has played a role in the chain of events leading to accidents or incidents, the International Civil Aviation Organization (ICAO) decided to strengthen its provisions for radiotelephony communication and established a set of Language Proficiency Requirements (ICAO, 2004, ICAO, 2010). That is, airline pilots and air traffic controllers who engage in international flight operations must demonstrate that their level of English proficiency is at or above the operational level (Level 4) in order to practice their professions. The scale published as part of the proficiency requirements comprises six proficiency levels, from Pre-Elementary (Level 1) to Expert (Level 6), across six assessment criteria: Pronunciation, Structure, Vocabulary, Fluency, Comprehension, and Interactions. The initial deadline for compliance was 5 March 2008; however, a three-year grace period was granted to those ICAO member states which were not prepared to meet the testing requirements by that date, so the requirements came into effect on 5 March 2011. Unfortunately, very little information is available on how the rating scale and the proficiency requirements were developed. According to ICAO, the Proficiency Requirements in Common English (PRICE) Study Group, which developed the requirements, consisted of industry and linguistic experts with a background in aviation. This study aims to establish what criteria pilots use when evaluating the speech of their peers and what passing standards pilots apply in relation to the ICAO scale levels. It thereby attempts to address the lack of validation studies on the ICAO proficiency requirements.

It is important here to describe the nature of the language used by pilots and air traffic controllers during radiotelephony communication. This language can be categorized into two types: standard phraseology and plain language. Standard phraseology, which is used in the majority of communications in this context, draws on a restricted repertoire; the language needed is strictly controlled and standardized. A list of the basic principles of standard phraseology can be found in ICAO (2001). A further feature of standard phraseology is that it is a simplified language which emphasizes certain features, for example whether an instruction or advice is negative or positive, and which aims for clarity and the avoidance of ambiguity in meaning and pronunciation. Examples of specific instances of phraseology can be found in ICAO (2001) and in Kim (2012). Standard phraseology is part of the regular training of pilots and air traffic controllers as it requires considerable practice; however, it has been shown that it is not always used when required (see e.g. Howard, 2008). Plain language, on the other hand, is used in contexts in which phraseology does not suffice. When using plain language, pilots and air traffic controllers are required to simplify their language as much as possible and to avoid ambiguous language (see e.g. Kim, 2012).

Most LSP assessment systems make use of a rating scale which raters use to judge spoken or written performances. Such scales are generally used because they are a representation of the test construct. To best represent the test construct at hand, rating scales should be grounded in a theory that describes the type of language used in the target language use domain (McNamara, 2002, Turner, 2000). However, rating scales are often not developed in a way that accounts for such a theory. In fact, often very little information is available on how rating scales were developed (e.g. Brindley, 1998, McNamara, 1996, Turner, 2000, Upshur and Turner, 1995).

In the context of LSP testing, Douglas (2001) writes that the content of the target language use (TLU) domain, which serves as the basis for the content of the test tasks, is usually fairly well understood; assessment criteria and rating scales, however, should be developed not through an analysis of the TLU situation but through an understanding of what it means to know and use a language in the specific context (Jacoby & McNamara, 1999). That is, rather than focusing on the specific tasks used for assessment, what matters is knowledge of what makes a successful performance in the TLU context. Douglas therefore argues that in the development of LSP assessment criteria, theoretically based approaches should be supplemented by taking into account the criteria that experienced professionals in the relevant field employ when evaluating communicative language use. Jacoby (1998) coined the term ‘indigenous assessment criteria’ to refer to such criteria used by subject specialists when assessing communication in their respective professional fields. Such criteria can vary widely, from being linguistic in focus to commenting on professional competence or even on a professional’s appearance (see e.g. Douglas & Myers, 2000). Because of this, some authors have cautioned about the potential problems of transferring or superimposing these highly context-specific indigenous criteria back onto the criteria used in language assessments (see e.g. Jacoby & McNamara, 1999) and have argued that their fit to the language test needs to be critically evaluated (see e.g. Douglas, 2000). Overall, the literature on the use of indigenous criteria in LSP is promising, but more work needs to be done to understand how such criteria can be incorporated into rating scales based on linguistic criteria.

Several studies on LSP assessment have made use of subject specialists for rating scale validation (Douglas and Myers, 2000, Elder, 1993, Elder et al., 2012, Jacoby, 1998, Jacoby and McNamara, 1999). Among the aims of these studies was to elicit the indigenous criteria of professionals in the field and then either feed these back into the assessment cycle or compare them to existing criteria. Even though most LSP scales use linguistic criteria (Douglas, 2000), drawing on subject specialists’ judgements of language performance adds to the validity of the resulting assessment criteria, as these will more closely reflect the norms expected in the workplace. Lumley (1998) argues that while it is common practice to involve industry specialists in the test design phase, it is less common to involve this group of stakeholders in the formulation of rating criteria and in standard-setting. In his own study, Lumley compared the ratings of ESL professionals and healthcare professionals on the speaking section of the Occupational English Test (OET) and found broad similarities between the two groups’ ratings. Banerjee and Taylor (2005) conducted a similar study using the IELTS test and likewise found a generally acceptable level of agreement. Most previous research has been conducted in the domain of testing the English proficiency of health professionals (Douglas and Myers, 2000, Elder et al., 2012, Ryan, 2007). A recent validation study of the criteria used in the OET speaking sub-test (Elder et al., 2012), for example, showed that educators in the health professions of medicine, nursing and physiotherapy hardly mentioned language skills when evaluating trainee–patient interactions, and that what they did comment on (for example, how well the test taker had managed the interaction) is not reflected in the current OET assessment criteria. Few studies have used aviation professionals as informants, despite the high stakes of the ICAO language proficiency requirements, and very few studies have employed industry professionals as informants for post hoc validation of both the linguistic criteria of an LSP rating scale and its cut-scores or passing criteria. The aim of this study, therefore, is to explore the utility of using pilot informants for the post hoc validation of an aviation-related LSP rating scale.

Section snippets

Methodology

To gather the necessary data for such post hoc validation of the scale, experienced pilots were invited to take part in focus group interviews with the aim of eliciting information on two aspects: (a) what criteria pilots use when evaluating the language competency of their colleagues and (b) what level of English language competence was deemed sufficient to work in an operational environment. The specific research questions are as follows:

  1. What criteria do pilots use when evaluating the language competency of their colleagues?

  2. What level of English language competence do pilots deem sufficient for work in an operational environment?

Results

The results of the two research questions are presented in turn below.

Discussion and conclusion

The findings show that the pilot informants drew on a wider range of criteria than those included in the ICAO scale when judging the language ability of the pilots who provided the speech samples. They placed considerable weight on the technical knowledge of the speaker. It is possible that pilots rely on this evidence because they are not trained in the assessment of linguistic criteria.

Many comments were also made about the pronunciation of the speakers. This criterion is represented in the ICAO rating scale.

Dr Ute Knoch is a senior research fellow and the Acting Director of the Language Testing Research Centre at the University of Melbourne. Her research interests are in the areas of second language writing assessment, writing development, and assessing languages for academic and specific purposes.

References (29)

  • J. Banerjee et al. (2005). Setting the standard: What English language abilities do overseas trained doctors need? Paper presented at the Language Testing Research Colloquium (LTRC), Ottawa.

  • G. Brindley (1998). Describing language development? Rating scales and SLA.

  • A. Brown (1995). The effect of rater variables in the development of an occupation-specific language performance test. Language Testing.

  • A. Davies (2001). The logic of testing languages for specific purposes. Language Testing.

  • D. Douglas (2000). Assessing languages for specific purposes.

  • D. Douglas (2001). Language for specific purposes assessment criteria: Where do they come from? Language Testing.

  • D. Douglas et al. (2000). Assessing the communication skills of veterinary students: Whose criteria?

  • D. Douglas et al. Research methodology in context-based second language research.

  • C. Elder (1993). How do subject specialists construe classroom language proficiency? Language Testing.

  • C. Elder et al. (2012). Health professionals’ view of communication: Implications for assessing performance on a health-specific English language test. TESOL Quarterly.

  • J.W. Howard (2008). Tower, am I cleared to land? Problematic communication in aviation discourse. Human Communication Research.

  • ICAO (2001).

  • S. Jacoby et al. (1999). Locating competence. English for Specific Purposes.

  • T. Lumley (1998). Perceptions of language-trained raters and occupational experts in a test of occupational English language proficiency. English for Specific Purposes.