Determining the intra- and inter-observer reliability of screening tools used in sports injury research
The importance of reliability
Over recent years there has been an increasing call to provide a firm evidence base for sports injury prevention initiatives. As argued by Bahr and Krosshaug,1 provision of this evidence base is limited by incomplete knowledge of the etiological factors underlying many sports injuries. To redress this imbalance, considerably more effort needs to be directed towards studies that elucidate the intrinsic and extrinsic risk factors for sports injury.
Such studies naturally involve the measurement of
Definition of reliability and its related concepts
Validity of measurement is the degree to which a test measures what it is supposed to measure4 and reliability refers to the consistency, or repeatability, of a measure.4, 5 While a measure can be reliable without being valid, the reverse is not true.4, 6 Low reliability indicates that large variations in measurement will occur upon retesting so that assessment outcomes cannot be meaningfully reproduced or interpreted.7 While factors such as weight and height are typically measured with high
Purpose of this paper
The importance of the reliability of pre-participation musculoskeletal screening protocols, fitness assessments and other clinical assessment tools has been identified in a number of published studies.10, 11, 12, 13, 14, 15, 16 These studies, which include both inter-observer and intra-observer reliability assessments of a variety of clinical musculoskeletal tests used in sport and physical therapy, were retrieved from searches of the Medline database. The search terms used were ‘reliability’,
Data example
This paper is intended as an educational piece for researchers who conduct reliability research. While it presents some statistical formulae, its emphasis is on providing information for application in future studies. To illustrate this, the real-world example of a reliability assessment of a musculoskeletal screening protocol used in a prospective cohort study of cricket fast bowlers is presented.19 The reliability assessment was conducted using two observers and 10 bowlers. The bowlers were each
Statistical methodology
When conducting a reliability study, there are two main situations to consider:
1. the observers are assumed to have been drawn randomly from a larger population (random observers);
2. the observers are the only ones of interest (fixed observers).
This is an important distinction because the formulas for calculating the reliability differ slightly for these two scenarios. Our reliability assessment example with two physiotherapists is a random observers case, because two physiotherapists conducted the
Definition of the ICC
In this paper, the definition of the ICC is the ratio of a covariance term and a variance term, in accordance with the usual definition of correlation coefficients. The ICC ranges from zero, when all observed differences between participants are caused by measurement error, to one when the ability to distinguish participants from each other based on the variable of interest is not at all influenced by random error.3 Therefore, an ICC equal to, or close to, one is the desired result when
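The ratio definition can be made concrete with a few lines of arithmetic. The variance component values below are invented purely for illustration:

```python
# Tiny numeric illustration of the ICC as a ratio of variance components.
# The component values are invented for demonstration only.
subject_var = 4.0                    # real differences between participants

for error_var in (0.0, 1.0, 36.0):   # increasing random measurement error
    icc = subject_var / (subject_var + error_var)
    print(f"error variance {error_var:>4}: ICC = {icc:.2f}")
# prints ICC = 1.00, 0.80 and 0.10 for the three error variances
```

With no measurement error the ICC is one; as error variance swamps the true between-subject variance, the ICC collapses towards zero.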
The case of random observers
The ICC for inter-observer reliability is: ICCinter = cov(Yijk, Yilk)/var(Yijk), where j and l refer to different observers. Writing σa², σb², σab² and σe² for the subject, observer, subject-by-observer and error variance components of the two-way random-effects model Yijk = μ + ai + bj + (ab)ij + eijk, this may then be estimated using the formula:

ICCinter = σ̂a²/(σ̂a² + σ̂b² + σ̂ab² + σ̂e²)
Each of the variance components may be estimated from Table 2.
For intra-observer reliability, the formula is ICCintra = cov(Yijk, Yijl)/var(Yijk), where k and l refer to different measurements taken by the same observer on the same subject. This may be estimated using the formula:

ICCintra = (σ̂a² + σ̂b² + σ̂ab²)/(σ̂a² + σ̂b² + σ̂ab² + σ̂e²)
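As a hedged sketch, both ICCs can be computed from simulated balanced data using the standard ANOVA (method-of-moments) variance component estimators for a two-way random-effects design. The design sizes and true component values below are invented, and it is an assumption that these estimators correspond exactly to the paper's Table 2:

```python
import numpy as np

# Sketch: ICC estimation for a balanced two-way random-effects design
# Y_ijk = mu + a_i + b_j + (ab)_ij + e_ijk
# (i: subject, j: observer, k: repeated measurement).
# All sizes and variance components below are made up for illustration.

rng = np.random.default_rng(0)
I, J, K = 200, 4, 3                                  # subjects, observers, replicates
var_a, var_b, var_ab, var_e = 4.0, 0.5, 0.5, 1.0     # true components (invented)

a = rng.normal(0, np.sqrt(var_a), (I, 1, 1))         # subject effects
b = rng.normal(0, np.sqrt(var_b), (1, J, 1))         # observer effects
ab = rng.normal(0, np.sqrt(var_ab), (I, J, 1))       # subject-by-observer interaction
e = rng.normal(0, np.sqrt(var_e), (I, J, K))         # measurement error
y = 10.0 + a + b + ab + e

grand = y.mean()
m_i = y.mean(axis=(1, 2))                            # subject means
m_j = y.mean(axis=(0, 2))                            # observer means
m_ij = y.mean(axis=2)                                # cell means

# ANOVA mean squares for the balanced design.
msa = J * K * np.sum((m_i - grand) ** 2) / (I - 1)
msb = I * K * np.sum((m_j - grand) ** 2) / (J - 1)
msab = K * np.sum((m_ij - m_i[:, None] - m_j[None, :] + grand) ** 2) / ((I - 1) * (J - 1))
mse = np.sum((y - m_ij[:, :, None]) ** 2) / (I * J * (K - 1))

# Method-of-moments variance component estimates (truncated at zero).
s2_e = mse
s2_ab = max((msab - mse) / K, 0.0)
s2_b = max((msb - msab) / (I * K), 0.0)
s2_a = max((msa - msab) / (J * K), 0.0)

total = s2_a + s2_b + s2_ab + s2_e
icc_inter = s2_a / total                             # cov(Y_ijk, Y_ilk) / var
icc_intra = (s2_a + s2_b + s2_ab) / total            # cov(Y_ijk, Y_ijl) / var
print(f"ICC_inter ~ {icc_inter:.2f}, ICC_intra ~ {icc_intra:.2f}")
```

With the chosen true components, the population values are 4/6 ≈ 0.67 for inter-observer and 5/6 ≈ 0.83 for intra-observer reliability, and the estimates land close to those targets.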
The case of fixed observers
Just as in the case above, the reliability coefficients are calculated as the ratio of a covariance and a variance term. However, we now need to use the right-hand side of Table 2 to estimate the reliability coefficients, and so the formulas for calculating the ICC are different in this case. The formulas are:
Once again, each of the variance components can be estimated
Hypothesis tests to test if the reliability meets a specified level
Hypothesis tests can readily be used to test whether the observed reliability meets a specified level.17, 18 There are no universally applicable standards for how high the ICC must be to constitute acceptable reliability, as this depends on the purpose and use of the assessment and the consequences that follow from it.7 For example, an ICC of 0.6 may be considered appropriate within the context of pre-participation screening for sports injury research. However, this may not be appropriate for a clinical
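One such test can be sketched as follows. This is the standard one-way random-effects F-test of H0: ICC ≤ ρ0, which ignores the observer effect for simplicity; it is not claimed to be the exact test the paper uses, and the data are simulated:

```python
import numpy as np
from scipy import stats

# Hedged sketch: one-way random-effects F-test of H0: ICC <= rho0
# against H1: ICC > rho0. This simplified version ignores observer
# effects; the paper's own two-way test is not reproduced here.

def icc_oneway_test(y, rho0=0.6):
    """y: (n subjects) x (k repeated measurements) array of scores."""
    n, k = y.shape
    subj_means = y.mean(axis=1)
    msb = k * np.sum((subj_means - y.mean()) ** 2) / (n - 1)      # between subjects
    msw = np.sum((y - subj_means[:, None]) ** 2) / (n * (k - 1))  # within subjects
    icc = (msb - msw) / (msb + (k - 1) * msw)                     # ICC(1) estimate
    # Under H0: ICC = rho0, the scaled ratio follows an F distribution.
    f0 = (msb / msw) * (1 - rho0) / (1 + (k - 1) * rho0)
    p = stats.f.sf(f0, n - 1, n * (k - 1))
    return icc, p

# Simulated data: 30 subjects, 2 measurements each, true ICC = 9/(9+1) = 0.9.
rng = np.random.default_rng(1)
y = rng.normal(0, 3.0, (30, 1)) + rng.normal(0, 1.0, (30, 2))
icc, p = icc_oneway_test(y, rho0=0.6)
print(f"ICC(1) = {icc:.2f}, p = {p:.4f} for H0: ICC <= 0.6")
```

Because the simulated true reliability (0.9) comfortably exceeds the 0.6 threshold mentioned above, the test rejects H0 at conventional significance levels.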
Confidence intervals and sample size
Although it is possible to calculate confidence intervals for ICCs, the formulas are long and complicated; they are therefore given in Appendix A of the supplementary data. Applying them to our example leads to a 95% CI of 0.253–0.896 for inter-observer reliability and 0.539–0.961 for intra-observer reliability.
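When the closed-form interval formulas are unwieldy, a percentile bootstrap over subjects is a generic alternative (it is not the method of Appendix A). A minimal sketch with simulated data for 10 subjects, echoing the small sample of the bowler example:

```python
import numpy as np

# Hedged sketch: percentile bootstrap CI for a one-way ICC, resampling
# subjects with replacement. Generic alternative to exact interval
# formulas, not a reproduction of the paper's Appendix A.

def icc1(y):
    n, k = y.shape
    m = y.mean(axis=1)
    msb = k * np.sum((m - y.mean()) ** 2) / (n - 1)
    msw = np.sum((y - m[:, None]) ** 2) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

rng = np.random.default_rng(2)
# Simulated scores: 10 subjects, 2 measurements each (values invented).
y = rng.normal(0, 2.0, (10, 1)) + rng.normal(0, 1.0, (10, 2))

boots = []
for _ in range(2000):
    idx = rng.integers(0, len(y), len(y))   # resample whole subjects
    boots.append(icc1(y[idx]))
lo, hi = np.percentile(boots, [2.5, 97.5])
print(f"ICC = {icc1(y):.2f}, bootstrap 95% CI ({lo:.2f}, {hi:.2f})")
```

As in the worked example above, with only 10 subjects the interval is wide, which is exactly why sample size deserves attention in reliability studies.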
It is beyond the scope of this paper to discuss the sample sizes needed for reliability studies, though it is
Measurement error and its relationship to reliability
Measurement error, often called the standard error of measurement (S.E.M.), is particularly important in clinical applications, where it is used to distinguish real changes from those that could have occurred by chance alone. For intra-observer reliability, the formula for the S.E.M. is:

S.E.M.intra = √(σ̂e²)
The formula for the S.E.M. for inter-observer reliability is given for the case of random observers by:

S.E.M.inter = √(σ̂b² + σ̂ab² + σ̂e²)
For the case of fixed observers, it is
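The S.E.M. can also be reached through the familiar identity S.E.M. = SD × √(1 − ICC), from which a minimal detectable change is often derived. A short arithmetic sketch with invented numbers:

```python
import math

# Hedged arithmetic sketch of the identity SEM = SD * sqrt(1 - ICC),
# and the 95% minimal detectable change often derived from the SEM.
# The SD and ICC values below are invented for illustration.

sd_total = 2.5   # hypothetical total SD of the measurement
icc = 0.84       # hypothetical intra-observer ICC

sem = sd_total * math.sqrt(1 - icc)
mdc95 = 1.96 * math.sqrt(2) * sem   # smallest change unlikely to be noise
print(round(sem, 2), round(mdc95, 2))   # prints: 1.0 2.77
```

Here a change smaller than about 2.8 units in a retested individual could plausibly be measurement noise rather than a real change.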
Concluding remarks
Sports injury prevention requires a firm evidence base. An important component of this is the accuracy and reliability of measurements taken in studies of risk factors. When measurements are not reliable, it is difficult to distinguish between participants with or without risk factors because of the large measurement error.
There are many instances in which it would be advantageous to assess inter- and intra-observer reliability simultaneously. However, most of the commonly used methods of
Acknowledgments
A.H. was supported by the core funding provided to the NSW Injury Risk Management Research Centre by the NSW Department of Health, the NSW Roads and Traffic Authority and the NSW Motor Accidents Authority. R.D. was supported by an NHMRC Public Health Ph.D. scholarship during the data collection and analysis phase, and by an NHMRC Population Health Capacity Building Grant in Injury Prevention, Trauma and Rehabilitation during the reporting and publication phase. C.F. was supported by an NHMRC
References (21)
- et al. Reliability in evidence-based clinical practice: a primer for allied health professionals. Phys Ther Sport (2003).
- et al. Intra-rater and inter-rater reliability of a weight-bearing lunge measure of ankle dorsiflexion. Aust J Physiother (1998).
- et al. Reliability of common lower extremity musculoskeletal screening tests. Phys Ther Sport (2004).
- et al. Intratester and intertester reliability of goniometric measurement of passive lateral shoulder rotation. J Hand Ther (1999).
- et al. The intra- and interrater reliability of hip muscle strength assessments using a handheld versus a portable dynamometer anchoring station. Arch Phys Med Rehabil (2004).
- et al. Intratester and intertester reliability of the Cybex electronic digital inclinometer (EDI-320) for measurement of active neck flexion and extension in healthy subjects. Man Ther (2001).
- et al. Understanding injury mechanisms: a key component of preventing injuries in sport. Br J Sports Med (2005).
- et al. Risk factors for sports injuries—a methodological approach. Br J Sports Med (2003).
- How to evaluate intraexaminer reliability using an interexaminer reliability study design. J Manipulative Physiol Ther (1995).
- et al. Measuring research variables. In: Research methods in physical activity (1996).