Original paper
Determining the intra- and inter-observer reliability of screening tools used in sports injury research

https://doi.org/10.1016/j.jsams.2006.09.002

Summary

Sports injury etiological studies explore the relationships between potential injury risk factors and injury outcomes. The ability of such studies to clearly identify intrinsic risk factors for sports injury depends on the accuracy of their measurement. Measurements need to be reproducible over time and repeatable by different observers, as well as within a given individual. The importance of the reliability of pre-participation screening protocols and other clinical assessment tools has been identified in a number of published studies. However, a review of these studies indicates that a variety of statistical techniques have been used to calculate intra- and inter-observer reliability. While the intra-class correlation coefficient (ICC) is the most often cited measure, a range of statistical approaches to estimating ICCs have been used. It is therefore difficult to determine which statistical method is most appropriate in the context of measuring intrinsic risk factors in sports injury research. This paper summarises a statistical method for the concurrent assessment of intra- and inter-observer reliability and presents an argument for why this approach should be adopted by sports injury researchers using screening protocols that collect continuous data.

The importance of reliability

Over recent years there has been an increasing call to provide a firm evidence base for sports injury prevention initiatives. As argued by Bahr and Krosshaug,1 provision of this evidence base is limited by a lack of knowledge about the etiological factors underlying many sports injuries. To redress this imbalance, considerably more effort needs to be put towards conducting studies that elucidate the intrinsic and extrinsic risk factors for sports injury.

Such studies naturally involve the measurement of

Definition of reliability and its related concepts

Validity of measurement is the degree to which a test measures what it is supposed to measure4 and reliability refers to the consistency, or repeatability, of a measure.4, 5 While a measure can be reliable without being valid, the reverse is not true.4, 6 Low reliability indicates that large variations in measurement will occur upon retesting so that assessment outcomes cannot be meaningfully reproduced or interpreted.7 While factors such as weight and height are typically measured with high

Purpose of this paper

The importance of the reliability of pre-participation musculoskeletal screening protocols, fitness assessments and other clinical assessment tools has been identified in a number of published studies.10, 11, 12, 13, 14, 15, 16 These studies, which include both inter-observer and intra-observer reliability assessments of a variety of clinical musculoskeletal tests used in sport and physical therapy, were retrieved from searches of the Medline database. The search terms used were ‘reliability’,

Data example

This paper is intended as an educational piece for researchers who conduct reliability research. While it presents some statistical formulae, its emphasis is on providing information for application in future studies. To illustrate this, the real-world example of a reliability assessment of a musculoskeletal screening protocol used in a prospective cohort study of cricket fast bowlers is presented.19 The reliability assessment was conducted using two observers and 10 bowlers. The bowlers were each
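
The snippet above does not state how many repeat measurements each bowler received, so the sketch below simply illustrates how such data could be laid out in long format, assuming two measurement occasions per observer; the values are simulated and the measured variable is hypothetical, not taken from the study.

```python
import numpy as np
import pandas as pd

# Hypothetical long-format layout for a concurrent reliability study:
# 10 subjects (bowlers) x 2 observers (physiotherapists) x an assumed
# 2 measurement occasions. Values are simulated for illustration only.
rng = np.random.default_rng(0)
subjects = range(1, 11)
observers = ["A", "B"]
occasions = [1, 2]

true_score = {s: rng.normal(40.0, 5.0) for s in subjects}  # subject's "true" value
rows = [
    {"subject": s, "observer": o, "occasion": k,
     "measurement": round(true_score[s] + rng.normal(0.0, 2.0), 1)}
    for s in subjects for o in observers for k in occasions
]
data = pd.DataFrame(rows)
print(data.head())
```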

Statistical methodology

When conducting a reliability study, there are two main situations to consider:

  1. the observers are assumed to have been drawn randomly from a larger population (random observers);

  2. the observers are the only ones of interest (fixed observers).

This is an important distinction because the formulas for calculating the reliability differ slightly for these two scenarios. Our reliability assessment example with two physiotherapists is a random observers case, because two physiotherapists conducted the

Definition of the ICC

In this paper, the ICC is defined as the ratio of a covariance term to a variance term, in accordance with the usual definition of correlation coefficients. The ICC ranges from zero, when all observed differences between participants are caused by measurement error, to one, when the ability to distinguish participants from each other based on the variable of interest is not at all influenced by random error.3 Therefore, an ICC equal to, or close to, one is the desired result when
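
As a rough illustration of this covariance-to-variance definition (not the two-way model used later in the paper), the sketch below uses a simplified subject-effect-plus-error simulation to show that the empirical ratio approaches one when measurement error is small and zero when error dominates; the function name and parameters are ours, not the paper's.

```python
import numpy as np

def simulated_icc(sigma_s: float, sigma_e: float, n_subjects: int = 100_000, seed: int = 1) -> float:
    """Empirical ICC under a simplified model Y = subject effect + error,
    computed as the covariance between two replicate measurements divided by
    the variance of one of them."""
    rng = np.random.default_rng(seed)
    s = rng.normal(0.0, sigma_s, n_subjects)        # between-subject component
    y1 = s + rng.normal(0.0, sigma_e, n_subjects)   # replicate measurement 1
    y2 = s + rng.normal(0.0, sigma_e, n_subjects)   # replicate measurement 2
    return np.cov(y1, y2)[0, 1] / np.var(y1, ddof=1)

print(simulated_icc(sigma_s=5.0, sigma_e=0.1))  # close to 1: error barely matters
print(simulated_icc(sigma_s=0.1, sigma_e=5.0))  # close to 0: error swamps true differences
```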

The case of random observers

The ICC for inter-observer reliability is $\mathrm{ICC}_{\mathrm{inter}} = \operatorname{cov}(Y_{ijk}, Y_{ilk})/\operatorname{var}(Y_{ijk})$, where $j$ and $l$ refer to different observers. This may then be estimated using the formula:
$$\widehat{\mathrm{ICC}}_{\mathrm{inter}} = \frac{\hat{\sigma}_S^2}{\hat{\sigma}_S^2 + \hat{\sigma}_O^2 + \hat{\sigma}_{SO}^2 + \hat{\sigma}_e^2}$$

Each of the variance components may be estimated from Table 2.

For intra-observer reliability, the formula is $\mathrm{ICC}_{\mathrm{intra}} = \operatorname{cov}(Y_{ijk}, Y_{ijl})/\operatorname{var}(Y_{ijk})$, where $k$ and $l$ refer to different measurements taken by the same observer on the same subject. This may be estimated using the formula:
$$\widehat{\mathrm{ICC}}_{\mathrm{intra}} = \frac{\hat{\sigma}_S^2 + \hat{\sigma}_O^2 + \hat{\sigma}_{SO}^2}{\hat{\sigma}_S^2 + \hat{\sigma}_O^2 + \hat{\sigma}_{SO}^2 + \hat{\sigma}_e^2}$$
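
A minimal computational sketch of these estimates is given below. It assumes a balanced two-way random-effects design (every observer measures every subject the same number of times) and uses the standard expected-mean-square relations to recover the variance components, since the paper's Table 2 is not reproduced in this excerpt; `data` is the hypothetical DataFrame constructed earlier.

```python
import pandas as pd

def variance_components(df: pd.DataFrame) -> dict:
    """Mean squares and variance components for a balanced subjects x observers
    design with m repeats per cell (two-way ANOVA with interaction).
    Uses the standard expected-mean-square relations; negative component
    estimates are sometimes truncated to zero in practice, but are left as-is here."""
    n = df["subject"].nunique()      # number of subjects
    o = df["observer"].nunique()     # number of observers
    m = len(df) // (n * o)           # repeats per subject-observer cell (balanced, m >= 2)

    grand = df["measurement"].mean()
    subj = df.groupby("subject")["measurement"].mean()
    obs = df.groupby("observer")["measurement"].mean()
    cell = df.groupby(["subject", "observer"])["measurement"].mean()

    ms_s = o * m * ((subj - grand) ** 2).sum() / (n - 1)
    ms_o = n * m * ((obs - grand) ** 2).sum() / (o - 1)
    interaction = (cell
                   - subj.reindex(cell.index.get_level_values("subject")).to_numpy()
                   - obs.reindex(cell.index.get_level_values("observer")).to_numpy()
                   + grand)
    ms_so = m * (interaction ** 2).sum() / ((n - 1) * (o - 1))
    fitted = df.groupby(["subject", "observer"])["measurement"].transform("mean")
    ms_e = ((df["measurement"] - fitted) ** 2).sum() / (n * o * (m - 1))

    return {"n": n, "o": o, "m": m,
            "ms_s": ms_s, "ms_o": ms_o, "ms_so": ms_so, "ms_e": ms_e,
            "var_s": (ms_s - ms_so) / (o * m),
            "var_o": (ms_o - ms_so) / (n * m),
            "var_so": (ms_so - ms_e) / m,
            "var_e": ms_e}

vc = variance_components(data)
total = vc["var_s"] + vc["var_o"] + vc["var_so"] + vc["var_e"]
icc_inter_random = vc["var_s"] / total
icc_intra_random = (vc["var_s"] + vc["var_o"] + vc["var_so"]) / total
print(f"ICC_inter = {icc_inter_random:.3f}, ICC_intra = {icc_intra_random:.3f}")
```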

The case of fixed observers

Just as in the case above, the reliability coefficients are calculated as the ratio of a covariance term to a variance term. However, we now need to use the right-hand side of Table 2 to estimate the reliability coefficients, and so the formulas for calculating the ICC are different in this case. With $o$ denoting the number of observers, the formulas are:
$$\widehat{\mathrm{ICC}}_{\mathrm{inter}} = \frac{\hat{\sigma}_S^2 - \hat{\sigma}_{SO}^2/o}{\hat{\sigma}_S^2 + (o-1)\hat{\sigma}_{SO}^2/o + \hat{\sigma}_e^2} \quad\text{and}\quad \widehat{\mathrm{ICC}}_{\mathrm{intra}} = \frac{\hat{\sigma}_S^2 + (o-1)\hat{\sigma}_{SO}^2/o}{\hat{\sigma}_S^2 + (o-1)\hat{\sigma}_{SO}^2/o + \hat{\sigma}_e^2}$$

Once again, each of the variance components can be estimated
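
Continuing the same sketch, the fixed-observer coefficients can be computed from the variance components as shown below. Note that in the full paper the fixed-observer components come from the right-hand side of its Table 2, which is not reproduced here, so plugging in the random-effects estimates is purely illustrative.

```python
# Fixed-observer ICCs, reusing `variance_components` and `data` from the
# earlier sketch. Plugging in the random-effects component estimates is an
# illustrative shortcut, not the paper's Table 2 estimators.
vc = variance_components(data)
o = vc["o"]

denom = vc["var_s"] + (o - 1) * vc["var_so"] / o + vc["var_e"]
icc_inter_fixed = (vc["var_s"] - vc["var_so"] / o) / denom
icc_intra_fixed = (vc["var_s"] + (o - 1) * vc["var_so"] / o) / denom
print(f"ICC_inter(fixed) = {icc_inter_fixed:.3f}, ICC_intra(fixed) = {icc_intra_fixed:.3f}")
```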

Hypothesis tests to test if the reliability meets a specified level

Hypothesis tests can readily be used to assess whether the observed reliability meets a specified level.17, 18 There are no universally applicable standards as to how high the ICC must be to constitute acceptable reliability, as this depends on the purpose and use of the assessment and the consequences that flow from it.7 For example, an ICC of 0.6 may be considered appropriate within the context of pre-participation screening for sports injury research. However, this may not be appropriate for a clinical

Confidence intervals and sample size

Although it is possible to calculate confidence intervals for ICCs, the formulas are long and complicated and are therefore provided in Appendix A of the supplementary data. Applying them to our example leads to a 95% CI of 0.253–0.896 for inter-observer reliability. For intra-observer reliability, the 95% CI is 0.539–0.961.
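
For readers without access to the closed-form interval formulas in Appendix A, a nonparametric bootstrap over subjects is one commonly used approximation; the sketch below is not the paper's method and simply reuses the hypothetical `data` and the `variance_components` helper from the earlier examples.

```python
import numpy as np
import pandas as pd

# Percentile bootstrap CI for the inter-observer ICC (random observers):
# resample subjects with replacement, recompute the ICC each time.
rng = np.random.default_rng(2)
subject_ids = data["subject"].unique()
boot_icc = []
for _ in range(2000):
    sampled = rng.choice(subject_ids, size=len(subject_ids), replace=True)
    frames = []
    for new_id, s in enumerate(sampled):
        d = data[data["subject"] == s].copy()
        d["subject"] = new_id                 # relabel so duplicated subjects stay distinct
        frames.append(d)
    vc = variance_components(pd.concat(frames, ignore_index=True))
    total = vc["var_s"] + vc["var_o"] + vc["var_so"] + vc["var_e"]
    boot_icc.append(vc["var_s"] / total)

low, high = np.percentile(boot_icc, [2.5, 97.5])
print(f"bootstrap 95% CI for ICC_inter: {low:.3f} to {high:.3f}")
```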

It is beyond the scope of this paper to discuss the sample sizes needed for reliability studies, though it is

Measurement error and its relationship to reliability

Measurement error, often called the standard error of measurement (S.E.M.), is particularly important in clinical applications, where it is used to distinguish real changes from those that could have occurred by chance alone. For intra-observer reliability, the formula for the S.E.M. is:
$$\mathrm{S.E.M.}_{\mathrm{intra}} = \hat{\sigma}_e = \sqrt{MS_E}$$

The formula for the S.E.M. for inter-observer reliability is given, for the case of random observers, by:
$$\mathrm{S.E.M.}_{\mathrm{inter,random}} = \sqrt{\hat{\sigma}_O^2 + \hat{\sigma}_{SO}^2 + \hat{\sigma}_e^2} = \sqrt{\frac{MS_O + (n-1)MS_{SO} + n(m-1)MS_E}{nm}}$$
where $n$ is the number of subjects and $m$ is the number of repeated measurements taken by each observer on each subject.
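
Using the mean squares and variance components from the earlier sketch (again reusing the hypothetical `data` and the `variance_components` helper), both forms of the S.E.M. can be computed as below; the two expressions for the inter-observer S.E.M. are algebraically equivalent when no component estimate is truncated at zero.

```python
import numpy as np

vc = variance_components(data)
n, m = vc["n"], vc["m"]

sem_intra = np.sqrt(vc["ms_e"])                                   # sqrt(MS_E) = sigma_e_hat
sem_inter = np.sqrt(vc["var_o"] + vc["var_so"] + vc["var_e"])     # variance-component form
sem_inter_ms = np.sqrt((vc["ms_o"] + (n - 1) * vc["ms_so"]        # equivalent mean-square form
                        + n * (m - 1) * vc["ms_e"]) / (n * m))
print(f"SEM_intra = {sem_intra:.2f}, SEM_inter = {sem_inter:.2f} (= {sem_inter_ms:.2f})")
```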

For the case of fixed observers, it is

Concluding remarks

Sports injury prevention requires a firm evidence base. An important component of this is the accuracy and reliability of measurements taken in studies of risk factors. When measurements are not reliable, it is difficult to distinguish between participants with or without risk factors because of the large measurement error.

There are many instances in which it would be advantageous to assess inter- and intra-observer reliability simultaneously. However, most of the commonly used methods of

Acknowledgments

A.H. was supported by the core funding provided to the NSW Injury Risk Management Research Centre by the NSW Department of Health, the NSW Roads and Traffic Authority and the NSW Motor Accidents Authority. R.D. was supported by an NHMRC Public Health Ph.D. scholarship during the data collection and analysis phase, and by an NHMRC Population Health Capacity Building Grant in Injury Prevention, Trauma and Rehabilitation during the reporting and publication phase. C.F. was supported by an NHMRC
