Original articles
Methods for assessing responsiveness: a critical review and recommendations
Introduction
It is widely argued that outcome measures in clinical trials should be reliable, valid, and responsive 1, 2, 3. A reliable measure is one that tends to produce the same results when administered on two or more occasions under identical conditions 4, 5, 6. Reliability is typically assessed in test–retest studies by analyses based on the kappa statistic or the intraclass correlation. A valid measure is one that measures what it was intended to measure 4, 5, 6. Validity is typically assessed by estimation of sensitivity and specificity, ROC curve analyses, correlation analyses, or regression models.
There does not, however, appear to be a consensus in the literature on what constitutes a responsive measure nor, correspondingly, how responsiveness should be quantified. A review of the literature suggests there are two major aspects of responsiveness, each having its own definition and strategies for assessment. We define the first as “internal responsiveness,” which characterizes the ability of a measure to change over a particular prespecified time frame. One widely used method of assessing internal responsiveness is to evaluate the change in a measure within the context of a randomized clinical trial involving a treatment that has previously been shown to be efficacious 7, 8, 9, 10, 11, 12, 13. Any observed change in the measure is typically attributed to clinically relevant changes in health. Alternatively, change in a measure has been assessed using a single group repeated measures design, where patients are assessed before and after a known efficacious treatment (e.g., total hip arthroplasty, back surgery, physiotherapy). This strategy has frequently been employed to compare change in various health status measures 14, 15, 16, 17, 18, 19. The internal responsiveness of a measure, evaluated by either of these methods, will depend upon both the particular treatment and the particular outcomes used to determine treatment efficacy.
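As an illustration of the single-group, before-and-after design described above, the following minimal sketch computes change scores and two summaries of them. The choice of summaries (the standardized response mean and the paired t statistic) is an assumption made for illustration; the passage above does not name a specific statistic, and the function names and data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def change_scores(before, after):
    """Return the per-patient change (after minus before)."""
    if len(before) != len(after):
        raise ValueError("before and after must have the same length")
    return [a - b for b, a in zip(before, after)]


def standardized_response_mean(before, after):
    """Mean change divided by the sample SD of change (assumed summary)."""
    d = change_scores(before, after)
    return mean(d) / stdev(d)


def paired_t_statistic(before, after):
    """Mean change divided by its standard error (assumed summary)."""
    d = change_scores(before, after)
    return mean(d) / (stdev(d) / sqrt(len(d)))


# Hypothetical scores for four patients before and after treatment.
before = [10, 12, 9, 14]
after = [13, 14, 13, 15]
srm = standardized_response_mean(before, after)
t_stat = paired_t_statistic(before, after)
```

Both summaries depend on the treatment given and on the sample studied, which is consistent with the point that internal responsiveness is context specific.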
We use the term “external responsiveness” to define the second aspect of responsiveness. External responsiveness reflects the extent to which changes in a measure over a specified time frame relate to corresponding changes in a reference measure of health status. In this context, in contrast to internal responsiveness, change in the measure is not in and of itself of primary interest. Rather, what is of interest is the relationship between change in the measure and change in the external standard. One motivation for this is that if the relationship is strong (i.e., the measure is shown to adequately capture changes in the standard), the measure may be used instead of the reference measure as an outcome in future clinical trials. Another motivation is more general and is not based on the assumption that the measure under study should be a replacement for a standard measure. Rather, change in the standard is viewed as an accepted indication of a change in the condition of a patient. By accepted, we mean change that would be widely regarded by clinicians as meaningful and important change in clinical status. If the standard changes, then it follows that some change in the measure under investigation would also be expected. Note that, unlike internal responsiveness, the external responsiveness of a measure will depend only on the choice of the external standard and not on the treatments under investigation. This implies that external responsiveness is a property of a measure and therefore has meaning in a wider range of settings than the more context-specific concept of internal responsiveness.
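The relationship between change in the measure and change in the external standard can be sketched with a simple linear regression and a correlation. This is a minimal illustration under stated assumptions: the direction of the regression (change in the measure regressed on change in the standard), the function names, and the data are all hypothetical and are not taken from the article.

```python
from math import sqrt
from statistics import mean


def fit_line(standard_change, measure_change):
    """Least-squares fit of measure_change = a + b * standard_change."""
    ms, mm = mean(standard_change), mean(measure_change)
    sxy = sum((s - ms) * (m - mm)
              for s, m in zip(standard_change, measure_change))
    sxx = sum((s - ms) ** 2 for s in standard_change)
    if sxx == 0:
        raise ValueError("the external standard shows no variation in change")
    slope = sxy / sxx
    intercept = mm - slope * ms
    return intercept, slope


def pearson_r(u, v):
    """Pearson correlation between two sets of change scores."""
    mu, mv = mean(u), mean(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / sqrt(suu * svv)


# Hypothetical change scores for four patients.
standard_change = [0, 1, 2, 3]   # change in the external standard
measure_change = [1, 3, 5, 7]    # change in the measure under study
intercept, slope = fit_line(standard_change, measure_change)
r = pearson_r(standard_change, measure_change)
```

A slope near zero or a weak correlation would suggest that the measure does not track change in the standard, whatever treatment the patients received.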
The lack of consensus on what “responsive” actually means, and how one should assess it, has led to a proliferation of responsiveness statistics, with investigators often reporting several within one study. This makes comparisons of measures across and within studies difficult or impossible 2, 3, 18, 19, 20, 21, 22. Beaton et al. [20] write: “there is no gold ‘standard’ for summarizing responsiveness, although some consensus is needed … the literature demonstrates inconsistency in the methods used for calculating responsiveness statistics, and readers must be cautioned to examine the formulae amid adaptations made to the different statistics.” Thus, the most appropriate responsiveness statistic remains a matter of debate and, indeed, if different aspects of responsiveness are of interest, more than one statistic may be reported. However, several of the statistics that have been proposed and are in use purport to reflect the same thing. This has motivated our current investigation.
The purpose of this article is to examine the property of responsiveness from a foundational standpoint. Many of the issues that we discuss have been explicitly or implicitly raised by others 2, 3, 17, 20, 21. Particularly relevant references in this regard are 7, 18. Our intentions in renewing discussion are to: 1) highlight the distinction between internal and external responsiveness; 2) clarify both the properties and interpretation of frequently used responsiveness statistics; 3) recommend the use of regression models to assess external responsiveness; and 4) provide directions for future research. Our illustrative example is drawn from the rheumatological literature, although the general principles we highlight apply to all disciplines in which responsiveness is important.
Notation
Here we define some notation that we will use subsequently to present the various responsiveness statistics. We assume research participants are assessed at two timepoints and let X1 and X2 denote their responses on the measure at the first and second assessments respectively. We let Dx = X2 − X1 represent the change in the response on the measure over time, with positive (negative) values for Dx representing increase (decrease) in the response over time. We let the expected mean change between
Internal responsiveness
The most frequently used responsiveness statistics fall into this group.
Receiver operating characteristic method
Deyo and Centor [17] were among the first to propose the assessment of responsiveness using receiver operating characteristic curves (ROCs) in rheumatology. In this context responsiveness is described in terms of sensitivity (probability of the measure correctly classifying patients who demonstrate change on an external criterion of clinical change) and specificity (probability of the measure correctly classifying patients who do not demonstrate change on the external criterion) 17, 18, 21. In
Responsiveness in psoriatic arthritis
The data originated from the University of Toronto psoriatic arthritis out-patient clinic [37]. Between 1994 and 1996, 70 patients (27 women and 43 men) completed three health status measures—the HAQ [38], AIMS2 [39], and SF-36 [40]—on two occasions, approximately 12–18 months apart [41].
Here we compute responsiveness statistics for the physical functioning dimension of the HAQ, the AIMS2, and the SF-36 in this sample of 70 patients. For the external responsiveness statistics, a health
Discussion
Further discussion of responsiveness is warranted. We have attempted to provide a structured framework within which such discussion can take place. Here we offer some preliminary thoughts based on our review of the literature.
The distinction between internal and external responsiveness is important. Stucki et al. [18] make a distinction in their work by referring to internal responsiveness simply as responsiveness and external responsiveness as discriminative ability. Kirshner and Guyatt [42]
Acknowledgements
Supported by the Medical Research Council of Canada. The authors would like to thank Dr. Gordon Guyatt and an anonymous referee for helpful comments.
References (43)
- et al. Measuring change over time: assessing the usefulness of evaluative instruments. J Chron Dis (1987)
- et al. Methodological problems in the retrospective computation of responsiveness to change: the lesson of Cronbach. J Clin Epidemiol (1997)
- et al. Assessing the responsiveness of functional scales to clinical change: an analogy to diagnostic test performance. J Chron Dis (1986)
- et al. Relative responsiveness of condition-specific and generic health status measures in degenerative lumbar spinal stenosis. J Clin Epidemiol (1995)
- et al. A comparison of different indices of responsiveness. J Clin Epidemiol (1997)
- et al. Evaluating changes in health status: reliability and responsiveness of five generic health status measures in workers with musculoskeletal disorders. J Clin Epidemiol (1997)
- et al. Measurement of health status. Ascertaining the minimal clinically important difference. Controlled Clin Trials (1989)
- et al. Methodological framework for assessing health indices. J Chron Dis (1985)
- et al. Reproducibility and responsiveness of health status measures. Statistics and strategies for evaluation. Controlled Clin Trials (1991)
- et al. Responsiveness and validity in health status measurement: a clarification. J Clin Epidemiol (1989)