Narrative review
Commentary: Statistical significance and clinical significance - A call to consider patient reported outcome measures, effect size, confidence interval and minimal clinically important difference (MCID)

https://doi.org/10.1016/j.jbmt.2019.02.009

Abstract

In healthcare research, an intervention may be statistically significant on quantitative analysis yet make little difference to the health or quality of life of patients with the condition being treated, and may therefore be interpreted as having low clinical significance. An understanding of statistics is fundamental for evidence-informed healthcare. Patient-reported outcome measures (PROMs) direct patients to evaluate aspects of their own health, including quality of life, disability and function. Data obtained from PROMs can be used to demonstrate the impact of healthcare interventions on these elements of a person's quality of life. To interpret outcome measure data for clinical decision making, a clinician must understand the concepts of statistical significance and clinical significance. This commentary explores PROMs and the concepts of statistical and clinical significance, and illustrates the relationship between them with a practical example for the clinician. Limitations of research that only reports p-values and the need to consider effect size, confidence intervals, and minimal clinically important difference are also discussed. Together, these concepts can assist the clinician to evaluate whether an intervention may be suitable for their clinical practice.

Introduction

When reviewing the statistics of research and its interpreted outcomes, a clinician must consider three questions: 1. How reliable are the results? 2. Are the results due to chance? and 3. Do the results matter to a patient? To answer these questions, an understanding of patient-reported outcome measures (PROMs), statistical significance and clinical significance is needed.

Patient-reported outcome measures (PROMs) are used in clinical practice to provide a somewhat objective measurement of patient progress with respect to their management (Cella et al., 2010; Fleischmann and Vaughan, 2018). These measures are particularly valuable for demonstrating improvements in pain levels (Turk et al., 2006), activities of daily living (Cella et al., 2010) and functional activities, to both the patient and third-party payers (e.g. worker's compensation) (Blyth et al., 2003). However, Chiarotto et al. (2016) argue that PROMs lack content validity when measuring physical functioning in low back pain, and lack structural validity as this concept has received limited attention in the literature (Chiarotto et al., 2016, 2018). Notwithstanding, data obtained from PROMs can be used to demonstrate how healthcare interventions may affect various aspects of a person's quality of life and serve as a mechanism to monitor treatment effectiveness (Roach, 2006). Evaluating PROMs well requires an understanding of the concepts of statistical significance and clinical significance to inform clinical decision-making. The purpose of the current commentary is first to explore the basic statistical concepts that can be applied to demonstrate these changes in PROM scores, and then to highlight the importance of clinical significance in clinical practice.

Statistical significance

In most quantitative experiments, researchers investigate if there is a difference between intervention groups by performing statistical tests and reporting an associated p-value. Box 1 provides an example of how this approach may be applied in a research design.

Once researchers have performed the statistical analysis (as in the Box 1 example), they compare the between-group result (p-value) to an a priori alpha level, a threshold set before the study for how small the p-value must be before the result is considered unlikely to be due to random chance alone. At
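The comparison of a p-value with an a priori alpha level can be sketched in Python. This is an illustration only, not the analysis from Box 1: the change scores are invented, the alpha of 0.05 is an assumed conventional value, and a permutation test is swapped in as a simple stand-in for whichever statistical test a study actually uses. The function name permutation_p_value is hypothetical.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_resamples=5_000, seed=0):
    """Two-sided permutation p-value for a difference in group means."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_resamples):
        # Randomly reassign scores to the two groups, as if the group
        # labels made no difference to the outcome.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (at_least_as_extreme + 1) / (n_resamples + 1)


ALPHA = 0.05  # assumed a priori alpha level, fixed before looking at the data

# Hypothetical improvements in a PROM score (higher = more improvement).
intervention = [12, 15, 9, 14, 11, 13, 16, 10]
control = [8, 7, 10, 6, 9, 5, 11, 7]

p_value = permutation_p_value(intervention, control)
is_significant = p_value < ALPHA
print(f"p = {p_value:.4f}; statistically significant at alpha {ALPHA}: {is_significant}")
```

A result below alpha says only that a difference this large would be unusual if the intervention made no difference; it says nothing about whether the difference is large enough to matter to a patient.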

Effect size

Absolute effect size indicates the magnitude of the difference between the averages (means) of the two interventions in the example described previously (Ferguson, 2009; Nakagawa and Cuthill, 2007). Because it quantifies the difference between two groups, it has an advantage over tests of statistical significance alone: it places the emphasis on the size of the difference (Nakagawa and Cuthill, 2007). Multiple authors posit that effect sizes should be reported
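As a minimal sketch of the idea, the absolute effect size for two groups is the raw difference between their means. The example below also computes Cohen's d, a commonly used standardised effect size that divides the mean difference by the pooled standard deviation; Cohen's d is included here as an assumption about what a reader may want alongside the absolute value, and the scores and function names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def mean_difference(group_a, group_b):
    """Absolute effect size: raw difference between the two group means."""
    return mean(group_a) - mean(group_b)


def cohens_d(group_a, group_b):
    """Standardised effect size: mean difference over the pooled SD."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_variance = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return mean_difference(group_a, group_b) / sqrt(pooled_variance)


# Hypothetical improvements in a PROM score (higher = more improvement).
intervention = [12, 15, 9, 14, 11, 13, 16, 10]
control = [8, 7, 10, 6, 9, 5, 11, 7]

print(f"Mean difference: {mean_difference(intervention, control):.3f} points")
print(f"Cohen's d: {cohens_d(intervention, control):.2f}")
```

The mean difference stays in the units of the outcome measure, which makes it directly comparable with a threshold expressed in those units; Cohen's d is unitless, which helps when comparing across different measures.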

Conclusion

This commentary has argued that reporting p-values alone is insufficient in both the research and practice contexts, as it provides clinicians with little clinically useful information to measure change in their patients. Research is most clinically relevant when data are collected using patient-reported outcome measures (PROMs) and effect size and MCID are then reported, so that clinicians can make evidence-informed decisions for patient-centred care. Clinicians need to consider the statistical
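One way to bring these ideas together is to compare the confidence interval for a mean difference with the MCID. The sketch below is illustrative only: the scores and the MCID of 2.0 points are invented, the function names are hypothetical, higher scores are assumed to mean greater improvement, and the interval uses a large-sample normal approximation (a t-based interval would be somewhat wider for samples this small).

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def mean_difference_ci(group_a, group_b, confidence=0.95):
    """Approximate CI for mean(group_a) - mean(group_b), normal approximation."""
    diff = mean(group_a) - mean(group_b)
    std_error = sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return diff - z * std_error, diff + z * std_error


def interpret(ci_low, ci_high, mcid):
    """Classify a result; assumes higher scores are better and mcid > 0."""
    if ci_low >= mcid:
        return "statistically significant and likely clinically important"
    if ci_low > 0 and ci_high < mcid:
        return "statistically significant but smaller than the MCID"
    if ci_low > 0:
        return "statistically significant, clinical importance uncertain"
    return "not statistically significant"


MCID = 2.0  # hypothetical threshold, in the units of the outcome measure

# Hypothetical improvements in a PROM score (higher = more improvement).
intervention = [12, 15, 9, 14, 11, 13, 16, 10]
control = [8, 7, 10, 6, 9, 5, 11, 7]

ci_low, ci_high = mean_difference_ci(intervention, control)
print(f"95% CI for the mean difference: {ci_low:.2f} to {ci_high:.2f}")
print(interpret(ci_low, ci_high, MCID))
```

The second branch of interpret is the case the commentary warns about: an interval that excludes zero but sits entirely below the MCID describes an effect that is statistically detectable yet probably too small for patients to notice.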

Statement of competing interests

The authors state that there are no competing interests.

References (44)

  • R. Jaeschke et al. Measurement of health status: ascertaining the minimal clinically important difference. Contr. Clin. Trials (1989)
  • D.A. Redelmeier et al. Assessing the minimal important difference in symptoms: a comparison of two techniques. J. Clin. Epidemiol. (1996)
  • J.J. Swigris et al. The SF-36 and SGRQ: validity and first look at minimum important differences in IPF. Respir. Med. (2010)
  • D.C. Turk et al. Developing patient-reported outcome measures for pain clinical trials: IMMPACT recommendations. Pain (2006)
  • K.J. Yost et al. Minimally important differences were estimated for six Patient-Reported Outcomes Measurement Information System-Cancer scales in advanced-stage cancer patients. J. Clin. Epidemiol. (2011)
  • A. Ahlbom. Significance testing: why does it prevail? Eur. J. Epidemiol. (2017)
  • D.R. Anderson et al. Avoiding pitfalls when using information-theoretic methods. J. Wildl. Manag. (2002)
  • B. Barrett et al. Sufficiently important difference: expanding the framework of clinical significance. Med. Decis. Making (2005)
  • D.E. Beaton et al. Many faces of the minimal clinically important difference (MCID): a literature review and directions for future research. Curr. Opin. Rheumatol. (2002)
  • L.E. Braitman. Confidence intervals assess both clinical significance and statistical significance. Ann. Intern. Med. (1991)
  • J.D. Childs et al. Neck pain: clinical practice guidelines linked to the international classification of functioning, disability, and health from the orthopaedic section of the American physical therapy association. J. Orthop. Sports Phys. Ther. (2008)
  • C.E. Cook. Clinimetrics corner: the minimal clinically important change score (MCID): a necessary pretense. J. Man. Manip. Ther. (2008)