Original Articles and Reviews

Assessing Test-Retest Reliability of Psychological Measures

Persistent Methodological Problems

Clinical Epidemiology, Nutrition and Biostatistics Section, UCL Great Ormond Street Institute of Child Health, London, UK

Terence M. Dovey

Department of Psychology, Brunel University, Uxbridge, UK

Angie Wade

Clinical Epidemiology, Nutrition and Biostatistics Section, UCL Great Ormond Street Institute of Child Health, London, UK

Published online: November 29, 2017. https://doi.org/10.1027/1016-9040/a000298

Abstract

Psychological research and clinical practice rely heavily on psychometric testing to measure psychological constructs that represent symptoms of psychopathology, individual difference characteristics, or cognitive profiles. Test-retest reliability assessment is crucial in the development of psychometric tools, helping to ensure that measurement variation reflects replicable differences between people regardless of time, target behavior, or user profile. Although psychological studies testing the reliability of measurement tools are pervasive in the literature, many still discuss and assess this form of reliability inappropriately with regard to the stated aims of the study or the intended use of the tool. The current paper outlines important factors to consider in test-retest reliability analyses, common errors, and some initial methods for conducting and reporting reliability analyses that avoid such errors. The paper aims to highlight a persistently problematic area in psychological assessment, to illustrate the real-world impact these problems can have on measurement validity, and to offer relatively simple methods for improving the validity and practical use of reliability statistics.
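The abstract does not name the statistics the paper recommends, so the following Python sketch is an illustration under stated assumptions, not the authors' method. It assumes a common test-retest setup in which each participant is measured on two occasions, and it summarizes the data three ways: Pearson's r; intraclass correlations from a two-way ANOVA, reported both as absolute agreement, ICC(A,1) (ICC(2,1) in Shrout and Fleiss's notation), and as consistency, ICC(C,1) (their ICC(3,1)); and Bland-Altman bias with 95% limits of agreement. The function names and the data are hypothetical. The example shows one widely discussed pitfall: a uniform 5-point practice effect leaves r and the consistency ICC at 1.0, while the agreement ICC and the limits of agreement expose the systematic shift.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson correlation: captures linear association only, not agreement."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def icc_two_way(scores):
    """ICCs from a two-way ANOVA on an n-subjects x k-occasions table.

    Returns (ICC(A,1), ICC(C,1)): single-measure absolute agreement
    (Shrout & Fleiss ICC(2,1)) and single-measure consistency (ICC(3,1)).
    """
    n, k = len(scores), len(scores[0])
    grand = statistics.fmean([v for row in scores for v in row])
    row_means = [statistics.fmean(row) for row in scores]
    col_means = [statistics.fmean([row[j] for row in scores]) for j in range(k)]

    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between occasions
    ss_err = ss_total - ss_rows - ss_cols                    # residual

    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))

    # Agreement penalizes systematic differences between occasions (msc);
    # consistency ignores them.
    icc_agreement = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
    icc_consistency = (msr - mse) / (msr + (k - 1) * mse)
    return icc_agreement, icc_consistency


def limits_of_agreement(test, retest, z=1.96):
    """Bland-Altman mean difference (bias) and approximate 95% limits of agreement."""
    diffs = [b - a for a, b in zip(test, retest)]
    bias = statistics.fmean(diffs)
    sd = statistics.stdev(diffs)
    return bias, bias - z * sd, bias + z * sd


if __name__ == "__main__":
    # Hypothetical data: every participant scores exactly 5 points higher at retest.
    test = [10, 12, 14, 16, 18, 20]
    retest = [t + 5 for t in test]
    table = [list(pair) for pair in zip(test, retest)]

    print(pearson_r(test, retest))            # 1.0
    agreement, consistency = icc_two_way(table)
    print(round(agreement, 3), consistency)   # 0.528 1.0
    print(limits_of_agreement(test, retest))  # (5.0, 5.0, 5.0)
```

Which ICC form is appropriate depends on whether systematic change between occasions should count as unreliability for the intended use of the tool. Declaring the form used, and pairing it with limits of agreement expressed in the scale's own units, makes that choice explicit to readers.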

Volume 22, Issue 4, October 2017

ISSN: 1016-9040; eISSN: 1878-531X

History

Received: July 4, 2016
Revised: May 15, 2017
Accepted: May 29, 2017
Published online: November 29, 2017

Acknowledgments:

I confirm that this paper is my own work and the work of the coauthors, and that neither the paper nor any of the data it contains has been submitted for publication, in whole or in part, to another journal. All data belong to the authors of the paper. All of the authors listed above contributed substantially to the conception, production, and revision of this paper.
