Original Research

Evaluating the reliability of gestalt quality ratings of medical education podcasts: A METRIQ study

Authors:

Jason M. Woods
Teresa M. Chan
Damian Roland
Jeff Riddell
Andrew Tagg
Brent Thoma

Abstract

Introduction Podcasts are increasingly being used for medical education. Studies have found that the assessment of the quality of online resources can be challenging. We sought to determine the reliability of gestalt quality assessment of education podcasts in emergency medicine.

Methods An international, interprofessional sample of raters was recruited through social media, direct contact, and the extended personal network of the study team. Each participant listened to eight podcasts (selected to include a variety of accents, numbers of speakers, and topics) and rated the quality of each podcast on a seven-point Likert scale. Phi coefficients were calculated within each professional group and overall. Decision studies were conducted using a target phi of 0.80.
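The decision-study logic can be sketched in code. The abstract does not report the exact model or software used, so the following is a minimal illustration under stated assumptions rather than the authors' analysis: it assumes a fully crossed podcast × rater design (every rater scores every podcast, one score per cell), estimates variance components with the standard two-way ANOVA expected-mean-squares method, and computes the absolute (phi) dependability coefficient for the mean of n raters. The function names and the toy rating matrix are hypothetical, not study data. Because rater variance and residual variance both sit in the error term divided by n, a group whose raters vary less needs fewer of them to reach phi ≥ 0.80.

```python
from typing import List, Optional, Tuple


def variance_components(ratings: List[List[float]]) -> Tuple[float, float, float]:
    """Estimate G-study variance components for a crossed podcast x rater design.

    Assumption: ratings[i][j] is rater j's score for podcast i, with no missing cells.
    Returns (var_podcast, var_rater, var_residual); negative estimates are set to 0.
    """
    n_p, n_r = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n_p * n_r)
    p_means = [sum(row) / n_r for row in ratings]
    r_means = [sum(ratings[i][j] for i in range(n_p)) / n_p for j in range(n_r)]

    # Mean squares for podcasts, raters, and the podcast x rater residual.
    ms_p = n_r * sum((m - grand) ** 2 for m in p_means) / (n_p - 1)
    ms_r = n_p * sum((m - grand) ** 2 for m in r_means) / (n_r - 1)
    ss_res = sum(
        (ratings[i][j] - p_means[i] - r_means[j] + grand) ** 2
        for i in range(n_p)
        for j in range(n_r)
    )
    ms_res = ss_res / ((n_p - 1) * (n_r - 1))

    # Solve the expected-mean-squares equations for each component.
    var_p = max((ms_p - ms_res) / n_r, 0.0)
    var_r = max((ms_r - ms_res) / n_p, 0.0)
    return var_p, var_r, ms_res


def phi(var_p: float, var_r: float, var_res: float, n_raters: int) -> float:
    """Absolute (phi) coefficient for the mean rating of n_raters raters."""
    if var_p == 0.0:
        return 0.0
    abs_error = (var_r + var_res) / n_raters  # absolute error variance
    return var_p / (var_p + abs_error)


def raters_needed(var_p: float, var_r: float, var_res: float,
                  target: float = 0.80, max_raters: int = 1000) -> Optional[int]:
    """Decision study: smallest number of raters whose mean reaches the target phi."""
    for n in range(1, max_raters + 1):
        if phi(var_p, var_r, var_res, n) >= target:
            return n
    return None  # target not reachable within max_raters


if __name__ == "__main__":
    # Hypothetical 4-podcast x 3-rater matrix on a seven-point scale.
    toy = [[6, 5, 6], [4, 3, 5], [2, 2, 3], [5, 4, 6]]
    vp, vr, ve = variance_components(toy)
    print(raters_needed(vp, vr, ve, target=0.80))  # prints 2 for this toy matrix
```

Running the decision study separately on each professional group's ratings, and on the pooled ratings, would yield group-specific and overall rater counts of the kind reported in the Results.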

Results A total of 240 collaborators completed all eight surveys and were included in the analysis. Attendings, medical students, and physician assistants had the lowest individual-level variance and therefore required the fewest raters to evaluate quality reliably (phi > 0.80). Overall, 20 raters were required to reliably evaluate the quality of emergency medicine podcasts.

Discussion Gestalt ratings of quality from approximately 20 health professionals are required to reliably assess the quality of a podcast. This finding should inform future work focused on developing and validating tools to support the evaluation of quality in these resources.

Keywords:

Podcast, Gestalt, Reliability, FOAMEd

Year: 2020
Volume: 9 Issue: 5
Page/Article: 302-306
DOI: 10.1007/S40037-020-00589-X

Published on 3 Jun 2020

Peer Reviewed