CC BY 4.0 · ACI open 2021; 05(02): e94-e103
DOI: 10.1055/s-0041-1735975
Special Section on Informatics Governance

A Systematic Approach to Reconciling Data Quality Failures: Investigation Using Spinal Cord Injury Data

Nandini Anantharama (1), Wray Buntine (1), Andrew Nunn (2)
1 Faculty of IT, Monash University, Clayton, Victoria, Australia
2 Victorian Spinal Cord Service, Austin Health, Heidelberg, Victoria, Australia
Funding None.

Abstract

Background Secondary use of electronic health record (EHR) data requires evaluation of data quality (DQ) for fitness of use. While multiple frameworks exist for quantifying DQ, there are no guidelines for evaluating the DQ failures identified through such frameworks.

Objectives This study proposes a systematic approach to evaluate DQ failures through the understanding of data provenance to support exploratory modeling in machine learning.

Methods Our study is based on the EHR of spinal cord injury inpatients in a state spinal care center in Australia, admitted between 2011 and 2018 (inclusive), and aged over 17 years. As a prerequisite step, we applied a DQ framework to the EHR data, using rules that quantified DQ dimensions. DQ was measured either as the percentage of values per field that met the rule criteria or as Krippendorff's α for agreement between variables. The DQ failures identified in this step were then assessed using semistructured interviews with purposively sampled domain experts.
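
The two DQ measures named above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the use of None for missing values, and the choice of the nominal variant of Krippendorff's α are assumptions made for illustration, since the abstract does not state which variant was used.

```python
from collections import Counter
from itertools import permutations


def percent_adherent(values, rule):
    """Percentage of values in one field that satisfy a DQ rule."""
    values = list(values)
    if not values:
        raise ValueError("field has no values")
    return 100.0 * sum(1 for v in values if rule(v)) / len(values)


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data (assumed variant).

    `units` is an iterable of records; each record is a tuple holding the
    value of every compared variable, with None marking a missing value.
    """
    # Build the coincidence matrix: every ordered pair of values within a
    # record contributes 1 / (m - 1), where m is the number of values present.
    coincidences = Counter()
    for unit in units:
        present = [v for v in unit if v is not None]
        m = len(present)
        if m < 2:
            continue  # a record with fewer than two values cannot be paired
        for a, b in permutations(present, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    # Marginal totals per category and the overall number of pairable values.
    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    if n <= 1:
        raise ValueError("not enough pairable values")

    # Disagreement counts; the common factor 1 / n cancels in the ratio.
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        raise ValueError("alpha is undefined when only one category occurs")
    return 1.0 - observed / expected
```

For example, `percent_adherent(ages, lambda age: age > 17)` would score a hypothetical `ages` field against the study's age criterion, and `krippendorff_alpha_nominal` would take one tuple per record for two fields expected to agree.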

Results The measured DQ of the fields in our dataset ranged from 0 to 100% adherence. Understanding the data provenance of fields with DQ failures enabled us to ascertain whether each DQ failure was fatal, recoverable, or not relevant to the field's inclusion in our study. We also identified three themes of data provenance from a DQ perspective: systems, processes, and actors.
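
One way to record the outcome of such an assessment is sketched below in Python. The three outcomes and three provenance themes come from the abstract; the class names, the record layout, and the rule that a single fatal failure excludes a field are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    """Outcome of assessing a DQ failure (categories from the abstract)."""
    FATAL = "fatal"
    RECOVERABLE = "recoverable"
    NOT_RELEVANT = "not relevant"


class ProvenanceTheme(Enum):
    """Themes of data provenance from a DQ perspective (from the abstract)."""
    SYSTEMS = "systems"
    PROCESSES = "processes"
    ACTORS = "actors"


@dataclass(frozen=True)
class FailureAssessment:
    """Hypothetical record linking one DQ failure to its assessment."""
    field_name: str
    adherence_pct: float
    theme: ProvenanceTheme
    outcome: Outcome


def usable_fields(assessments):
    """Return fields with no fatal failure, in first-seen order.

    Assumed rule: a single fatal failure excludes the field; recoverable
    and not-relevant failures leave it eligible for inclusion.
    """
    fatal = {a.field_name for a in assessments if a.outcome is Outcome.FATAL}
    kept = []
    for a in assessments:
        if a.field_name not in fatal and a.field_name not in kept:
            kept.append(a.field_name)
    return kept
```

Keeping the provenance theme alongside each outcome preserves the reason a failure was judged fatal or recoverable, which the interviews were designed to surface.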

Conclusion A systematic approach to understanding data provenance through the context of data generation helps in the reconciliation or repair of DQ failures and is a necessary step in the preparation of data for secondary use.

Protection of Human and Animal Subjects

The study was performed in compliance with the ethical approval granted by the Austin Health Human Research Ethics Committee (HREC). Ethics approval was obtained from Austin Health on August 18, 2017 (reference no. LNR/17/Austin/408).


Authors' Contributions

N.A.: Study design, data collection and formatting, analyses and evaluation, interviews, manuscript preparation, tables, and figures. W.B.: Study design, intellectual input on statistical analyses and modeling, manuscript preparation, and review. A.N.: Study design, interviews, clinical interpretation and validation of results, manuscript preparation, and review.



Publication History

Received: 28 January 2021

Accepted: 13 August 2021

Article published online: 18 October 2021

© 2021. The Author(s). This is an open access article published by Thieme under the terms of the Creative Commons Attribution License, permitting unrestricted use, distribution, and reproduction so long as the original work is properly cited. (https://creativecommons.org/licenses/by/4.0/)

Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany

 