Original article
Feasibility of Natural Language Processing–Assisted Auditing of Critical Findings in Chest Radiology

https://doi.org/10.1016/j.jacr.2019.05.038

Abstract

Objective

Time-sensitive communication of critical imaging findings like pneumothorax or pulmonary embolism to referring physicians is essential for patient safety. The definitive communication is the radiology free-text report. Quality assurance initiatives require that institutions audit these communications, a time-intensive manual task. We propose using a rule-based natural language processing system to improve the process for auditing critical findings communications.

Methods

We present a pilot assessment of the feasibility of using an automated critical finding identification system to assist quality assurance teams’ evaluation of critical findings communication compliance. Our assessment is based on chest imaging reports. Critical findings are identified in radiology reports using pyConTextNLP, an open source Python implementation of the ConText algorithm.

Results

In our test set, there were 75 reports with critical findings and 591 reports without critical findings. pyConTextNLP correctly identified 69 of the positive cases with 8 false-positives for a sensitivity of 0.92 and a specificity of 0.99.
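As a quick check on the arithmetic, the reported metrics can be recomputed from the counts given above. This is a minimal sketch: the function name is hypothetical, and the positive predictive value is not reported in the text but is derived here from the same counts.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Return (sensitivity, specificity, positive predictive value)."""
    sensitivity = tp / (tp + fn)   # fraction of true critical reports that were flagged
    specificity = tn / (tn + fp)   # fraction of non-critical reports left unflagged
    ppv = tp / (tp + fp)           # fraction of flagged reports that were truly critical
    return sensitivity, specificity, ppv


# Counts taken from the reported test set.
positives = 75    # reports with critical findings
negatives = 591   # reports without critical findings
tp = 69           # positive reports correctly identified
fp = 8            # false-positives

fn = positives - tp   # 6 missed positive reports
tn = negatives - fp   # 583 correctly ignored negative reports

sens, spec, ppv = confusion_metrics(tp, fn, fp, tn)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} ppv={ppv:.2f}")
# prints: sensitivity=0.92 specificity=0.99 ppv=0.90
```

The derived positive predictive value suggests that roughly 9 of every 10 flagged reports would be true critical findings at this prevalence, which bears on reviewer workload in an audit.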

Discussion

Natural language processing can provide valuable assistance to auditing critical findings communications.

Introduction

The communication of critical imaging findings from the radiologist to the referring provider is a key factor in providing efficacious patient care [1]. Critical findings are those that may result in death or severe morbidity and require urgent or emergent attention [2]. The Joint Commission national patient safety guidelines for laboratory accreditation include a rule for the reporting of critical results of tests and diagnostic procedures (NPSG.02.03.01). This rule states that those results that fall significantly outside the normal range and may indicate a life-threatening situation will be provided to the responsible licensed caregiver within a time frame such that the patient can receive the necessary treatment promptly [3]. Because the rule language includes diagnostic procedures, many systems extend the reporting system and communication expectations to diagnostic imaging [2].

In most radiology practices, critical findings are communicated verbally at the time of image interpretation. Documenting the communication process and keeping track of the communications and their timeliness are requirements of the Joint Commission rule. This documentation is typically a manual, human resource–intensive process. Reporting of compliance requires some type of system that audits the report repository. The aim of this study is to demonstrate the efficacy of a natural language processing (NLP) system that can automatically extract critical findings from radiology reports to support the tracking and reporting of these patient safety efforts.

In this study, we use pyConTextNLP (https://pypi.python.org/pypi/pyConTextNLP) to develop a rule-based NLP system [4]. We believed pyConTextNLP would work well because we had a limited number of annotated reports and could exploit the knowledge contained in the pyConTextNLP rules, particularly because many rules were developed for a similar critical findings task at another institution [5]. We adapted the NLP algorithm to automatically identify critical findings and then assessed the feasibility of deploying the system for auditing critical findings related to cardiothoracic disease processes.
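To make the rule-based idea concrete, the sketch below illustrates the general ConText-style approach of pairing target terms with trigger phrases that modify them within a sentence. It is a toy illustration only: it does not use the pyConTextNLP API, the function name is hypothetical, and the two short lexicons are invented for the example rather than taken from the knowledge bases used in the study.

```python
import re

# Hypothetical lexicons for illustration; not the study's knowledge bases.
TARGETS = ["pneumothorax", "pulmonary embolism"]
NEGATION_TRIGGERS = ["no evidence of", "no", "without", "negative for"]


def find_critical_findings(report):
    """Return target terms that appear in a sentence without a preceding negation trigger."""
    findings = []
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", report.lower()):
        for target in TARGETS:
            match = re.search(r"\b" + re.escape(target) + r"\b", sentence)
            if not match:
                continue
            # A trigger negates the target only if it occurs earlier in the same sentence.
            preceding = sentence[:match.start()]
            negated = any(
                re.search(r"\b" + re.escape(trigger) + r"\b", preceding)
                for trigger in NEGATION_TRIGGERS
            )
            if not negated:
                findings.append(target)
    return findings


report = "No evidence of pneumothorax. Large right pulmonary embolism is present."
print(find_critical_findings(report))
# prints: ['pulmonary embolism']
```

A production rule set would also need to handle triggers that follow the target, uncertainty, historical mentions, and scope termination, all of which this sketch omits.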

Data Set

In this Institutional Review Board–approved study, we obtained all radiology reports generated in a 527-bed tertiary referral care academic medical hospital and affiliated outpatient care centers in the United States from October to December 2013, resulting in an initial convenience sample repository of 64,787 reports. We excluded examinations that were not specifically diagnostic imaging, reports generated from services other than radiology (eg, interventional radiology and obstetrics,

Results

In our evaluation set, 75 of 851 (8.8%) reports were annotated with critical findings. The distribution of these findings is shown in Table 1. We detected 77 findings requiring critical communication from 5 of our 18 categories. The most prevalent flagged findings were pneumothorax and pneumonia.

The report-level results of the pyConTextNLP algorithm are summarized in Table 2. From Table 2 we compute a sensitivity of 0.92 and a specificity of 0.99. Of the six false-negatives, two were

Discussion

Our primary goal was to assess how well an NLP system could assist the retrospective auditing of critical findings communication. For this task, sensitivity is of primary importance, but given the low prevalence of critical findings, it is also important to limit false-positives to avoid alarm fatigue. In the evaluation data set, the prevalence of critical findings in the chest was 8.8%. A keyword search of the entire report repository found a prevalence of critical findings of 27%, significantly higher than that

Conclusion

pyConTextNLP employs a fairly simple linguistic model that is easily extendable. The knowledge bases we employed in this study were primarily developed at different institutions. Nonetheless, the knowledge seems to have transferred well to the new institution, and despite its simplicity, pyConTextNLP performed remarkably well, suggesting that these kinds of NLP tools are easily extensible for the purpose of tracking and reporting health care quality assurance metrics.

Take-Home Points

  • Retrospective auditing of critical findings reporting is a human labor– and time-intensive process that could produce a higher yield of information when NLP tools are deployed.

  • pyConTextNLP employs a fairly simple linguistic model that is easily extendable for the purpose of tracking and reporting health care quality assurance metrics.

  • Preliminary work with thoracic critical findings showed high sensitivity and specificity of the NLP tool for detection of critical findings from radiology reports.

This work was partly funded by the Department of Veterans Affairs (CRE 12-312) and University of Utah Healthcare System Hospital Project funds. The authors had full access to all of the data in this study and take complete responsibility for the integrity of the data and the accuracy of the data analysis. The authors state that they have no conflict of interest related to the material discussed in this article.

Marta E. Heilbrun, MD, and Brian E. Chapman, PhD, are co-first authors.
