Original Research
Classification of Emergency Department Chief Complaints Into 7 Syndromes: A Retrospective Analysis of 527,228 Patients

https://doi.org/10.1016/j.annemergmed.2005.04.012

Study objective

Electronic surveillance systems often monitor triage chief complaints in the hope of detecting an outbreak earlier than is possible with traditional reporting methods. We measured the accuracy of a Bayesian chief complaint classifier called CoCo, which assigns each patient to 1 of 7 syndromic categories (respiratory, botulinic, gastrointestinal, neurologic, rash, constitutional, or hemorrhagic) based on the patient's free-text triage chief complaint.

Methods

We compared CoCo's classifications with a criterion standard syndromic classification based on International Classification of Diseases, Ninth Revision (ICD-9) discharge diagnoses. We assigned the criterion classification to a patient based on whether the patient's primary diagnosis was a member of a set of ICD-9 codes associated with 1 of CoCo's 7 syndromes. We tested CoCo's performance on a set of 527,228 chief complaints from patients registered at the University of Pittsburgh Medical Center emergency department (ED) between 1990 and 2003. We performed a sensitivity analysis by varying the ICD-9 codes in the criterion standard. We also tested CoCo on chief complaints from EDs in a second location (Utah).
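The criterion standard amounts to a set-membership lookup on the primary discharge diagnosis. The sketch below illustrates that lookup only. The code sets are hypothetical and deliberately tiny: they use a few real ICD-9 codes chosen for illustration and are NOT the sets used in the study, which are not listed in this excerpt. The name `criterion_syndrome` is likewise an assumption.

```python
from typing import Optional

# HYPOTHETICAL, illustrative code sets: a few real ICD-9 codes per syndrome,
# not the criterion-standard sets defined by the study authors.
CRITERION_CODES: dict[str, set[str]] = {
    "respiratory": {"465.9", "486"},      # acute URI, unspecified site; pneumonia, organism unspecified
    "gastrointestinal": {"787.91"},       # diarrhea
    "hemorrhagic": {"784.7", "578.9"},    # epistaxis; GI hemorrhage, unspecified
    "rash": {"782.1"},                    # rash and other nonspecific skin eruption
}


def criterion_syndrome(primary_icd9: str) -> Optional[str]:
    """Return the syndrome whose code set contains the primary diagnosis, else None."""
    for syndrome, codes in CRITERION_CODES.items():
        if primary_icd9 in codes:
            return syndrome
    return None  # diagnosis not in any syndrome's set: no criterion syndrome


print(criterion_syndrome("486"))    # respiratory
print(criterion_syndrome("V70.0"))  # None
```

Because each patient has a single primary diagnosis and the sets here do not overlap, every patient receives at most one criterion syndrome. Swapping in alternative code sets is one way to mimic the sensitivity analysis described above.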

Results

Approximately 16% (85,569/527,228) of the patients were classified according to the criterion standard into 1 of the 7 syndromes. CoCo's classification performance (number of cases by criterion standard, sensitivity [95% confidence interval (CI)], and specificity [95% CI]) was respiratory (34,916, 63.1 [62.6 to 63.6], 94.3 [94.3 to 94.4]); botulinic (1,961, 30.1 [28.2 to 32.2], 99.3 [99.3 to 99.3]); gastrointestinal (20,431, 69.0 [68.4 to 69.6], 95.6 [95.6 to 95.7]); neurologic (7,393, 67.6 [66.6 to 68.7], 92.7 [92.6 to 92.8]); rash (2,232, 46.8 [44.8 to 48.9], 99.3 [99.3 to 99.3]); constitutional (10,603, 45.8 [44.9 to 46.8], 96.6 [96.6 to 96.7]); and hemorrhagic (8,033, 75.2 [74.3 to 76.2], 98.5 [98.4 to 98.5]). The sensitivity analysis showed that the results were not affected by the choice of ICD-9 codes in the criterion standard. Classification accuracy did not differ on chief complaints from the second location.
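Each sensitivity above is a binomial proportion over the criterion-standard cases, so its 95% CI narrows as the case count grows. This excerpt does not say which interval method the authors used; the sketch below assumes a Wilson score interval. The true-positive count is back-calculated from the reported 63.1% respiratory sensitivity, not taken from the paper.

```python
import math


def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (about 95% when z = 1.96)."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# Respiratory syndrome: 34,916 criterion-standard cases.
# ASSUMPTION: true positives reconstructed from the reported 63.1% sensitivity.
n_cases = 34_916
tp = round(0.631 * n_cases)
lo, hi = wilson_ci(tp, n_cases)
print(f"sensitivity {tp / n_cases:.1%}, 95% CI {lo:.1%} to {hi:.1%}")
```

Under these assumptions the interval rounds to the reported 62.6 to 63.6. The same function applied to the roughly 2,000 botulinic or rash cases gives intervals several times wider, which matches the pattern in the Results.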

Conclusion

Our results suggest that, for most syndromes, our chief complaint classification system can identify about half of the patients with relevant syndromic presentations, with specificities higher than 90% and positive predictive values ranging from 12% to 44%.
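The positive predictive values are not listed in the Results, but they follow from sensitivity (Se), specificity (Sp), and prevalence (π) by Bayes' rule: PPV = Se·π / (Se·π + (1 − Sp)(1 − π)). The sketch below assumes that prevalence is each syndrome's criterion-standard count divided by the 527,228 patients, and it uses the rounded point estimates reported above, so small discrepancies from the authors' figures are expected.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


TOTAL = 527_228
# (criterion-standard cases, sensitivity, specificity) from the Results.
results = {
    "respiratory": (34_916, 0.631, 0.943),
    "botulinic": (1_961, 0.301, 0.993),
    "gastrointestinal": (20_431, 0.690, 0.956),
    "neurologic": (7_393, 0.676, 0.927),
    "rash": (2_232, 0.468, 0.993),
    "constitutional": (10_603, 0.458, 0.966),
    "hemorrhagic": (8_033, 0.752, 0.985),
}

for syndrome, (cases, sens, spec) in results.items():
    print(f"{syndrome:17s} PPV ~ {ppv(sens, spec, cases / TOTAL):.0%}")
```

With these inputs the computed values span roughly 12% (neurologic) to 44% (respiratory and hemorrhagic), consistent with the Conclusion. The low values reflect low prevalence. Even at 99.3% specificity, false positives among the roughly 525,000 non-botulinic patients outnumber the true botulinic cases detected.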

Introduction

Since 1999, electronic syndromic surveillance systems have been deployed across the country.1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13 Emergency department (ED) data are the foundation of many syndromic surveillance systems, and researchers have shown that common outbreaks can be detected 1 to 2 weeks earlier with ED data than through conventional disease reporting methods.14 Earlier detection of outbreaks may save many lives.15 Some surveillance systems require manual classification of patients into relevant syndromes by triage nurses or emergency physicians,1, 2, 3 whereas others use preexisting electronic ED data4, 5 that typically include date of admission, sex, age, address, coded discharge diagnosis,6, 7, 8 and free-text triage chief complaint.9, 10, 11, 12, 13

Evaluating the ability of syndromic surveillance systems to detect outbreaks is difficult because outbreaks are rare, and outbreaks of diseases potentially caused by bioterrorism are virtually nonexistent. Successful outbreak detection from syndromic surveillance entails (1) accurately identifying cases of concern and (2) determining when the number of relevant cases has exceeded the number expected for a given period or geographic region.11, 16 This article addresses the first step: syndromic case classification.
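The second step, deciding when a count exceeds expectation, is outside this article's scope, but a toy example shows what it involves. The sketch below uses a simple hypothetical rule (alarm when today's count exceeds the baseline mean by more than k sample standard deviations). It is NOT the detection algorithm used by RODS or described in references 14 and 16, and the daily counts are made up.

```python
import statistics


def exceeds_expected(history: list[int], today: int, k: float = 3.0) -> bool:
    """Flag today's count if it exceeds the baseline mean by more than k standard deviations.

    Hypothetical, deliberately simple detector for illustration only.
    """
    mean = statistics.mean(history)
    sd = statistics.stdev(history)  # sample standard deviation; needs at least 2 values
    return today > mean + k * sd


# Made-up daily counts of gastrointestinal chief complaints over 14 days.
baseline = [12, 15, 11, 14, 13, 12, 16, 14, 13, 15, 12, 14, 13, 15]
print(exceeds_expected(baseline, 14))  # False: an ordinary day
print(exceeds_expected(baseline, 30))  # True: well above the baseline
```

Real surveillance algorithms also model day-of-week and seasonal effects, which this fixed threshold ignores.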

It is unclear what types of ED data are most useful for syndromic surveillance. Coded ED diagnoses are attractive because of the specificity of the information they contain, but they are not available at all hospitals, and where they are available, they may appear only several hours or days after admission. Free-text triage chief complaints have the advantage of being nearly ubiquitously available in the United States and are usually available electronically as soon as the patient is registered. However, to be useful, the chief complaints must first be classified into syndromic categories or into some other coded representation that a computer can manipulate.

In the Real-time Outbreak and Disease Surveillance system (RODS),10, 17 chief complaints are classified into syndromic categories by a naive Bayesian classifier called CoCo.18 CoCo assigns every patient a syndromic category based on the patient's chief complaint. The number of patients classified into each syndromic category is monitored by time-series detection algorithms14, 16 and displayed graphically on the RODS user interface. If the number of patients presenting with gastrointestinal complaints, for instance, exceeds the number expected, RODS sends an electronic alarm to a team of researchers and public health physicians. RODS is an open-source19 biosurveillance system whose development began in 1999. RODS collects ED registration data in real time, including age, sex, zip code, and triage chief complaint, from more than 100 emergency care facilities in Pennsylvania, Utah, Ohio, and New Jersey.
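To make the idea of a naive Bayesian chief complaint classifier concrete, here is a minimal sketch of a generic multinomial naive Bayes model over chief-complaint words with Laplace smoothing. It is not CoCo: CoCo's features, smoothing, and training data are not described in this excerpt. The class name, the tokenizer, the toy training pairs, and the catch-all label "none" are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase and split on anything that is not a letter (handles 'cough/fever')."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesClassifier:
    """Generic multinomial naive Bayes with Laplace (add-one) smoothing."""

    def __init__(self) -> None:
        self.class_counts: Counter = Counter()
        self.word_counts: defaultdict = defaultdict(Counter)
        self.vocab: set = set()

    def fit(self, examples: list[tuple[str, str]]) -> "NaiveBayesClassifier":
        for complaint, label in examples:
            self.class_counts[label] += 1
            for word in tokenize(complaint):
                self.word_counts[label][word] += 1
                self.vocab.add(word)
        return self

    def log_score(self, complaint: str, label: str) -> float:
        # Unnormalized log posterior: log P(label) + sum of log P(word | label).
        total = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / total)
        n_words = sum(self.word_counts[label].values())
        for word in tokenize(complaint):
            count = self.word_counts[label][word]
            score += math.log((count + 1) / (n_words + len(self.vocab)))
        return score

    def predict(self, complaint: str) -> str:
        return max(self.class_counts, key=lambda label: self.log_score(complaint, label))


# Toy, made-up training data; a real classifier would be trained on many labeled complaints.
training = [
    ("cough", "respiratory"),
    ("shortness of breath", "respiratory"),
    ("cough fever", "respiratory"),
    ("vomiting", "gastrointestinal"),
    ("diarrhea", "gastrointestinal"),
    ("nausea vomiting", "gastrointestinal"),
    ("ankle pain", "none"),
    ("laceration hand", "none"),
]
clf = NaiveBayesClassifier().fit(training)
print(clf.predict("cough/fever"))        # respiratory
print(clf.predict("vomiting/diarrhea"))  # gastrointestinal
```

The "naive" assumption is that words are conditionally independent given the syndrome, which lets the model score a complaint by summing per-word log probabilities.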

In this study, we measured CoCo's accuracy at identifying individual cases of concern to public health for 7 early presentations of disease (syndromes): respiratory, gastrointestinal, neurologic, hemorrhagic, rash, constitutional, and botulinic. We measured the performance of syndromic case classification from free-text triage chief complaints in a single ED using primary International Classification of Diseases, Ninth Revision (ICD-9) discharge diagnoses as the criterion standard classification for 527,228 patients during a 13-year period. Our evaluation had 2 objectives: (1) determine how accurately CoCo classifies patients into syndromic categories and (2) determine whether CoCo can be applied to chief complaints from geographic locations different from the locality where the chief complaints in CoCo's training set were generated.


Study Design

This observational study examined the performance of a Bayesian classifier at categorizing patients into 1 of 7 syndromes based on triage chief complaints. The study used retrospective data collected over 13 years at a single ED.

Setting

The study was conducted on data collected from the University of Pittsburgh Medical Center (UPMC) Presbyterian Hospital ED from December 1990 to September 2003. The ED at the UPMC Presbyterian Hospital admits approximately 40,000 adult patients a year (48% women,

Results

Of 577,522 patients admitted during the study period, 527,228 were included in the study. We excluded approximately 19,000 patients because of missing chief complaints or discharge diagnoses, and approximately 31,000 because an error in the computer script retrieved only one third of the patients admitted in 1995.

Of the 527,228 patients in the study, 85,569 (16.2%) were classified into 1 of the 7 syndromes by criterion standard classification. The most frequent syndromic classification was respiratory
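The counts reported in the abstract can be cross-checked with a few lines of arithmetic. The sketch below only re-adds numbers stated in this article: the excluded total, the per-syndrome criterion-standard counts, and their share of included patients.

```python
# Per-syndrome criterion-standard counts, as reported in the abstract's Results.
counts = {
    "respiratory": 34_916,
    "botulinic": 1_961,
    "gastrointestinal": 20_431,
    "neurologic": 7_393,
    "rash": 2_232,
    "constitutional": 10_603,
    "hemorrhagic": 8_033,
}
admitted, included = 577_522, 527_228

excluded = admitted - included
classified = sum(counts.values())
print(f"excluded: {excluded:,}")  # compare with ~19,000 + ~31,000 reported
print(f"classified: {classified:,} ({classified / included:.1%})")
print("most frequent:", max(counts, key=counts.get))
```

The 7 counts sum exactly to 85,569 (16.2% of included patients), consistent with each patient receiving at most one criterion syndrome from a single primary diagnosis. The 50,294 excluded patients agree with the approximate 19,000 + 31,000 breakdown.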

Limitations

The main limitation of our study was the use of ICD-9 codes for the criterion standard. Most misclassified cases were due not to CoCo's errors but to a mismatch between the patients' ICD-9 diagnoses and their chief complaints. Others have reported a lack of correlation between the syndrome implied by the chief complaints and ICD-9 discharge diagnoses.27, 28 Evidence suggests that ICD-9 discharge diagnoses are more accurate than chief complaints at predicting a patient's syndromic

Discussion

This paper presents a detailed evaluation of how accurately patients can be classified into syndromic categories from their chief complaints: we tested on all ED admissions at UPMC Presbyterian Hospital during a 13-year period and assessed performance on 7 syndromes, including syndromes that are rare and difficult to characterize. Approximately 16% of the patients in the study were classified into 1 of the 7 syndromes by the criterion standard ICD-9 discharge diagnosis.

References (39)

  • Dembek ZF, Myers M, Carley K, et al. Connecticut hospital admissions syndromic surveillance (HASS). Available at:...
  • Cochrane D, Allegra J, Rothman J. Real-time biosurveillance using an existing emergency department electronic medical...
  • Heffernan R, et al. Syndromic surveillance in public health practice, New York City. Emerg Infect Dis. 2004.
  • Lazarus R, et al. Use of automated ambulatory-care encounter records for detection of acute illness clusters, including potential bioterrorism events. Emerg Infect Dis. 2002.
  • Lazarus R, et al. Using automated medical records for rapid identification of illness syndromes (syndromic surveillance): the example of lower respiratory infection. BMC Public Health. 2001.
  • Reis BY, et al. Time series modeling for syndromic surveillance. BMC Med Inform Decis Mak. 2003.
  • Chapman WW, Wagner MM, Ivanov O, et al. Syndromic surveillance from free-text triage chief complaints. Available at:...
  • Tsui FC, et al. Value of ICD-9 coded chief complaints for detection of epidemics. Proc AMIA Symp. 2001.
  • Wagner MM, et al. The emerging science of very early detection of disease outbreaks. J Public Health Manag Pract. 2001.

Supervising editor: Jonathan M. Teich, MD, PhD

Author contributions: WWC, JND, and MMW conceived and designed the study. MMW obtained research funding. JND was the medical consultant who designed the criterion standard and performed the error analysis. WWC collected and analyzed the data. WWC performed the statistical analysis of the data with input from JND and MMW. WWC drafted the manuscript, and all authors contributed substantially to its revision. WWC takes responsibility for the paper as a whole.
