
Journal of Electrocardiology

Volume 57, Supplement, November–December 2019, Pages S75-S78

Tele-electrocardiography and big data: The CODE (Clinical Outcomes in Digital Electrocardiology) study

https://doi.org/10.1016/j.jelectrocard.2019.09.008

Abstract

Digital electrocardiographs are now widely available and a large number of digital electrocardiograms (ECGs) have been recorded and stored. The present study describes the development and clinical applications of a large database of such digital ECGs, namely the CODE (Clinical Outcomes in Digital Electrocardiology) study.

ECGs obtained by the Telehealth Network of Minas Gerais (TNMG), Brazil, from 2010 to 2017 were organized in a structured database. A hierarchical free-text machine learning algorithm recognized specific ECG diagnoses from the cardiologist reports. The Glasgow ECG Analysis Program provided Minnesota codes and automatic diagnostic statements. A specific ECG abnormality was considered present when the automatic and medical diagnoses were concordant; discordant cases were resolved by heuristic rules and manual review. The ECG database was linked to the national mortality information system using probabilistic linkage methods.
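
For illustration, the adjudication step described above could be sketched as in the following Python fragment; the label names, heuristic rules and review queue are hypothetical and do not reproduce the rules actually used in the study.

```python
# Minimal sketch of the label-adjudication step (hypothetical rules).
def adjudicate(abnormality: str, report_labels: set, glasgow_labels: set):
    """Combine the cardiologist report with the automatic (Glasgow) statement.

    Concordant sources settle the label; discordant cases fall back to
    simple heuristics, and anything left unresolved goes to manual review.
    """
    in_report = abnormality in report_labels
    in_glasgow = abnormality in glasgow_labels
    if in_report == in_glasgow:          # concordant: accept the shared answer
        return in_report
    # Discordant: illustrative heuristic -- trust the medical report for
    # rhythm diagnoses (NOT the study's actual rule).
    if abnormality in {"atrial_fibrillation", "sinus_bradycardia"}:
        return in_report
    return None                          # unresolved: flag for manual review

print(adjudicate("atrial_fibrillation", {"atrial_fibrillation"}, set()))  # True
print(adjudicate("left_bundle_branch_block",
                 {"left_bundle_branch_block"}, set()))                    # None -> review
```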

From 2,470,424 ECGs, 1,773,689 patients were identified. After excluding ECGs with technical problems and patients under 16 years of age, 1,558,415 patients were studied. High performance was obtained with an end-to-end deep neural network trained to detect 6 types of ECG abnormalities, with F1 scores above 80% and specificity above 99% in an independent test dataset. We also evaluated the risk of mortality associated with the presence of atrial fibrillation (AF), which showed that AF was a strong predictor of cardiovascular and all-cause mortality, with increased risk in women.
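
For clarity, the F1 score and specificity reported above are derived from per-abnormality confusion counts; the short sketch below shows the definitions on toy data (not the study's predictions).

```python
# F1 score and specificity from binary confusion counts (toy data).
def f1_and_specificity(y_true, y_pred):
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                      # sensitivity
    f1 = 2 * precision * recall / (precision + recall)
    specificity = tn / (tn + fp)
    return f1, specificity

y_true = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]          # 1 = abnormality present
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
print(f1_and_specificity(y_true, y_pred))        # (0.8, 1.0)
```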

In conclusion, a large database that comprises all ECGs performed by a large telehealth network can be useful for further developments in the field of digital electrocardiography, clinical cardiology and cardiovascular epidemiology.

Introduction

Cardiovascular (CV) diseases are the leading cause of death worldwide, accounting for 18 million deaths in 2015 [1]. The electrocardiogram (ECG) is an important diagnostic tool for this group of diseases, as well as an ancillary method for many others, with established value in the diagnosis, prognosis and therapeutic monitoring of several CV diseases. Most of the knowledge on the value of the ECG has been obtained from clinical observations, from correlations of ECG findings with abnormalities observed in imaging or pathological studies, or from cohort studies. The availability of digital ECGs in the last few decades has permitted the development of digital ECG databases [2], which have been used for several purposes, such as evaluating the prognosis of ECG abnormalities in communities and specific populations, studying genetic determinants of arrhythmias and ECG abnormalities, and determining the natural history of diseases. Databases have also been developed to serve as references for electrocardiographic computer measurement and diagnostic programs [3] and to support the development of new methods or algorithms for ECG analysis [4].

Most of these databases do not contain the volume of data that is now available, as many of them were developed at the dawn of the digital era. Digital ECG machines have now become widely available, and large numbers of exams are being stored by hospitals and health services in different countries, often linked to electronic health records or administrative databases. This huge amount of data (big data), analyzed with methods recently developed in the machine learning and data mining fields, may allow the recognition of hidden patterns that traditional statistical methods could not detect, serving for the development of new analytical tools and opening up a world of new possibilities [5]. We hypothesize that a large, annotated database of digital ECGs, obtained in the community and linked to hospitalizations and deaths from health or vital records, will constitute an electronic cohort able to provide clinically useful prognostic information, as well as a better classification method for the standard 12-lead ECG. The aim of the CODE study is to develop such a dataset and to conduct studies on the prognosis and classification of electrocardiograms. The present report (i) describes the development of a large database of digital ECGs linked to mortality data, (ii) shows initial results and (iii) discusses its challenges and potential applications.


History of development

In 2005, with the support of the regional research agency of the state of Minas Gerais (FAPEMIG), a project to study the feasibility and cost-benefit of developing a tele-ECG service was implemented in 82 towns of this state, in Southeast Brazil [6,7]. A digital ECG machine able to send ECG tracings to a central hub was provided to primary care facilities in those small towns. Cardiologists from 5 university hospitals in different parts of the state provided the ECG report, which was sent back to the requesting site.

Description

All 12-lead ECGs analyzed in this study were obtained by the TNMG, using a web application built in the Java programming language [6,8]. ECGs were recorded from 2010 to 2017 using electrocardiographs manufactured by Tecnologia Eletrônica Brasileira (São Paulo, Brazil), model TEB ECGPC, or Micromed Biotecnologia (Brasília, Brazil), model ErgoPC 13. Tracings obtained by these ECG machines were sent to central servers over the internet, using the web application developed in-house. The duration of each recording was 7 to 10 s.
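
As a purely illustrative sketch of this acquisition-to-hub flow, a tracing could be posted to a central server as below; the endpoint, payload schema and field names are invented and do not describe the TNMG's actual Java application.

```python
# Hypothetical upload of one tracing to a central hub (invented endpoint).
import json
import requests

record = {
    "patient_id": "12345",
    "device_model": "TEB ECGPC",
    "sample_rate_hz": 500,
    "leads": {"I": [0.01, 0.02], "II": [0.02, 0.03]},  # truncated samples
}
resp = requests.post(
    "https://tele-ecg.example.org/api/ecgs",           # placeholder URL
    data=json.dumps(record),
    headers={"Content-Type": "application/json"},
    timeout=30,
)
resp.raise_for_status()   # hub stores the tracing and queues it for reporting
```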

Clinical role

From a dataset of 2,470,424 ECGs, 1,773,689 patients were identified. After excluding ECGs with technical problems and patients under 16 years old, a total of 1,558,415 patients were included in the analyses. The mean age was 51.6 (SD 17.6) years and 40.2% were male (Table 2). The overall mortality rate was 3.34% over a mean follow-up of 3.7 years. The resulting dataset has several potential applications, both for technical and for clinical-epidemiological studies. Two studies already presented in preliminary form are summarized below.

Example 1: training of deep neural networks for automatic ECG diagnosis

There is great excitement about how machine learning, and more specifically deep neural networks (DNNs), might improve health care and clinical practice [5,13]. DNNs benefit from large datasets, which allows them to produce highly accurate models [13] and has led to striking successes in tasks such as image classification [14] and speech recognition [15]. Our dataset has been used to train a DNN to automatically detect 6 types of ECG abnormalities: right and left bundle branch block, first-degree atrioventricular block, sinus bradycardia, sinus tachycardia and atrial fibrillation.
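
As a rough illustration of the end-to-end approach, the sketch below defines a deliberately small 1D convolutional network for multi-label ECG classification; the study's actual model was a much deeper residual network, and all shapes and hyperparameters here are illustrative.

```python
# Tiny stand-in for an end-to-end multi-label ECG classifier (illustrative).
import torch
import torch.nn as nn

class TinyECGNet(nn.Module):
    def __init__(self, n_leads: int = 12, n_classes: int = 6):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(n_leads, 32, kernel_size=16, stride=4), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=16, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),              # global average pooling
        )
        self.head = nn.Linear(64, n_classes)

    def forward(self, x):                         # x: (batch, leads, samples)
        z = self.features(x).squeeze(-1)
        return self.head(z)                       # one logit per abnormality

model = TinyECGNet()
x = torch.randn(8, 12, 4096)                      # 8 tracings, ~10 s at ~400 Hz
logits = model(x)
# Multi-label problem: independent sigmoid/BCE term per abnormality.
loss = nn.BCEWithLogitsLoss()(logits, torch.zeros(8, 6))
loss.backward()
```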

Example 2: evaluation of the prognosis of atrial fibrillation

Using the large cohort of the CODE study, the risk of mortality in men and women with AF was evaluated in a preliminary report [20]. Only the first ECG of each patient was considered, and patients under 16 years were excluded. Hazard ratios (HR) for mortality were adjusted for demographic and self-reported clinical factors and estimated with Cox regression. AF was an independent risk factor for all-cause mortality (HR 2.10, 95%CI 2.03–2.17) and cardiovascular mortality (HR 2.03, 95%CI
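
A minimal sketch of this kind of adjusted hazard-ratio estimation, using the lifelines package, is shown below; the column names and toy data are hypothetical, and the covariate set is abbreviated relative to the study's adjustment.

```python
# Adjusted Cox regression for AF and mortality (toy data, hypothetical columns).
import pandas as pd
from lifelines import CoxPHFitter

df = pd.DataFrame({
    "follow_up_years": [3.1, 4.0, 2.5, 3.7, 1.2, 4.2, 2.0, 3.3],
    "died":            [0,   1,   1,   0,   1,   0,   0,   1],
    "af":              [0,   0,   1,   0,   1,   1,   0,   0],
    "age":             [48,  55,  71,  60,  80,  52,  45,  67],
})
cph = CoxPHFitter()
cph.fit(df, duration_col="follow_up_years", event_col="died")
print(cph.hazard_ratios_["af"])   # age-adjusted HR for atrial fibrillation
```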

Data storage and database structural changes over time

Although the TNMG has been in service since 2006, it was only possible to recover tracings obtained after 2010. ECGs acquired before 2010 were stored in a proprietary format (.EWC) or as images (.JPG) by the ECG system used at that time. The analysis of these older tracings would be very useful to evaluate the long-term prognostic meaning of ECG abnormalities observed in the first ECG.

Noise and absence of signal

Approximately 2.5% of the exams had low-quality ECG signals and were classified as unsatisfactory for medical interpretation.
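
An illustrative screen for the two failure modes named above (absent signal and noise-dominated tracings) is sketched below; the thresholds are invented and are not the quality criteria applied by the TNMG reviewers.

```python
# Flag a single-lead tracing as flat or noise-dominated (invented thresholds).
import numpy as np

def lead_is_unusable(signal, flat_tol=1e-3, noise_ratio=1.2):
    if np.ptp(signal) < flat_tol:                     # absence of signal
        return True
    hf_power = np.std(np.diff(signal))                # high-frequency content
    return hf_power > noise_ratio * np.std(signal)    # noise-dominated

rng = np.random.default_rng(0)
flat = np.zeros(4000)
noisy = rng.normal(0, 1, 4000)                        # white noise
clean = np.sin(np.linspace(0, 8 * np.pi, 4000))       # smooth pseudo-signal
print(lead_is_unusable(flat), lead_is_unusable(noisy), lead_is_unusable(clean))
# True True False
```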

Future work

The current dataset opens up several possibilities for future work. We are currently working on an extended version of the CODE dataset, to include patients with ECGs recorded from 2006 to 2010 and after 2017, as well as on further linkage to the hospitalization database of the public health system (Sistema de Informações Hospitalares – SIH). This would allow the prediction not only of the risk of death, but also of relevant medical procedures, such as pacemaker implantation and cardiac

Conclusion

Electrocardiography is now a more than 100-year-old method, with an established role in the care of patients with documented or suspected cardiovascular diseases. The availability of large databases linked to other clinical and vital information, as well as new methods of analysis, can further increase our knowledge of the role of electrocardiography in clinical practice and open new applications for its use. Thus, the CODE dataset, a large database that comprises all ECGs performed by a large telehealth network, can be useful for further developments in the field of digital electrocardiography, clinical cardiology and cardiovascular epidemiology.

Acknowledgments

This research was partly supported by the Brazilian agencies CNPq, CAPES and FAPEMIG, and is part of the projects IATS (Instituto de Avaliação de Tecnologias em Saúde) and ATMOSPHERE (Adaptive, Trustworthy, Manageable, Orchestrated, Secure, Privacy-assuring Hybrid Ecosystem for Resilient Cloud Computing). We also thank NVIDIA for awarding our project a Titan V GPU. ALR is the recipient of an unrestricted research scholarship from CNPq; AHR receives a Split-PhD scholarship from CNPq; MHR receives

References (22)

  • Veloso A, Meira W Jr, Zaki MJ. Lazy associative classification. In: Proceedings of the Sixth International Conference on Data Mining (ICDM'06). IEEE; 2006.
