Tele-electrocardiography and bigdata: The CODE (Clinical Outcomes in Digital Electrocardiography) study
Introduction
Cardiovascular (CV) diseases are the leading cause of death worldwide and caused 18 million deaths in 2015 [1]. The electrocardiogram (ECG) is an important diagnostic tool for this group of diseases, as well as an ancillary method for many others, with established value in the diagnosis, prognosis and therapeutic monitoring of several CV diseases. Most of the knowledge on the value of the ECG has been obtained from clinical observations, from correlations of ECG findings with abnormalities observed in imaging or pathological studies, or from cohort studies. The availability of digital ECGs in the last few decades has permitted the development of digital ECG databases [2], which have been used for several purposes: to evaluate the prognosis of ECG abnormalities in communities and specific populations, to study genetic determinants of arrhythmias and ECG abnormalities, and to determine the natural history of diseases. Databases have also been developed to serve as a reference for electrocardiographic computer measurement and diagnostic programs [3] and to develop new methods or algorithms for ECG analysis [4].
Most of these databases were developed at the dawn of the digital era and do not contain the amount of data that is now available. Digital ECG machines have become widely available, and large numbers of exams are being stored in hospitals and health services in different countries, often linked to electronic health records or administrative databases. This huge amount of data (big data), analyzed by methods recently developed in the machine learning and data mining fields, may allow the recognition of hidden patterns that were not detected in the past by traditional statistical methods. This may support the development of new analytical tools, opening up a world of new possibilities [5]. We hypothesize that a large, annotated database of digital ECGs, obtained in the community and linked with hospitalization and death data from health or vital records, will constitute an electronic cohort able to provide clinically useful prognostic information, as well as a better classification method for the standard 12-lead ECG. The aim of the CODE study is to develop such a dataset and to conduct studies on the prognosis and classification of electrocardiograms. The present report i) describes the development of a large database of digital ECGs linked to mortality data, ii) shows initial results and iii) discusses its challenges and potential applications.
History of development
In 2005, with the support of the regional research agency of the state of Minas Gerais (FAPEMIG), a project to study the feasibility and cost-benefit of developing a tele-ECG service was implemented in 82 towns of this state, in Southeast Brazil [6,7]. A digital ECG machine able to send ECG tracings to a central hub was provided to primary care facilities in those small towns. Cardiologists from five university hospitals in different parts of the state provided the ECG report, which was sent back
Description
All 12-lead ECGs analyzed in this study were obtained by the TNMG, using a Web application built in the Java programming language [6,8]. ECGs were recorded from 2010 to 2017 using an electrocardiograph manufactured by Tecnologia Eletrônica Brasileira (São Paulo, Brazil), model TEB ECGPC, or by Micromed Biotecnologia (Brasilia, Brazil), model ErgoPC 13. Tracings obtained by these ECG machines were sent to central servers over the internet, using the web application developed in-house. The duration of
Clinical role
From a dataset of 2,470,424 ECGs, 1,773,689 patients were identified. After excluding ECGs with technical problems and patients under 16 years old, a total of 1,558,415 patients were included in the analyses. The mean age was 51.6 (SD 17.6) years and 40.2% were male (Table 2). The overall mortality rate was 3.34% over a mean follow-up of 3.7 years. The resultant dataset has several potential applications, both for technical and clinical-epidemiological studies. Two studies already presented in
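As a rough illustration of what these figures imply, the short sketch below derives the number of excluded patients, an approximate death count and a crude mortality rate per 1,000 person-years. It uses only the numbers quoted above; the variable names are illustrative, the death count is approximate because the percentage is rounded, and the crude rate is not adjusted for age or sex.

```python
# Figures quoted in the text (the mortality percentage is rounded).
n_identified = 1_773_689
n_included = 1_558_415
mortality_fraction = 0.0334
mean_follow_up_years = 3.7

# Patients removed by the exclusion criteria.
n_excluded = n_identified - n_included

# Approximate number of deaths implied by the rounded percentage.
approx_deaths = n_included * mortality_fraction

# Total person-years equals included patients times mean follow-up.
person_years = n_included * mean_follow_up_years

# Crude mortality rate per 1,000 person-years (unadjusted).
crude_rate_per_1000_py = 1000 * approx_deaths / person_years

print(n_excluded, round(approx_deaths), round(crude_rate_per_1000_py, 1))
```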
Example 1: training of deep neural networks for automatic ECG diagnosis
There is considerable excitement about how machine learning and, more specifically, deep neural networks (DNNs) might improve health care and clinical practice [5,13]. DNNs benefit from large datasets, which help them produce highly accurate models [13]. This has allowed these models to achieve striking success in tasks such as image classification [14] and speech recognition [15]. Our dataset has been used to train a DNN to automatically detect 6 types of ECG abnormalities - right and left
Example 2: evaluation of the prognosis of atrial fibrillation
Using the large cohort of the CODE study, the risk of mortality in men and women with AF was evaluated in a preliminary report [20]. Only the first ECG of each patient was considered. Patients under 16 years were excluded. Hazard ratios (HR) for mortality were adjusted for demographic and self-reported clinical factors and estimated with Cox regression. AF was an independent risk factor for all-cause mortality (HR 2.10, 95% CI 2.03–2.17) and cardiovascular mortality (HR 2.03, 95% CI
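The cohort selection described in this example (first ECG of each patient, patients under 16 years excluded) can be sketched as follows. This is a minimal illustration and not the study's actual pipeline: the record layout, the sample values and the function names are hypothetical, and applying the age cutoff to the age at the first exam is an assumption.

```python
from datetime import date

# Hypothetical record layout: (patient_id, exam_date, age_years, has_af).
exams = [
    ("p1", date(2012, 5, 1), 63, False),
    ("p1", date(2015, 3, 9), 66, True),
    ("p2", date(2011, 1, 20), 14, False),
    ("p3", date(2016, 7, 4), 48, True),
    ("p3", date(2013, 2, 11), 45, False),
]

MIN_AGE = 16  # patients under 16 years are excluded


def first_exam_per_patient(records):
    """Return a dict mapping each patient to their earliest exam."""
    first = {}
    for rec in records:
        pid, exam_date = rec[0], rec[1]
        if pid not in first or exam_date < first[pid][1]:
            first[pid] = rec
    return first


def select_cohort(records, min_age=MIN_AGE):
    """Keep the first exam per patient, then drop patients younger
    than min_age at that exam (assumed reading of the exclusion rule)."""
    first = first_exam_per_patient(records)
    return {pid: rec for pid, rec in first.items() if rec[2] >= min_age}


cohort = select_cohort(exams)
for pid in sorted(cohort):
    print(pid, cohort[pid][1].isoformat(), cohort[pid][2], cohort[pid][3])
```

The resulting one-row-per-patient table is the kind of input a survival model such as Cox regression expects, once follow-up time and outcome are attached to each row.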
Data storage and database structural changes over time
Although the TNMG has been in service since 2006, it was only possible to recover tracings obtained after 2010. ECGs acquired before 2010 were stored in a proprietary format (.EWC) or as images (.JPG) by the ECG system used at that time. The analysis of these older tracings would be very useful to evaluate the long-term prognostic meaning of ECG abnormalities observed in the first ECG.
Noise and absence of signal
Approximately 2.5% of the exams had low quality ECG signals and were classified as unsatisfactory for medical
Future work
The current dataset opens up several possibilities for future work. We are currently working on an extended version of the CODE dataset, to include patients with ECGs recorded from 2006 to 2010 and after 2017, as well as on further linkage to the hospitalization database of the public health system (Sistema de Informações Hospitalares, SIH). This would allow not only the prediction of the risk of death, but also of relevant medical procedures, such as pacemaker implantation and cardiac
Conclusion
Electrocardiography is a method now well over 100 years old, with an established role in the care of patients with documented or suspected cardiovascular diseases. The availability of large databases, linked to other clinical and vital information, as well as new methods of analysis, can further increase our knowledge of the role of electrocardiography in clinical practice and open new applications for its use. Thus, the CODE dataset, which is a large database that comprises all ECGs performed by
Acknowledgments
This research was partly supported by the Brazilian Agencies CNPq, CAPES, and FAPEMIG, and is part of the projects IATS (Instituto de Avaliação de Tecnologias em Saúde) and ATMOSPHERE (Adaptive, Trustworthy, Manageable, Orchestrated, Secure, Privacy-assuring, Hybrid Ecosystem for Resilient Cloud Computing). We also thank NVIDIA for awarding our project with a Titan V GPU. ALR is recipient of an unrestricted research scholarship from CNPq; AHR receives a Split-PhD scholarship from CNPq; MHR receives
References (22)
- et al. Global, regional, and national burden of cardiovascular diseases for 10 causes, 1990 to 2015. J Am Coll Cardiol (2017)
- et al. NHLBI workshop on the utilization of ECG databases: preservation and use of existing ECG databases and development of future resources. J Electrocardiol (1998)
- et al. A reference database for multilead electrocardiographic computer measurement programs. J Am Coll Cardiol (1987)
- et al. Automated serial ECG comparison based on the Minnesota code. J Electrocardiol (1996)
- et al. PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals. Circulation (2000)
- et al. Toward a patient-centered, data-driven cardiology. Arq Bras Cardiol (2019)
- et al. Cost-benefit of the telecardiology service in the state of Minas Gerais: Minas Telecardio Project. Arq Bras Cardiol (2011)
- et al. Implementation of a telecardiology system in the state of Minas Gerais: the Minas Telecardio Project. Arq Bras Cardiol (2010)
- et al. Improving patient access to specialized health care: the Telehealth Network of Minas Gerais, Brazil. Bull World Health Organ (2012)
- et al. Recommendations for the standardization and interpretation of the electrocardiogram: part I: the electrocardiogram and its technology: a scientific statement from the American Heart Association Electrocardiography and Arrhythmias Committee, Council on Clinical Cardiology; the American College of Cardiology Foundation; and the Heart Rhythm Society: endorsed by the International Society for Computerized Electrocardiology. Circulation (2007)
- Lazy associative classification