Abstract

Although convolutional neural networks (CNNs) can be used to classify electrocardiogram (ECG) beats in the diagnosis of cardiovascular disease, ECG signals are typically processed as one-dimensional signals while CNNs are better suited to multidimensional pattern or image recognition applications. In this study, the morphology and rhythm of heartbeats are fused into a two-dimensional information vector for subsequent processing by CNNs that include adaptive learning rate and biased dropout methods. The results demonstrate that the proposed CNN model is effective for detecting irregular heartbeats or arrhythmias via automatic feature extraction. When tested on the MIT-BIH arrhythmia database, the model achieved higher performance than other state-of-the-art methods for five and eight heartbeat categories (with average accuracies of 99.1% and 97%, respectively). In particular, the proposed system improved the sensitivity and positive predictive rate for V beats by more than 4.3% and 5.4%, respectively, and for S beats by more than 22.6% and 25.9%, respectively, compared to existing algorithms. It is anticipated that the proposed method will be suitable for implementation on portable devices for the e-home health monitoring of cardiovascular disease.

1. Introduction

The electrocardiogram (ECG) has become a useful tool [1, 2] for the diagnosis of cardiovascular diseases as it is fast and noninvasive. It has been reported that about 80% of sudden cardiac deaths are the result of ventricular arrhythmias or irregular heartbeats [3]. While an experienced cardiologist can easily distinguish arrhythmias by visually referencing the morphological pattern of the ECG signals, a computer-oriented approach can effectively reduce the diagnostic time and would enable the e-home health monitoring of cardiovascular disease [4]. However, realizing such computer-oriented approaches remains challenging due to the time-varying dynamics and various profiles of ECG signals, which cause the classification precision to vary from patient to patient [5], as even for a healthy person, the morphological pattern of their ECG signals can vary significantly over a short time [6].

To achieve the automatic classification of ECG signals, scientists have proposed several methods to automatically classify heartbeats, including the Fourier transform [7], principal component analysis (PCA) [8], wavelet transform [9], and the hidden Markov method [10]. Moreover, machine learning methods, such as artificial neural networks (ANNs) [11], support vector machines (SVMs) [12], least squares support vector machines (LS-SVMs) [13], particle swarm optimization support vector machines (PSO-SVMs) [14], particle swarm optimization radial basis functions (PSO-RBFs) [15], and extreme learning machines (ELMs) [8], have also been developed for the accurate classification of heartbeats.

However, these classification methods have some drawbacks. For example, expert systems [16, 17] require a large amount of prior knowledge, which may vary for different patients. Another problem lies in the manual feature selection of the heartbeat signal required by some machine learning methods. ECG feature extraction, which selects a representative feature subset from the raw ECG signal, is a key technique for heartbeat recognition. Such subsets generalize more readily and thereby improve the accuracy of ECG heartbeat classification; however, manual selection may result in the loss of information [18, 19]. Moreover, methods like PCA and the Fourier transform may increase the complexity and computational time required to identify a solution. As the number of patients increases, the classification accuracy decreases due to the large variation in ECG signal patterns among different patients.

Convolutional neural networks (CNNs) are useful tools that have been used in pattern recognition applications [20, 21], such as the classification of handwriting [22] and object recognition in large archives. The connectivity between neurons in a CNN is similar to the organization of the visual cortex in animals, which makes CNNs superior to other methods in recognizing the pattern and structure of items. CNNs also provide a number of advantages over conventional classification techniques in biomedical applications. For example, they are widely used in skin cancer diagnosis [23], animal behavior classification [24], protein structure prediction [25], and electromyography (EMG) signal classification [26]. In addition, CNNs have also been used for ECG classification. For example, Kiranyaz et al. proposed a one-dimensional (1D) CNN for real-time patient-specific ECG classification [5]. CNNs are a specialized type of neural network with an inherent grid-like topology for processing input data [27] in which nearby entries are correlated, such as those in a two-dimensional (2D) image.

In this paper, we explored the 2D approach for ECG classification with CNNs. An information fusion vector of heartbeats is transformed into a binary image via one-hot encoding [28]. Such images capture the morphology of a single heartbeat as well as the temporal relationship between adjacent beats, and they serve as the 2D input to a CNN. To accelerate the convergence of learning, a per-dimension learning rate method for gradient descent called ADADELTA [29] is incorporated into the CNN. Moreover, to reduce the overfitting of the network, a biased dropout [30] is also included.

The rest of the paper is organized as follows. Section 2 introduces the methodology employed in this study. In Section 3, the classification system based on the proposed method is presented. In Section 4, the simulation results are provided along with a comparison between the classification accuracy of the proposed method, another based on 1D data, and other common methods. The configuration of the proposed method used in the comparisons is detailed in Section 5, including the network structure, the parameters describing the convolution layer, and the input dimensions. Finally, the conclusions drawn from this study are outlined in Section 6.

2. Methodology

2.1. ADADELTA Method

The ADADELTA adaptive learning rate method was incorporated into the proposed CNN to avoid the need to set the learning rate manually. The algorithm assigns a different learning rate to each parameter in each iteration, and the essence of the approach lies in accumulating the sum of squared gradients over a window of past gradients of some fixed size. The running average $E[g^2]_t$ of the squared gradient is calculated as follows:

$$E[g^2]_t = \rho E[g^2]_{t-1} + (1 - \rho)\, g_t^2,$$

where $g_t$ is the gradient at the current time step and $\rho$ is a decay constant. The increment of each iteration is determined as

$$\Delta\theta_t = -\frac{\eta}{\sqrt{E[g^2]_t + \epsilon}}\, g_t,$$

where $\eta$ is the learning rate that controls the size of a step in the gradient and $\epsilon$ is a small smoothing constant. The details of the process used to update the parameter $\theta$ are provided in [29].
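For illustration, the following is a minimal NumPy sketch of the full per-dimension update from [29], in which a running average of past squared updates replaces the fixed learning rate $\eta$ above. This is not the authors' implementation; in practice the optimizer is supplied by Keras as keras.optimizers.Adadelta.

```python
import numpy as np

def adadelta_step(theta, grad, state, rho=0.95, eps=1e-6):
    """One full ADADELTA update (Zeiler, 2012): the running average of past
    squared updates, RMS[dtheta], replaces a fixed learning rate."""
    state["Eg2"] = rho * state["Eg2"] + (1 - rho) * grad**2       # E[g^2]_t
    dx = -np.sqrt(state["Edx2"] + eps) / np.sqrt(state["Eg2"] + eps) * grad
    state["Edx2"] = rho * state["Edx2"] + (1 - rho) * dx**2       # E[dtheta^2]_t
    return theta + dx

# usage: minimize f(theta) = theta^2, whose gradient is 2 * theta
theta = np.array([3.0])
state = {"Eg2": np.zeros_like(theta), "Edx2": np.zeros_like(theta)}
for _ in range(500):
    theta = adadelta_step(theta, 2 * theta, state)
print(theta)  # approaches 0
```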

2.2. Biased Dropout

Overfitting can occur during the training of a neural network when there is a large number of parameters [30]; however, dropout methods can be used to mitigate this problem. In the case of the proposed CNN, a biased dropout technique was selected so as to minimize any increase in the training time. The essence of biased dropout is that the activation value of each hidden layer unit is used to assess its contribution to the network performance. In general, the dropout rate of a hidden unit group is inversely proportional to its contribution to the performance of the network. In a particular hidden layer, the total number of hidden units is $n$, and these units are divided into two groups according to their contribution to the activation value. The dropout rate of unit $i$ can then be computed as

$$p_i = \begin{cases} p_1, & \text{unit } i \text{ in the high-contribution group}, \\ p_2, & \text{unit } i \text{ in the low-contribution group}, \end{cases}$$

where $p_1$ represents the low dropout rate and $p_2$ represents the high dropout rate. The average dropout rate can be expressed as

$$\bar{p} = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2},$$

where $n_1$ and $n_2$ are the numbers of hidden layer units in the two groups ($n = n_1 + n_2$). The dropout mask of a certain $l$th layer can be expressed by the Bernoulli distribution,

$$r^{(l)} \sim \mathrm{Bernoulli}(p),$$

and the updating of the CNN output of the $l$th layer can be represented as

$$\tilde{y}^{(l)} = r^{(l)} \odot y^{(l)}, \qquad z^{(l+1)} = w^{(l+1)} \tilde{y}^{(l)} + b^{(l+1)}, \qquad y^{(l+1)} = f\left(z^{(l+1)}\right),$$

where $y^{(l)}$ is the neuron output from the $l$th layer, $z^{(l+1)}$ is the input to the activation function $f$, $w^{(l+1)}$ is the weight coefficient, and $b^{(l+1)}$ is the bias.
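A minimal NumPy sketch of the idea follows. The rates p1/p2, the median split of units by mean absolute activation, and the inverted-dropout rescaling are illustrative assumptions rather than values taken from [30].

```python
import numpy as np

def biased_dropout(y, p1=0.3, p2=0.7, rng=None):
    """Sketch of biased dropout on a batch of hidden activations y
    (shape: batch x units). Units with larger mean activation, used here
    as a proxy for their contribution, keep the LOW dropout rate p1;
    the remaining units get the HIGH rate p2."""
    rng = rng or np.random.default_rng()
    contribution = np.abs(y).mean(axis=0)            # per-unit activation level
    high = contribution >= np.median(contribution)   # split into two groups
    p = np.where(high, p1, p2)                       # group-wise dropout rates
    r = rng.random(y.shape[1]) >= p                  # Bernoulli keep mask
    return y * r / (1.0 - p)                         # inverted-dropout scaling

# usage on a batch of 8 examples with 16 hidden units
h = np.random.default_rng(0).normal(size=(8, 16))
h_dropped = biased_dropout(h)
```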

3. Classification System

There are three major stages in a heartbeat classification system: preprocessing, feature extraction, and classification. In this study, preprocessing includes information fusion and one-hot encoding, while feature extraction is performed by the proposed CNNs. Finally, a softmax classifier, which generalizes logistic regression to multiclass problems, is used for classification. A flowchart of the classification system is shown in Figure 1.

Two CNN models are constructed in this work, as shown in Figure 1. One is used for training (represented by the circle diagram in Figure 1), while the other is used for testing (represented by the square diagram in Figure 1). The training set is first fed to the first CNN model for training. After 20 iterations, the weights and thresholds of each layer (convolution layers and fully connected layers) of the trained CNN model are obtained, and these automatically extracted weights and thresholds can be regarded as high-level features of the ECG signal. Finally, the test set is fed to the second CNN model for testing, from which the classification results are obtained.

3.1. Preprocessing

There are four processes in the ECG signal preprocessing stage: ECG denoising, heartbeat segmentation, information fusion, and one-hot encoding. A flowchart of the preprocessing stage is shown in Figure 2.

3.1.1. ECG Denoising

An ECG signal is a weak signal, typically no more than a few millivolts in amplitude, in which the energy is concentrated in the 0.5–30 Hz frequency range [37]. Such weak signals are susceptible to corruption by environmental noise and other factors; thus, recorded ECG signals often include noise and interference, such as myoelectric interference, baseline drift, and power frequency interference. In Experiment 1, which is described later, the baseline drift was removed using a bandpass filter (passband: 0.1–100 Hz) [18] and the high-frequency noise was partially removed using a three-scale discrete wavelet transform, which removes the coefficients below a certain threshold [38, 39]. The Daubechies 5 (db5) wavelet basis function was adopted and the threshold was estimated via Stein's unbiased risk estimate. A comparison between a recorded ECG signal before and after denoising is shown in Figure 3; note that the amplitude of the ECG signal was normalized. This ECG signal is from recording 228 in the MIT-BIH Arrhythmia Database [6]. In order to compare the results of Experiment 2 with those in [33], the baseline drift there was removed by subtracting the baseline estimated with two successive median filters from the original signal.
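A sketch of this two-step denoising in Python (SciPy and PyWavelets) follows. The Butterworth filter order and the universal-threshold rule (used here as a simple stand-in for Stein's unbiased risk estimate) are assumptions; the paper specifies only the 0.1–100 Hz passband and the three-level db5 decomposition.

```python
import numpy as np
import pywt
from scipy.signal import butter, filtfilt

def denoise_ecg(x, fs=360):
    """Band-pass filtering followed by wavelet shrinkage of a 1D ECG trace."""
    # 1) band-pass filter (0.1-100 Hz) to suppress baseline drift
    b, a = butter(3, [0.1, 100.0], btype="band", fs=fs)
    x = filtfilt(b, a, x)
    # 2) three-level DWT with db5; soft-threshold the detail coefficients
    coeffs = pywt.wavedec(x, "db5", level=3)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745   # noise level, finest scale
    thr = sigma * np.sqrt(2 * np.log(len(x)))        # universal threshold
    coeffs[1:] = [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(coeffs, "db5")[: len(x)]
```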

3.1.2. Heartbeat Segmentation

To preserve the morphological structure of the P-QRS-T waves and eliminate any extraneous data, the raw data of each heartbeat was represented by 234 samples centered on the R peak; at the database sampling frequency of 360 Hz, this corresponds to a time length of 0.65 s per heartbeat. Each heartbeat was then downsampled to 97 samples and its amplitude was normalized.
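A minimal sketch of this segmentation step is given below; min-max scaling to [0, 1] for the amplitude normalization is an assumption, as the interval bounds are not stated.

```python
import numpy as np
from scipy.signal import resample

def segment_beat(signal, r_peak, width=234, out_len=97):
    """Cut a 234-sample window centered on the R peak (0.65 s at 360 Hz),
    downsample it to 97 samples, and min-max scale the amplitude."""
    half = width // 2
    beat = signal[r_peak - half : r_peak - half + width]
    beat = resample(beat, out_len)                    # 234 -> 97 samples
    return (beat - beat.min()) / (beat.max() - beat.min() + 1e-12)
```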

3.1.3. Information Fusion Vector

When studying heartbeats, the R-R interval refers to the time interval between the R peaks of two adjacent QRS complexes and reflects the rhythm characteristics of the cardiac beat signal. Arrhythmias cause abnormal changes in the R-R interval, such as a shortened R-R interval for premature beats and a lengthened R-R interval for escape beats.

The type of each heartbeat is clearly annotated at the corresponding R peak in the MIT-BIH arrhythmia database [6], which allows the R-R intervals of adjacent heartbeats to be obtained directly. Three values were selected to represent the rhythmic information of the beat (RR_inf), namely, RR_inf = [RR_pre, RR_pos, RR_dif], where the R-R interval between the current beat and the previous beat is denoted by RR_pre, the R-R interval between the current beat and the subsequent beat is denoted by RR_pos, and the difference between these is denoted by RR_dif. The values in RR_inf are then normalized.

Finally, the morphological information (Mor_inf) of each heartbeat (as described in Section 3.1.2) and RR_inf are fused to form the information fusion vector of the heartbeat (Input_x), where Input_x = [Mor_inf, RR_inf].
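The two steps above could be sketched as follows; scaling RR_inf into [0, 1] with a fixed 2 s ceiling is an illustrative assumption, as the paper does not state how RR_inf was normalized.

```python
import numpy as np

def rr_features(r_peaks, i, fs=360):
    """RR_inf = [RR_pre, RR_pos, RR_dif] for beat i, in seconds, computed
    from the annotated R-peak sample indices."""
    rr_pre = (r_peaks[i] - r_peaks[i - 1]) / fs
    rr_pos = (r_peaks[i + 1] - r_peaks[i]) / fs
    return np.array([rr_pre, rr_pos, rr_pre - rr_pos])

def fuse(mor_inf, rr_inf):
    """Input_x = [Mor_inf, RR_inf]: 97 morphology samples plus 3 rhythm
    values give a 100-element fusion vector."""
    return np.concatenate([mor_inf, np.clip(rr_inf / 2.0, 0.0, 1.0)])
```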

3.1.4. One-Hot Encoding

As CNNs exhibit excellent performance in image recognition, the information fusion vector (Input_x) of the heartbeat is transformed into a binary image via the one-hot encoding technique. A binary image is a digital image that has only two possible values for each pixel. Here, the signal range is divided into 20 (ordinate) × 100 (abscissa) meshes [26]; if the value of the information fusion vector falls in a mesh, that mesh is assigned a value of one, and otherwise a value of zero. The process of encoding the information fusion vector of one heartbeat is shown in Figure 4.
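A minimal sketch of the encoding, assuming the fusion vector has been normalized to [0, 1]:

```python
import numpy as np

def to_binary_image(input_x, rows=20):
    """Map the fusion vector onto a rows x len(input_x) binary image:
    one column per sample, with a single 1 in the row bin that contains
    the sample's amplitude."""
    cols = len(input_x)                               # 100 for Input_x here
    img = np.zeros((rows, cols), dtype=np.uint8)
    r = np.minimum((np.asarray(input_x) * rows).astype(int), rows - 1)
    img[r, np.arange(cols)] = 1
    return img
```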

3.2. Feature Extraction

CNNs can automatically generate high-level features (i.e., weights and thresholds) through training. In this study, a novel CNN is proposed for the classification of heartbeats that consists of two convolutional layers, one pooling layer, one flattening layer, and two fully connected layers. The structure and settings of the proposed CNN are illustrated in Figure 5. In this network, each convolutional layer can be considered a fuzzy filter that enhances the characteristics of the original signal and reduces the noise. The convolution kernel size of each feature map is 5 × 11 and the stride is one. The subpooling layer (i.e., the maximum subsampling layer) subsamples the data using the principle of local correlation and retains useful information while reducing the number of data dimensions; the pooling size is 2 × 2 and the stride is two. The flattening layer transforms the data in the pooling layer into feature vectors. The biased dropout method (described in Section 2.2) is employed in the two fully connected layers. In addition, the rectified linear unit (ReLU) was adopted as the activation function, cross entropy was adopted as the loss function, and the ADADELTA method (described in Section 2.1) was adopted as the adaptive learning rate method. The CNN was implemented in Keras (Python) on Linux and run on a GTX 1080 Ti graphics processing unit (GPU).
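A hedged Keras sketch of the described network is shown below. The conv-conv-pool ordering, the 128-unit dense layer, and the use of Keras' standard Dropout in place of the biased dropout of Section 2.2 are assumptions; the 12/6 feature maps (see Section 5), 5 × 11 kernels, 2 × 2 pooling, ReLU activations, cross-entropy loss, and ADADELTA optimizer follow the text.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_model(n_classes=6):
    """Sketch of the proposed CNN: two convolutions, one max pooling,
    flatten, and two fully connected layers with dropout."""
    model = keras.Sequential([
        layers.Input(shape=(20, 100, 1)),             # one-hot beat image
        layers.Conv2D(12, (5, 11), activation="relu"),
        layers.Conv2D(6, (5, 11), activation="relu"),
        layers.MaxPooling2D((2, 2), strides=2),
        layers.Flatten(),
        layers.Dropout(0.5),                          # stand-in for biased dropout
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer=keras.optimizers.Adadelta(),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model
```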

4. Experiments and Results

We set up three experiments to evaluate the proposed classification system. Experiment 1 compares the performance of the two proposed methods and of different input dimensions, while Experiments 2 and 3 compare the proposed classification algorithm with existing research.

4.1. Database

This work employed the MIT-BIH arrhythmia [6] and QT [40] databases for three different experiments. The MIT-BIH arrhythmia database has been widely used in the evaluation of algorithms developed for the classification of heartbeats [5, 41, 42] and contains 48 recordings from 47 patients. Each recording contains two-lead signals slightly longer than 30 min that were band-pass filtered at 0.1–100 Hz and digitized at 360 samples per second. The ECG signal of the modified limb lead II (ML II) was used for two of the experiments (Experiments 1 and 3). The six most common heartbeat categories were selected from 46 recordings (ML II) of the MIT-BIH arrhythmia database for Experiment 1, namely, the normal beat (N), paced beat (/), atrial premature beat (A), premature ventricular contraction (V), left bundle branch block beat (L), and right bundle branch block beat (R) categories. As normal heartbeats accounted for 73.3% (n = 73,542) of the total number of samples, 6000 normal heartbeats were randomly chosen for classification. The number of beats in each set of samples is listed in Table 1. A total of 60% of the heartbeats were randomly selected from the sample set to serve as the training dataset [43], while the remainder was used as the testing set.

To compare the classification performance of the proposed algorithm with that of other algorithms, eight typical heartbeat categories (Table 2) were selected from the QT database [40]: normal beat (N), paced beat (/), atrial premature beat (A), premature ventricular contraction (V), fusion of paced and normal beat (f), fusion of ventricular and normal beat (F), supraventricular premature or ectopic beat (S), and right bundle branch block beat (R). The QT database contains two-channel ECG signals (ML II and V5), both of which were adopted in Experiment 2.

In Experiment 3, as recommended by the Association for the Advancement of Medical Instrumentation (AAMI) [44], all beats were classified as beats originating in the sinus node (N), supraventricular ectopic beats (S beats or SVEB), ventricular ectopic beats (V beats or VEB), fusion beats (F), or unclassifiable beats (Q), and the four paced records (102, 104, 107, and 217) were excluded from the MITDB. Based on the characteristics and symptoms of the subjects, the 44 selected records were divided into two groups [5, 18, 35, 36]. The first group, denoted DS1, included 20 records with label numbers beginning with 1 as representative samples of the variety of waveforms and artifacts that an arrhythmia detector may encounter in routine clinical use. The remaining 24 records were placed into the second group (denoted DS2) and contained complex ventricular, junctional, and supraventricular arrhythmias, as well as conduction abnormalities. To train the CNN model, the training dataset consisted of a common part and a subject-specific part: the common part was selected from DS1, while the subject-specific part was taken from the first 5 min of the ECG recording of each subject in DS2. The remaining 25 min of each recording in DS2 were used for testing. The number of beats in each sample set is shown in Table 3.
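The per-record DS2 split might be sketched as follows, using the 360 Hz sampling rate to locate the 5 min boundary; the representation of beats as (R-peak index, label) pairs is an illustrative assumption.

```python
def split_ds2_record(r_peaks, labels, fs=360, train_min=5):
    """Per-record DS2 split: beats whose R peak falls in the first 5 min
    form the subject-specific training portion; the rest are for testing."""
    cut = train_min * 60 * fs                 # sample index of the 5 min mark
    train = [(r, y) for r, y in zip(r_peaks, labels) if r < cut]
    test = [(r, y) for r, y in zip(r_peaks, labels) if r >= cut]
    return train, test
```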

4.2. Experiment 1

In Experiment 1, the gradient descent performance and classification accuracy of the model with the ADADELTA optimizer were compared with those of the model with the stochastic gradient descent (SGD) optimizer over 20 iterations, as shown in Figure 6. As can be seen in Figure 6(a), the convergence speed of the ADADELTA optimizer was faster than that of the SGD optimizer for all training iterations. The classification accuracy was much higher at the beginning of the first iteration (96% versus 78%) and converged to about 99%, which was still higher than the result from the SGD optimizer after 20 iterations (Figure 6(b)). In addition, the training time required to reach an accuracy of 99% with the dropout and biased dropout methods (dropout rate: 50%) was compared, and it was found that the biased dropout method required fewer iterations (14 versus 19).

Next, the gradient descent and accuracy results were compared for the CNN model with a 1D ECG signal input and for the proposed 2D input. The data preprocessing steps applied to the ECG signal were the same for both, although the pooling size varied due to differences in the structure of the input data: for the 1D heartbeat signal the pooling size was set to 1 × 2, while that for the 2D heartbeat signal was 2 × 2. The results of this comparison are shown in Figure 7. The convergence speed of the model with the 2D input was faster than that of the model with the 1D input for all training iterations. The initial accuracy was also higher for the model with the 2D input structure (97.5% versus 95.7%), and this model continued to outperform the other for all validation iterations. The accuracy of the proposed method was found to be as high as 99.1%, which is 1.3% higher than that of the model with the 1D data input. A drawback of the proposed method, despite these excellent results, is its running speed on the GPU: one iteration requires 5 s, compared to 3 s for the model with the 1D data input.

4.3. Experiment 2

In Experiment 2, the accuracy of the proposed method was compared with that of five common existing methods. The QT database [40] was adopted as the dataset and the number of classification categories was expanded to eight. The data preprocessing steps were the same for all methods. The methods used in the comparison are listed in Table 4. The average and standard deviation of the accuracy (indicated by the red error bars) are shown in Figure 8. To ensure consistency, all experiments were repeated at least ten times, after which the average accuracy was calculated to determine whether the methods were capable of accurately classifying the input ECG beats. As shown, the proposed method had the highest accuracy at 97%, although compared to the results of Experiment 1 the accuracy decreased by about 2% due to the lower number of heartbeats input to the CNN model.

4.4. Experiment 3

The data from the MIT-BIH arrhythmia database was processed according to the AAMI ECAR-1987 recommended practice [44] when used in the evaluation of the proposed classification algorithm. As the same data handling procedure was used in [5, 18, 35, 36], the results of the current study can be directly compared with the results of those studies. The confusion matrix for all testing beats in the 24 test records (DS2) of the MIT-BIH arrhythmia database is shown in Table 5, where it can be seen that some of the S beats and F beats were misclassified as N beats, although the number of misclassifications was significantly lower than in the previous best results [5, 18]. The average accuracy reached 99.1% over ten independent experiments.

Four standard metrics were used to evaluate the performance of the proposed classification method: the accuracy (Acc), sensitivity (Sen), specificity (Spe), and positive predictive rate (Ppr):

$$\mathrm{Acc} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \mathrm{Sen} = \frac{TP}{TP + FN},$$

$$\mathrm{Spe} = \frac{TN}{TN + FP}, \qquad \mathrm{Ppr} = \frac{TP}{TP + FP},$$

where TP is a true positive, TN is a true negative, FP is a false positive, and FN is a false negative. Here, the classification performance for S and V beat detection was evaluated using a method similar to that in several previous studies [5, 18, 35, 36].
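Computed per class in a one-vs-rest fashion, these metrics reduce to a few lines of Python:

```python
def beat_metrics(tp, tn, fp, fn):
    """Per-class Acc, Sen, Spe, and Ppr from one-vs-rest counts."""
    return {
        "Acc": (tp + tn) / (tp + tn + fp + fn),
        "Sen": tp / (tp + fn),
        "Spe": tn / (tn + fp),
        "Ppr": tp / (tp + fp),
    }
```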

The average classification results over the ten independent experiments and a comparison with the results from state-of-the-art algorithms are shown in Table 6. The experimental results demonstrate that the proposed classification system was among the top performers across the evaluation indicators. It is worth noting that the proposed approach significantly improved the Sen and Ppr values for the S beats to more than 99%, two indicators on which the other methods performed poorly. Moreover, the Sen and Ppr for the V beats improved by more than 4.3% and 5.4%, respectively, when compared to the top-performing algorithms. However, the Acc and Spe for the V beats were worse.

5. Discussion

A CNN is a deep feedforward artificial neural network that can automatically extract deep features from data without the complexity of manual feature extraction, and CNNs are known to exhibit excellent performance in the field of image recognition. In this study, a classification system based on the LeNet-5 structure was designed to convert 1D heartbeat signals from the MIT-BIH arrhythmia database into 2D images via one-hot encoding. These images were then used as inputs to the proposed CNN. The results demonstrated that the classification accuracy for the 2D inputs was significantly higher than that for the 1D inputs when six heartbeat categories were evaluated. To improve system performance, the morphological characteristics and rhythm information of the original heartbeat signal were fused into a single information vector. The design steps were as follows: (1) adjust the number of layers, the number of feature maps in each convolution layer, and the size of the convolution kernel on the basis of the LeNet-5 structure; (2) select an appropriate input image size. Considering the characteristics of the heart signal and the data characteristics of the MIT-BIH arrhythmia database, one of the pooling layers present in the LeNet-5 structure was not included in the proposed CNN structure; the resulting structure consisted of two convolutional layers and one pooling layer. To determine suitable parameters for each convolution layer, five combinations of feature-map numbers (C1/C2: 12/6, 6/6, 6/12, 3/6, and 6/3) combined with different convolution kernel sizes (3 × 9, 3 × 11, 5 × 9, and 5 × 11) were evaluated in a cross-comparison experiment. Different input image sizes (20 × 50, 40 × 50, 20 × 100, and 40 × 100) were also considered in this work. The results demonstrate that, while keeping the training time as short as possible, 12/6 feature maps, 5 × 11 convolution kernels, and a 20 × 100 input image provided the highest classification accuracy.

The ADADELTA and biased dropout methods were adopted to avoid overfitting and reduce the training time of the proposed CNN. The average classification accuracy of the 2D input model was as high as 99.1% on the six heartbeat types, with a training time of 5 s per round. The initial classification accuracy (i.e., the accuracy after the first round of training) was 97%. To evaluate the performance of the system, two comparative experiments were conducted, and the results showed that the proposed system significantly improved the classification accuracy for five and eight heartbeat categories when compared with other existing algorithms. The proposed 2D CNN-based classification system was found to have a Sen 22.6% higher and a Ppr 25.9% higher for S beat detection than previous CNN-based methods [5, 18]. However, the Acc and Spe for the V beats were below the best available results due to the lack of additional screening of the types of heartbeats being classified; instead, the original heartbeats were selected directly from the database for training. In the future, more care should be taken to extract representative heartbeats for input into the CNN model to improve the performance of the classification system.

6. Conclusion

In this work, a CNN-based classification system for ECG signals was proposed that includes both the ADADELTA and biased dropout algorithms to improve performance. In the preprocessing stage, the 1D information fusion vector was transformed into a 2D image via one-hot encoding to improve the accuracy and convergence speed of classification. The results demonstrated that the ADADELTA optimizer adapts the learning rate effectively and significantly increases the convergence speed: training 20,000 heartbeats on the GPU required only 5 s per iteration. Compared with other existing algorithms, the proposed method performed better on the evaluation indicators; notably, the Sen and Ppr for the S beats were 22.6% and 25.9% higher, respectively, than those of the existing top-performing algorithms. It is therefore anticipated that the proposed algorithm is well suited for implementation in portable devices for e-home health monitoring scenarios.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the Key Scientific and Technological Research Project of Jilin Province (Grant No. 20170414017GH) and the Premier Key Discipline Enhancement Scheme supported by Guangdong Government Funds (Grant No. 2016GDYSZDXK036).