Abstract

Arrhythmia is a common cardiovascular disease, and the electrocardiogram (ECG) is widely used as an effective tool for detecting it. However, real-time arrhythmia monitoring is difficult, so this study proposes a long short-term memory (LSTM)-residual model. Individual beats provide morphological features, and beats combined with their adjacent segments provide temporal features. Our proposed model captures the time-domain and morphological information of the ECG signal simultaneously and fuses the two information types. In addition, an attention block is applied to the network to strengthen useful information, capture hidden information in the ECG signal, and improve classification performance. The model was trained and tested on the MIT-BIH arrhythmia database under two evaluation modes, intrapatient and interpatient. Accuracies of 99.11% and 85.65%, respectively, were obtained under the two modes. Experimental results demonstrate that our proposed method is an efficient automated detection method.

1. Introduction

Arrhythmia is a common cardiovascular disease of considerable clinical significance [1]. Common methods for detecting arrhythmias generally rely on the electrocardiogram (ECG) and the doctor's experience [2]. This process is complex and cumbersome and is easily influenced by the doctor's subjective judgment. Therefore, the automatic detection of arrhythmias has become an active research topic.

In recent years, with the development of machine learning, an increasing number of automatic detection methods for arrhythmias have been applied [3, 4]. Machine learning methods generally require the manual extraction of features. First, the data are preprocessed for refinement, and then features are extracted through a series of mathematical methods, such as wavelet transform, linear discriminant analysis, independent component analysis, and principal component analysis (PCA). The extracted features are input into a classifier to complete the classification [5–11]. Generally, classifiers include support vector machine (SVM), decision tree, and artificial neural network [12–14]. Traditional feature extraction methods of ECG signals are complex and subject to the limitations of specific knowledge fields. In addition, the nonlinear fitting ability is limited. Therefore, the extracted features do not necessarily represent the optimal features, and even the key information of the ECG signals may be omitted.

To overcome the disadvantages of traditional machine learning, deep learning has been applied to the automatic detection of arrhythmias. Compared to traditional machine learning, deep learning no longer requires the manual extraction of features [15]. Convolutional neural networks (CNNs) are a type of deep learning model that can automatically extract advanced features of ECG signals by stacking layers and are no longer limited to specific domain knowledge [16]. Al Rahhal et al. proposed a dense convolutional network to detect arrhythmias and used focal loss to reduce the problems caused by data imbalance [17]. Yang et al. proposed an ECG classification method based on a lead CNN, which used fuzzy sets to reduce the order of extracted ECG image features and optimized the network using the residual structure [18]. In addition, long short-term memory (LSTM) has been widely used in the classification of ECG signals owing to its excellent performance in processing time series data. Kim and Pyun proposed an automatic arrhythmia detection algorithm based on bidirectional LSTM, and the experimental results showed that it was superior to the traditional LSTM [19]. Sharma et al. used Fourier–Bessel expansion to process the RR interval and then input the processed data to the LSTM for ECG classification, obtaining good results [20].

Considering the characteristics of ECGs and the strengths of the two networks, we propose an automatic arrhythmia detection algorithm based on multi-information LSTM combined with a residual block. The algorithm has two parallel inputs, which are used to fully mine the temporal and morphological characteristics of the ECG. The main contributions of this study are as follows:

(1) A residual block and an attention block are introduced. The residual block, used for deeper networks, can maintain the integrity of the information and prevent gradient explosion. The attention block is used to generate attention weights that enhance useful features, weaken useless ones, and improve model performance.

(2) A multiscale deep model is applied, and the ECG beats and segments (including the RR interval) are used as inputs to the model, which can therefore focus on both the morphological and temporal characteristics of ECG signals. This method fully exploits hidden information in ECG signals and exhibits excellent performance in arrhythmia detection.

(3) Features no longer need to be extracted manually. We conducted experiments in both interpatient and intrapatient modes and achieved excellent results.

2. Materials and Methods

2.1. Data Source

The MIT-BIH arrhythmia database (MITDB) was used in the experiments [21]. The database includes 48 records, each containing two leads. According to the classification rules of the Association for the Advancement of Medical Instrumentation (AAMI), we divided all the beats into five classes, as presented in Table 1. Because the number of unknown beats (Q) was small and contained less useful information, this class was deleted, and only the other four classes were retained. We evaluated the model in two modes, interpatient and intrapatient. For the interpatient mode, the data were divided into two sets, DS1 and DS2, each containing 22 records. Four records, namely, 102, 104, 107, and 217, were not included in DS1 or DS2 owing to poor signal quality.
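The grouping of beat annotations into AAMI classes can be sketched as a lookup table. The mapping below is the one commonly used with MITDB annotation symbols; it is an assumption here, since the contents of Table 1 are not reproduced in the text, and the names `AAMI_CLASSES` and `to_aami` are hypothetical.

```python
# Commonly used MIT-BIH symbol -> AAMI class grouping (assumed; Table 1 may differ).
AAMI_CLASSES = {
    "N": ["N", "L", "R", "e", "j"],   # normal and bundle-branch-block beats
    "SVEB": ["A", "a", "J", "S"],     # supraventricular ectopic beats
    "VEB": ["V", "E"],                # ventricular ectopic beats
    "F": ["F"],                       # fusion beats
    "Q": ["/", "f", "Q"],             # paced and unclassifiable beats
}

# The text keeps four classes and deletes Q.
KEPT_CLASSES = ("N", "SVEB", "VEB", "F")

SYMBOL_TO_CLASS = {
    symbol: cls for cls, symbols in AAMI_CLASSES.items() for symbol in symbols
}


def to_aami(symbol):
    """Return the AAMI class of a beat symbol, or None if the beat is dropped."""
    cls = SYMBOL_TO_CLASS.get(symbol)
    return cls if cls in KEPT_CLASSES else None
```

Symbols that are not beat annotations (for example rhythm-change markers) fall through to `None` together with the Q class.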

DS1: 101, 106, 108, 109, 112, 114, 115, 116, 118, 119, 122, 124, 201, 203, 205, 207, 208, 209, 215, 220, 223, and 230.

DS2: 100, 103, 105, 111, 113, 117, 121, 123, 200, 202, 210, 212, 213, 214, 219, 221, 222, 228, 231, 232, 233, and 234.

In general, trained models need to be tested on data that were not used for training. In the interpatient mode, DS1 and DS2 were used for training and testing, respectively, and there was no patient overlap between the two sets. In contrast, in the intrapatient mode, data in the training and testing sets might have come from the same patient. In general, the interpatient mode is more in line with actual clinical requirements. Table 2 lists the number of beats of each type in the ECG database.
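As a quick sanity check of the interpatient split, the record lists given above can be written down and tested for overlap. This is only a sketch; the helper name `check_split` is hypothetical.

```python
# Record numbers copied from the DS1 / DS2 lists in the text.
DS1 = [101, 106, 108, 109, 112, 114, 115, 116, 118, 119, 122,
       124, 201, 203, 205, 207, 208, 209, 215, 220, 223, 230]
DS2 = [100, 103, 105, 111, 113, 117, 121, 123, 200, 202, 210,
       212, 213, 214, 219, 221, 222, 228, 231, 232, 233, 234]
EXCLUDED = [102, 104, 107, 217]  # left out owing to poor signal quality


def check_split(train_records, test_records):
    """Raise if a record appears in both sets; return the two set sizes."""
    overlap = set(train_records) & set(test_records)
    if overlap:
        raise ValueError(f"records in both sets: {sorted(overlap)}")
    return len(train_records), len(test_records)


sizes = check_split(DS1, DS2)
total_records = len(set(DS1) | set(DS2) | set(EXCLUDED))
```

With these lists, each set holds 22 records and the three groups together account for all 48 records of the database.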

2.2. ECG Signal Preprocessing

Our model requires inputs of a fixed length. Since the voltage of the R wave is the largest in the ECG signal, it is easy to locate. In contrast, low-amplitude waveforms, such as the P and T waves, are difficult to detect. Therefore, ECG signals are generally segmented with the R wave as the center. We used the classical R-peak algorithm to detect the position of the R-wave peak [22]. The ECG beat was taken as input 1: with the peak of the R wave as the center, 100 sampling points were taken forward and 152 sampling points were taken backward, for a total of 252 sampling points. The segment (including the RR interval) was taken as input 2: with the peak of the R wave as the center, 252 points were taken forward and 252 points were taken backward, for a total of 504 points. The waveforms are visualized in Figure 1, which shows the ECG beat and the segment. The entire segment contains the current beat waveform and a wider range of information, which constitutes the time-domain characteristics of the ECG signal, whereas a single beat constitutes the morphological characteristics of the ECG signal.
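The two windows can be cut around each detected R peak as in the minimal sketch below. It assumes that "forward" means samples before the peak, that the peak sample itself is counted in the backward part, and that beats too close to the record edges are skipped; none of these details is stated in the text, and `cut_windows` is a hypothetical helper.

```python
import numpy as np

BEAT_BEFORE, BEAT_AFTER = 100, 152   # 252-sample beat (input 1)
SEG_BEFORE, SEG_AFTER = 252, 252     # 504-sample segment (input 2)


def cut_windows(signal, r_peaks):
    """Return (beats, segments) cut around each R-peak index."""
    beats, segments = [], []
    for r in r_peaks:
        # Skip peaks whose 504-sample segment would leave the record (assumption).
        if r - SEG_BEFORE < 0 or r + SEG_AFTER > len(signal):
            continue
        beats.append(signal[r - BEAT_BEFORE:r + BEAT_AFTER])
        segments.append(signal[r - SEG_BEFORE:r + SEG_AFTER])
    return np.array(beats), np.array(segments)


# Toy signal whose value equals its sample index, with made-up peak positions.
signal = np.arange(2000, dtype=float)
beats, segments = cut_windows(signal, [50, 400, 900, 1900])
```

In this toy run the first and last peaks are dropped because their segments would extend past the ends of the signal.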

To reduce the impact of large differences in the data value range and improve the overall operation rate of the model, Z-score normalization was used for data processing. The normalization function is defined as

\[ X' = \frac{X - Z}{D}, \tag{1} \]

where X refers to the values of the ECG recording, X' refers to the normalized values, and Z and D refer to the mean and standard deviation of these values, respectively.
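Z-score normalization subtracts the mean and divides by the standard deviation. A minimal NumPy sketch follows; whether the statistics are computed per window or per record is not stated in the text, so the function simply normalizes whatever array it is given, and the zero-variance guard is an added assumption.

```python
import numpy as np


def z_score(x):
    """Normalize an array to zero mean and unit standard deviation."""
    x = np.asarray(x, dtype=float)
    std = x.std()
    if std == 0:
        # A flat input carries no shape information; return zeros (assumption).
        return np.zeros_like(x)
    return (x - x.mean()) / std


normalized = z_score([1.0, 2.0, 3.0, 4.0, 5.0])
```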

2.3. LSTM Block

By introducing the gate mechanism, LSTM overcomes the disadvantage of the traditional recurrent neural network (RNN), namely, gradient disappearance. The forget gate, input gate, and output gate are introduced to LSTM to allow LSTM to overcome long-term dependency and maintain information integrity, which makes LSTM more suitable than RNN for processing temporal data. Figure 2 demonstrates the structure of the LSTM. Equations (2)–(6) represent the calculation formulas for each part of the LSTM, where σ and tanh refer to the sigmoid and tanh functions, respectively; W and b refer to the weights and bias values, respectively; I_t, F_t, O_t, and c_t refer to the input gate, forget gate, output gate, and cell state, respectively; and L refers to the accumulated information of the present moment.
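For reference, the standard LSTM update is sketched below in the notation of this section. It is the textbook formulation and may differ in form and numbering from Equations (2)–(6). The symbols x_t (input), h_t (hidden state), and \tilde{c}_t (candidate cell state) are assumptions, and which of these quantities corresponds to L in the text is not specified here.

```latex
\begin{aligned}
F_t &= \sigma\left(W_F\,[h_{t-1}, x_t] + b_F\right) && \text{forget gate} \\
I_t &= \sigma\left(W_I\,[h_{t-1}, x_t] + b_I\right) && \text{input gate} \\
\tilde{c}_t &= \tanh\left(W_c\,[h_{t-1}, x_t] + b_c\right) && \text{candidate cell state} \\
c_t &= F_t \odot c_{t-1} + I_t \odot \tilde{c}_t && \text{cell state} \\
O_t &= \sigma\left(W_O\,[h_{t-1}, x_t] + b_O\right) && \text{output gate} \\
h_t &= O_t \odot \tanh(c_t) && \text{hidden state}
\end{aligned}
```

Here \odot denotes element-wise multiplication and [h_{t-1}, x_t] denotes concatenation of the previous hidden state with the current input.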

2.4. Residual Block

In deep learning, CNNs rely on the sliding of the convolutional window over the input data to extract local features, and they rely on the pooling layer to refine the features and extract important information from the input data. In general, the more convolutional layers there are, the more advanced the extracted features will be. However, once the number of convolutional layers reaches a certain level, the model suffers from the problem of gradient explosion. Therefore, to alleviate the gradient problem, a residual block structure was introduced [23]. The skip connection in the residual network passes useful information on to deeper layers. This works better than the simple stacked structure of the traditional CNN. Figure 3 demonstrates the structure of the residual block; (a) represents the residual block in the attention block, (b) represents the residual block in the backbone network, and Mul denotes multiplication. The output of the first convolutional layer is connected to the subsequent convolutional layers through skip connections so that the information is kept intact in the deeper network.
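The effect of a skip connection can be illustrated with a single-channel NumPy sketch. This is a generic identity-skip block, not the exact layout of Figure 3: the two-convolution arrangement, the ReLU activations, and the helper names are assumptions.

```python
import numpy as np


def conv1d_same(x, kernel):
    """Single-channel 1-D convolution whose output length equals len(x)."""
    return np.convolve(x, kernel, mode="same")


def relu(x):
    return np.maximum(x, 0.0)


def residual_block(x, kernel1, kernel2):
    """Two convolutions plus an identity skip connection."""
    y = relu(conv1d_same(x, kernel1))
    y = conv1d_same(y, kernel2)
    # The skip connection adds the block input back to the convolved path.
    return relu(y + x)


x = np.array([1.0, -2.0, 3.0, 0.5])
passthrough = residual_block(x, np.zeros(3), np.zeros(3))
```

With all-zero kernels the convolutional path contributes nothing, yet the input still reaches the output through the skip connection, which is why stacking such blocks does not discard information carried by earlier layers.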

2.5. Attention Block

In the detection of arrhythmias, different feature maps respond differently to arrhythmia-related characteristics, so their usefulness for recognition varies, and other signals may cause interference. Consequently, some of the generated features may be unrelated to arrhythmia. Hence, this study used the attention block to enhance the information associated with arrhythmias and weaken the irrelevant information. Four attention blocks were used to progressively enhance the relevant information and improve the recognition performance of the model [24]. Figure 4 depicts the concrete structure of an attention block.

First, the input data are processed through a convolution layer, and then the processed data are passed into a down-up sampling phase, which is used to expand the receptive field. Max pooling and nearest-neighbor interpolation were used as the downsampling and upsampling operations, respectively. Finally, the final feature is obtained through the residual structure and the 1 × 1 convolution layer and is input into the sigmoid function to obtain the attention weight. Symmetric downsampling and upsampling architectures can quickly extend the receptive field to obtain global information. A batch normalization (BN) layer is added before the sigmoid function to prevent the gradient problem and overfitting in the training process.
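The down-up sampling path and the sigmoid weighting can be sketched on a single-channel feature vector as follows. The convolution, residual, and BN layers described above are omitted for brevity, the pooling window of two is taken from the model description, and the function names are hypothetical.

```python
import numpy as np


def max_pool(x, size=2):
    """Non-overlapping max pooling; trailing samples that do not fill a window are dropped."""
    n = len(x) // size * size
    return x[:n].reshape(-1, size).max(axis=1)


def upsample_nearest(x, size=2):
    """Nearest-neighbor upsampling: repeat every sample `size` times."""
    return np.repeat(x, size)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def attention(features):
    """Return (weighted features, attention weights) for an even-length vector."""
    down = max_pool(features)
    up = upsample_nearest(down)
    weights = sigmoid(up)             # attention weights in (0, 1)
    return features * weights, weights


features = np.array([0.0, 2.0, -1.0, -3.0])
weighted, weights = attention(features)
```

The multiplication at the end corresponds to the Mul operation in Figure 3: features in regions with a strong pooled response are kept almost unchanged, while the others are scaled down.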

2.6. Proposed Model

In this study, we proposed a novel deep learning model with parallel inputs. The two branches had the same structure, including one LSTM block, four residual blocks, four attention blocks, and four convolution layers, followed by a max pooling layer (MAXP) and a global average pooling layer (GAP). Finally, the outputs of the two parallel branches were fused and sent to the fully connected layer for classification. Before the fully connected layer, dropout with a rate of 0.5 was added to reduce overfitting. The stride of the max pooling layers in the network was two, and the pooling window was two. The Adam algorithm was used to train the model, and the learning rate was set to 0.001. To fully train the model and achieve better results, the batch size was set to 128 and the number of epochs was set to 100. A validation set was also used in the intrapatient mode, where the ratio of the training set, validation set, and testing set was 3 : 1 : 1. Figure 5 shows the concrete structure of the proposed model, where "x4" indicates that the part is repeated four times. The number of filters increased successively over the four repetitions, namely, 32, 64, 128, and 256; the stride size was two, and the kernel size was two. The last convolution layer in the attention block had a kernel size of one.
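The final fusion step can be illustrated with random stand-in feature maps. This sketch assumes that fusion means concatenating the two GAP outputs, which the text does not specify; the time lengths 8 and 16 and the random weights are placeholders, and dropout is omitted because it is inactive at inference time.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_CLASSES = 4    # the four retained classes
NUM_FILTERS = 256  # filters in the last stage of each branch


def global_average_pool(feature_map):
    """Average over the time axis: (time, channels) -> (channels,)."""
    return feature_map.mean(axis=0)


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


# Stand-ins for the outputs of the beat branch and the segment branch.
beat_features = rng.normal(size=(8, NUM_FILTERS))
segment_features = rng.normal(size=(16, NUM_FILTERS))

# Fuse by concatenation (assumption), then apply one fully connected layer.
fused = np.concatenate([global_average_pool(beat_features),
                        global_average_pool(segment_features)])
W = rng.normal(scale=0.01, size=(NUM_CLASSES, fused.size))
b = np.zeros(NUM_CLASSES)
probs = softmax(W @ fused + b)
```

Because GAP removes the time axis, the two branches can be fused even though the beat and the segment have different lengths.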

3. Results

To evaluate the performance of the entire model, three widely recognized evaluation indexes, namely, specificity (Spe), sensitivity (Sen), and accuracy (Acc), were applied. Equations (7)–(9) represent the calculation formulas for each evaluation standard:

\[ \mathrm{Spe} = \frac{TN}{TN + FP}, \tag{7} \]

\[ \mathrm{Sen} = \frac{TP}{TP + FN}, \tag{8} \]

\[ \mathrm{Acc} = \frac{TP + TN}{TP + TN + FP + FN}, \tag{9} \]

where TP, TN, FP, and FN represent true positive, true negative, false positive, and false negative, respectively.
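For a multiclass problem, these indexes are usually computed per class from the confusion matrix in a one-vs-rest fashion. The sketch below assumes rows are true classes and columns are predicted classes, and it reports overall accuracy as the fraction of correctly classified beats; the toy matrix is invented for illustration and is not taken from the tables of this paper.

```python
import numpy as np


def per_class_metrics(confusion):
    """Return (sensitivity per class, specificity per class, overall accuracy)."""
    cm = np.asarray(confusion, dtype=float)
    total = cm.sum()
    tp = np.diag(cm)
    fn = cm.sum(axis=1) - tp   # true class i, predicted as something else
    fp = cm.sum(axis=0) - tp   # predicted as class i, true class is different
    tn = total - tp - fn - fp
    sen = tp / (tp + fn)       # Sen = TP / (TP + FN)
    spe = tn / (tn + fp)       # Spe = TN / (TN + FP)
    acc = tp.sum() / total     # correctly classified beats / all beats
    return sen, spe, acc


# Invented two-class example: rows = true class, columns = predicted class.
toy_confusion = [[90, 10],
                 [5, 95]]
sen, spe, acc = per_class_metrics(toy_confusion)
```

A class with no true samples would give a division by zero in `sen`, so such classes should be handled separately.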

Table 3 presents the final results of our model in the intrapatient mode. As observed from Table 3, our method achieved a relatively good performance. The overall Acc was 99.11%, and the average Spe and Sen were 98.56% and 91.63%, respectively. To verify that our double-input model outperforms a single-input model, we also input the ECG beats and the segments into the model separately. Tables 4 and 5 present the confusion matrix, Spe, and Sen of the single-input models. The overall Acc values of the models using only the ECG beat and only the segment were 98.58% and 98.60%, respectively.

A No-LSTM network was also designed to show the advantages of the LSTM network in processing time series data. The No-LSTM network removes the LSTM block from the proposed method and keeps the rest unchanged. Table 6 presents the confusion matrix, Spe, and Sen of the No-LSTM network. The overall Acc of the No-LSTM network was 98.53%.

Figure 6 illustrates the changes in the Acc of the training and validation sets and the loss during the training process. In the initial stage, the curves of the training and validation sets changed rapidly and gradually tended to be stable with increasing epoch times and constant parameter optimization. There was no overfitting or underfitting problem in our training process.

We also evaluated the model in the interpatient mode. DS1 and DS2 were used as the training and testing sets, respectively, and there was no validation set. Table 7 presents the confusion matrix of the interpatient mode. The overall Acc was 85.65%. Figure 7 depicts the change curves of the Acc and loss in the model training process, which show no sign of overfitting.

4. Discussion

Traditional ECG signal classification is typically composed of feature extraction and classification. We propose a method based on deep learning that combines feature extraction and classification, avoiding the complex feature extraction process and eliminating the need for a separate classifier. To verify the performance of our model, we conducted three comparative experiments. On the one hand, segments and ECG beats were each input into our model as single inputs. On the other hand, a No-LSTM network was designed. According to Tables 3–6, the Sen of the model for SVEB and F was low, most likely because these two types of arrhythmias constituted a small proportion of the total data. It can be noted from Table 8 that the overall Acc, Spe, and Sen of our double-input model were 99.11%, 98.56%, and 91.63%, respectively, which was better than the overall performance of the two single-input models and the No-LSTM network. In particular, the Sen of our double-input model was 7.68% (segments) and 6.34% (beats) higher than that of the single-input models. Meanwhile, the Sen of our double-input model was 3.81% higher than that of the No-LSTM network. The information provided by both the segment and the ECG beat improved the Sen of our model, and the LSTM plays an important role in processing time series data.

As depicted in Figure 6, our model showed no overfitting, apart from slight oscillations. These results suggest that accurate detection of arrhythmias requires sufficient information from both ECG beats and segments.

Existing ECG classification algorithms are summarized in Table 9. Gao et al. used PCA and the dynamic time warping (DTW) method to extract the features of selected ECG fragments and input the extracted features into an SVM [25]. The overall Acc was 97.80%, and the Spe was 88.83%. Although the model achieved good results, the process of feature extraction was complicated and required skilled techniques. Wang et al. proposed a method combining a CNN with a multilayer perceptron (MLP), in which the CNN was used to extract features, and the extracted features were fused with the RR interval and input into the MLP, with an Acc of 96.27% [26]. Wang et al. used continuous wavelet transform (CWT) to decompose ECG signals and used a CNN to extract features from a two-dimensional scale spectrum composed of the aforementioned time-frequency components [27]. The Acc and Sen were 98.74% and 67.47%, respectively. Niu et al. proposed a new deep learning method for ECG classification based on adversarial domain adaptation, which completed the classification of arrhythmias by constructing three modules, and the final classification result was 92.3% [28]. Our model exhibited excellent performance, particularly in terms of Sen. Our model also had certain limitations: (1) the overall model had many parameters, which required a large amount of computation; (2) the sample data were not balanced, and normal samples accounted for a large proportion; (3) in the interpatient mode, the performance of our model was poor; and (4) it had a low recognition rate for classes with few samples.

5. Conclusions

In this paper, a new automatic detection method for arrhythmias is proposed. In our proposed method, both the segment and the ECG beat were used as inputs. LSTM combined with a residual block model was used to extract and refine features. Furthermore, the attention block was used to enhance useful information and weaken useless information. The entire process no longer required manual feature extraction. Instead, our model simultaneously extracted important information from the time-domain and morphological characteristics of ECG signals and then passed the fused information to the final fully connected layer to complete the classification. To evaluate the performance of our model, the entire experiment was conducted on MITDB. The data were evaluated in interpatient and intrapatient modes, and the Acc was 85.65% in the interpatient mode and 99.11% in the intrapatient mode. The experimental results demonstrated that our model simultaneously captured temporal and morphological characteristics and is an effective automatic detection technique.

Furthermore, the model can be applied to wearable devices to assist doctors in diagnosis. In the future, our research will be extended to other databases to achieve more disease classifications.

Data Availability

The data used to support the findings of this study have not been made available because the data also form part of an ongoing study. Original data of the study can be obtained at https://physionet.org/.

Conflicts of Interest

The authors declare no conflicts of interest.

Acknowledgments

This study was partly supported by the National Natural Science Foundation of China (Grant no. 61903226).