A deep learning approach for fetal QRS complex detection

Wei Zhong; Lijuan Liao; Xuemei Guo; Guoli Wang

doi:10.1088/1361-6579/aab297

1. Introduction

Worldwide, an estimated 2.65 million stillbirths occur annually, about half of them in the intrapartum period, that is, during childbirth (Bhutta et al 2011). Effective techniques are therefore needed to monitor the fetal condition during pregnancy and at delivery.

As one of the most important vital signs, fetal heart rate (FHR) allows those concerned with fetal health to evaluate the physiological condition of the fetus, and FHR monitoring has therefore been used extensively for intrapartum and antepartum monitoring. Some abnormal FHR patterns, such as rapid acceleration-decelerations, can be associated with fetal distress. Currently, five kinds of electronic fetal monitoring techniques are used for FHR monitoring (Behar et al 2016): non-invasive fetal electrocardiography (NI-FECG), fetal phonocardiography (PCG), fetal magnetocardiography (FMCG), cardiotocography (CTG) and scalp ECG (SECG).

PCG obtains an acoustic recording from the mother's abdomen. Fetal heart sounds are observed through auscultation. However, an expert is required to locate the fetus and PCG is prone to acoustic noise. In addition, PCG has the lowest signal-to-noise ratio (SNR) of all methods. Due to these limitations, PCG is almost unused in clinical practice (Kovacs et al 2010).

FMCG uses SQUID sensors positioned near the mother's abdomen to detect the magnetic field of the fetal heart (Peters et al 2001). High cost and cumbersome apparatus have restricted its use to date, although it achieves a relatively high SNR.

CTG and SECG are the two most widespread methods in current clinical practice, but neither is perfect: both have advantages and drawbacks. CTG, the more widespread of the two, uses an ultrasound transducer and a pressure-sensitive uterine contraction transducer to monitor contractions and FHR. However, CTG requires extensive training and relatively high costs and, moreover, the safety of the fetus exposed to ultrasound radio frequency has not been conclusively demonstrated (Barnett and Maulik 2001). In SECG, an electrode is placed directly on the fetal scalp. SECG can provide an accurate FHR time series, but it is an invasive method that increases the risk of infection, so it can only be used at delivery (Hasan et al 2009).

Due to the limitations of the other methods, the non-invasive NI-FECG has been suggested as an alternative monitoring technique: it can not only provide an accurate estimate of the FHR, but also has the potential to provide morphological information related to the pathological condition of the fetal heart. In NI-FECG signal processing, effective techniques are needed to locate the fetal QRS complexes in order to compute the FHR and detect rhythm abnormalities. However, it is difficult to detect fetal QRS complexes in the NI-FECG signals recorded from the mother's abdomen (Cohen et al 2012). One of the main reasons is the relatively low SNR of the fetal electrocardiogram (FECG) compared to the maternal electrocardiogram (MECG). The MECG, which is the predominant interference source, has a much larger amplitude than the FECG. Other interference sources include (1) 50 or 60 Hz power-line interference, (2) fetal movement and respiration, (3) electrode contact noise, and so on (Behar et al 2016).

Despite the significant advances in adult ECG signal processing, the analysis of NI-FECG is still in the early stages of development. Numerous source separation algorithms have been proposed to retrieve the real FECG from the NI-FECG. The main algorithms for FECG extraction have been reviewed (Clifford et al 2014); most can be categorized as adaptive filtering, linear decomposition and non-linear decomposition or, alternatively, as blind/semi-blind source separation, adaptive filtering methods (AM) and template subtraction (Behar et al 2014). A two-step approach is widely employed. The MECG is typically removed in the first step, using, for example, an adaptive filtering algorithm. The second step then attempts to extract the fetal QRS complexes from the ECG residuals, which contain the FECG and noise components.

An adaptive filtering algorithm, which assumes a regressive model, can remove the MECG or extract the FECG directly by using a trained filter. However, such an algorithm requires a reference MECG signal, which is affected by noise, and the noisy MECG signal does not satisfy the hypothesis of the regressive model. The performance of adaptive filtering in MECG cancellation is therefore reduced. Among the adaptive techniques, AM_esn has recently been shown to be a promising method for the analysis of NI-FECG (Andreotti et al 2016).

In essence, linear decomposition is a form of blind or semi-blind source separation, which decomposes the abdominal ECG recordings into three categories: MECG, FECG and noise. Principal component analysis (Kanjilal et al 1997), independent component analysis (Najafabadi et al 2006), periodic component analysis (πCA) (Sameni et al 2008) and singular value decomposition (Kanjilal et al 1997) are the most commonly used linear decomposition algorithms for NI-FECG signal processing. Linear decomposition algorithms assume that the mixture of the three categories (MECG, FECG and noise) is linear and stationary. The stationarity assumption is usually not a good approximation for a non-stationary signal such as NI-FECG.

Non-linear decomposition algorithms have the potential to extract the FECG from low-SNR abdominal recordings and, for this reason, are the subject of considerable research. To cancel out the maternal QRS and enhance the FECG, these algorithms usually rely on the deflation method of subspace decomposition and on non-linear projection (Richter et al 1998). However, their high computational complexity has limited their use in real-time settings. For more details on state-of-the-art research, the review article by Sameni and Clifford (2010) is highly recommended.

The present study has three major goals:

(1) Unlike previous studies, we explore the performance of fetal QRS complex detection based only on NI-FECG fingerprinting, without canceling MECG signals.
(2) Based on the recent success of deep learning approaches in arrhythmia modeling (Rajpurkar et al 2017), we build a model of fetal QRS complexes using a convolutional neural network (CNN) for the first time (to the best of our knowledge).
(3) We investigate the effect of several advanced deep learning algorithms and of signal quality assessment on classification performance in this fetal QRS complex detection task.

2. Method

2.1. Dataset

Data used in this experiment are taken from set-a of the PhysioNet/Computing in Cardiology Challenge database (PCDB) (Silva et al 2013). PCDB is the largest publicly available NI-FECG dataset to date. Set-a includes seventy-five abdominal ECG (AECG) recordings (a01–a75). Each recording includes four channels, each sampled at 1000 Hz. Data and reference annotations (fetal QRS complexes) for set-a are available. As suggested in Behar et al (2013), seven AECG recordings (a33, a38, a47, a52, a54, a71 and a74) are discarded because of inaccurate reference annotations.

The remaining 68 AECG recordings, before the signal quality assessment described below is applied, are divided into three sets: training, validation, and testing. Seven recordings (a01, a02, a03, a04, a05, a06 and a07) constitute the testing set, which includes a total of 3924 fetal QRS complexes. Six recordings (a08, a09, a10, a11, a12 and a13) constitute the validation set, which includes a total of 3348 fetal QRS complexes. The remaining 55 recordings constitute the training set, which includes a total of 31 056 fetal QRS complexes. Notably, the training, validation, and testing sets are mutually exclusive.

Automatic high-accuracy methods for adult QRS detection (and thus maternal QRS detection) have been published since the mid-1980s (Pan and Tompkins 1985). However, methods for fetal QRS detection are still in their infancy. In the context of NI-FECG extraction, a detected fetal QRS is classically considered a true positive if it is within 50 ms of the reference annotation. The frame size is therefore set to 100 ms, matching the total width of this ±50 ms tolerance window (100 samples at 1000 Hz). An example of an input in this study is shown in figure 1.
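
The 50 ms criterion can be made concrete with a short sketch. The Python function below is illustrative only: the name `match_detections`, the greedy one-to-one pairing and the inclusive tolerance are assumptions, not details taken from this study. Locations are sample indices at 1000 Hz, so 50 ms corresponds to 50 samples.

```python
def match_detections(detected, reference, tol=50):
    """Count TP, FP and FN for detected fetal QRS locations.

    detected, reference: QRS locations in samples (1000 Hz, so 1 sample = 1 ms).
    tol: tolerance in samples; a detection within `tol` of an unmatched
         reference annotation counts as a true positive (assumed inclusive).
    """
    detected = sorted(detected)
    reference = sorted(reference)
    i = j = tp = 0
    while i < len(detected) and j < len(reference):
        diff = detected[i] - reference[j]
        if abs(diff) <= tol:
            tp += 1          # pair this detection with this annotation
            i += 1
            j += 1
        elif diff < 0:
            i += 1           # detection too early to match anything: false positive
        else:
            j += 1           # annotation has no detection nearby: false negative
    fp = len(detected) - tp
    fn = len(reference) - tp
    return tp, fp, fn


tp, fp, fn = match_detections([100, 520, 900], [110, 500, 700])
```

In this toy call, the detections at 100 and 520 fall within 50 samples of the annotations at 110 and 500, while the detection at 900 and the annotation at 700 remain unmatched.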

**Figure 1.** An example of an input (100 ms long NI-FECG signal). We train a three-layer CNN to detect fetal QRS complexes from these raw NI-FECG signals.

To form a relatively balanced dataset, a stride of 35 ms is used for data without fetal QRS complexes and a stride of 10 ms for data with fetal QRS complexes. In this study, fetal QRS detection is a binary classification task. Each NI-FECG signal in the training, validation, and testing sets is 100 ms long and belongs to exactly one class, assigned according to the reference annotation provided by the PCDB. If the reference location of a fetal QRS complex lies within the 100 ms frame, the signal is labeled as a fetal QRS complex; otherwise, it is labeled as not a fetal QRS complex.
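
The framing and labeling rule can be sketched as follows. This is one possible reading, not the authors' implementation: the text does not say how the two strides are interleaved, so the sketch assumes that the window advances by 10 ms after a frame containing a fetal QRS complex and by 35 ms otherwise. The names `frame_has_qrs` and `make_frames` are hypothetical.

```python
from bisect import bisect_left

FRAME_LEN = 100      # 100 ms at 1000 Hz
STRIDE_QRS = 10      # stride (samples) after a frame that contains a fetal QRS
STRIDE_NO_QRS = 35   # stride (samples) after a frame without a fetal QRS


def frame_has_qrs(start, qrs_locations, frame_len=FRAME_LEN):
    """True if a reference QRS index lies in [start, start + frame_len).

    qrs_locations must be sorted in ascending order.
    """
    i = bisect_left(qrs_locations, start)
    return i < len(qrs_locations) and qrs_locations[i] < start + frame_len


def make_frames(n_samples, qrs_locations):
    """Return (start_index, label) pairs; label 1 = fetal QRS, 0 = no fetal QRS."""
    qrs_locations = sorted(qrs_locations)
    frames = []
    start = 0
    while start + FRAME_LEN <= n_samples:
        label = 1 if frame_has_qrs(start, qrs_locations) else 0
        frames.append((start, label))
        # Assumed interleaving: dense sampling around QRS complexes, sparse elsewhere.
        start += STRIDE_QRS if label else STRIDE_NO_QRS
    return frames


frames = make_frames(1000, [250, 650])
```

Because positive frames are sampled more densely than negative ones, the rarer fetal QRS class is over-represented relative to its share of the recording, which is the balancing effect described above.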

2.2. Signal quality assessment

We investigate the task of fetal QRS complex detection from the raw NI-FECG signals. This is known to be a challenging task and it is very common for the NI-FECG to be contaminated by a variety of other physiological signals and noise. To automatically detect fetal QRS complexes in NI-FECG signals, an algorithm must possess the ability to recognize the distinct wave types and discern the complex relationships between them over time. This is difficult due to the variability in wave morphology in the NI-FECG signals as well as the presence of noise from all kinds of interference sources. Thus, there is a need for effective techniques that can assess the signal quality of NI-FECG. The sample entropy (SampEn) method mentioned in Liu et al (2014) is used in this study for noise identification. The SampEn is defined as

$$\mathrm{SampEn}(m, r, N) = -\ln\left(\frac{\sum_{i=1}^{N-m} A_i^m(r)}{\sum_{i=1}^{N-m} B_i^m(r)}\right) \tag{1}$$

where m corresponds to the embedded dimension, r to the threshold and N to the data length. For more details about the SampEn, see Liu et al (2014). In this study, the threshold is set at 1.5, the embedded dimension at 2 and the data length at 500. The mean SampEn value of each channel is compared with the threshold value: channels with a mean SampEn below 1.5 are regarded as good quality and are used in the study. However, if fewer than two channels in a recording are of good quality, the two channels with the smallest and second-smallest SampEn values are used to form the dataset. An example of signal quality assessment on a 10 s signal episode from record 42 of set-a is shown in figure 2.
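
A minimal sketch of equation (1) and of the channel selection rule is given below. It follows the common definition of sample entropy (Chebyshev distance between templates, self-matches excluded, B counted on templates of length m and A on templates of length m + 1); the exact matching convention in Liu et al (2014) may differ. The tolerance `r` is left as a free parameter, because the text uses "threshold" both for r and for the 1.5 quality cut-off. The function names and the zero-based channel indices are assumptions.

```python
import math


def sample_entropy(x, m=2, r=0.2):
    """SampEn(m, r, N) = -ln(A / B) for a sequence x of length N."""
    n = len(x)
    if n <= m + 1:
        raise ValueError("signal too short for the chosen m")

    def count_matches(length):
        # Use the first N - m templates for both lengths, so that A <= B.
        templates = [x[i:i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):      # i != j: no self-matches
                dist = max(abs(a - b) for a, b in zip(templates[i], templates[j]))
                if dist <= r:
                    count += 1
        return count

    b = count_matches(m)        # matches of length m
    a = count_matches(m + 1)    # matches of length m + 1
    if a == 0 or b == 0:
        return math.inf         # no matches: entropy is undefined, treated as maximal
    return -math.log(a / b)


def select_channels(mean_sampen, threshold=1.5):
    """Keep channels whose mean SampEn is below the threshold.

    If fewer than two channels qualify, fall back to the two lowest values.
    """
    good = [ch for ch, value in enumerate(mean_sampen) if value < threshold]
    if len(good) >= 2:
        return good
    ranked = sorted(range(len(mean_sampen)), key=lambda ch: mean_sampen[ch])
    return sorted(ranked[:2])
```

A perfectly regular sequence yields a SampEn of zero, whereas irregular (noisy) sequences yield larger values, which is why low SampEn is taken here as an indicator of good signal quality.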

**Figure 2.** An example of signal quality assessment on 10 s signal episode from record 42 of set-a. With good quality, channels 2 and 4 are reserved for further analysis, while channels 1 and 3 are discarded because of the relatively large SampEn values.

After signal quality assessment, the number of fetal QRS complexes in training, validation, and testing sets change to 26 639, 3082 and 3538, respectively. A three-layer convolutional neural network is trained on this training set. The network maps a sequence of NI-FECG samples to a sequence of sample classes.

2.3. CNN in fetal QRS detection

With large annotated datasets, deep neural network based machine learning models have consistently been able to approach and often exceed human performance (He et al 2015, Amodei et al 2016). The CNN is a trainable and non-linear system that is made up of a very large number of connections and layers. It has already proven to be effective in many fields such as handwriting character recognition and healthcare applications, particularly in arrhythmia detection and medical imaging (Rajpurkar et al 2017, Esteva et al 2017).

The fetal QRS complex detection task is a sequence-to-sequence task, and a CNN model is employed to learn it. An NI-FECG signal $X = [x_1, x_2, \ldots, x_s]$ is the input to the model. A sequence of labels $P = [p_1, p_2, \ldots, p_n]$ is the output of the model. Each segment of the input corresponds to an output label.

The convolutional layers compute $L$ one-dimensional convolutions between the filter vectors $w^l$, of size $2k + 1$, and the input map $x$:

$$y^l(i) = \sum_{j=-k}^{k} w^l(j)\, x(i - j), \quad l = 1, 2, \ldots, L \tag{2}$$

where $y^l(i)$ is the unit $i$ of the $l$-th output feature map and $w^l(j)$ is the weight $j$ of the $l$-th filter vector. The dimension (dim) of the output feature map ($y$) is defined as

$$\mathrm{dim}(y) = \frac{\mathrm{dim}(x) - \mathrm{dim}(w) + 1}{\text{size of max pooling}}. \tag{3}$$

Then the output units of the convolutional layer are given by

$$z^l(i) = \mathrm{ReLU}(y^l(i) + b^l), \quad l = 1, 2, \ldots, L \tag{4}$$

where $z^l(i)$ is the unit $i$ of the $l$-th output of the convolutional layer and $b^l$ is the bias term.

The ReLU activation function is represented by

$$f_r(v) = \max(0, v). \tag{5}$$
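
Equations (2) to (5) can be traced with a small single-filter, single-channel sketch. It is a simplification for illustration: a real convolutional layer also sums over input feature maps, and the function names are hypothetical. No padding is used, as equation (3) implies, so a 100-sample input with filters of length 3 and max-pooling of size 2 in the second and third blocks gives feature maps of length 98, 48 and 23.

```python
def conv1d_valid(x, w):
    """Equation (2), evaluated only where the filter fully overlaps the input.

    w holds w(-k), ..., w(k), so w(j) is stored at w[j + k].
    """
    k = (len(w) - 1) // 2
    return [sum(w[j + k] * x[i - j] for j in range(-k, k + 1))
            for i in range(k, len(x) - k)]


def relu(v):
    """Equation (5)."""
    return max(0.0, v)


def max_pool(z, size=2):
    """Non-overlapping max-pooling; an incomplete trailing window is dropped."""
    return [max(z[i:i + size]) for i in range(0, len(z) - size + 1, size)]


def conv_block(x, w, b, pool=1):
    """Equation (4), optionally followed by max-pooling (equation (3))."""
    z = [relu(y + b) for y in conv1d_valid(x, w)]
    return max_pool(z, pool) if pool > 1 else z


x = [float(i) for i in range(100)]           # stand-in for a 100 ms frame
h1 = conv_block(x, [1.0, 0.0, -1.0], 0.0)     # block 1: no pooling
h2 = conv_block(h1, [1.0, 0.0, -1.0], 0.5, pool=2)
h3 = conv_block(h2, [1.0, 0.0, -1.0], 0.5, pool=2)
```

Applying ReLU before max-pooling gives the same result as the reverse order, because both operations are monotone, so the sketch is compatible with either layer ordering.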

We use a cross-entropy objective function as the loss function in this study. The cross-entropy objective function is given by

$$\mathcal{L}(Q, P) = -\sum_{i=1}^{n} Q(i) \log P(i) \tag{6}$$

where Q is the ideal output and where P is the actual output.
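
As a worked illustration of equation (6) for the two-class case, the sketch below computes a softmax distribution and its cross-entropy against a one-hot target. The function names and the small `eps` guard against log(0) are assumptions added for numerical safety.

```python
import math


def softmax(logits):
    """Convert raw scores into a probability distribution."""
    shift = max(logits)                       # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(q, p, eps=1e-12):
    """Equation (6): q is the ideal (one-hot) output, p the actual output."""
    return -sum(qi * math.log(max(pi, eps)) for qi, pi in zip(q, p))


p = softmax([0.0, 0.0])                 # an undecided network: [0.5, 0.5]
loss = cross_entropy([1.0, 0.0], p)     # equals ln 2
```

The loss shrinks toward zero as the probability assigned to the correct class approaches one.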

Figure 3 shows the high-level architecture of the proposed deep learning network. Overall, the architecture of the network contains three convolutional blocks and a dense block. The first convolutional block contains a convolutional layer (Conv1), a batch normalization (Ioffe and Szegedy 2015) layer (BN1) and an activation function layer (ReLU1). In the first convolutional block, we apply the BN1 between the Conv1 and ReLU1 to make the optimization of such a deep CNN model more tractable.

The second convolutional block contains a convolutional layer (Conv2), a dropout layer (Dropout2), a max-pooling layer (Maxpool2) and an activation function layer (ReLU2). In the second convolutional block, we apply the Dropout2 and Maxpool2 between the Conv2 and ReLU2 to prevent neural networks from overfitting.

The third convolutional block contains a convolutional layer (Conv3), a dropout layer (Dropout3), a max-pooling layer (Maxpool3) and an activation function layer (ReLU3). The convolutional layers are considered as a feature extractor, which capture the abstraction of NI-FECG signals.

The dense block contains dense layers (Dense) and a softmax layer (Softmax). The Dense contains three fully connected layers. A distribution over the two output classes (number of classes n = 2 in this study) is produced by final fully connected layers and softmax.

The proposed deep learning network, which contains three convolutional layers (L = 3), takes as input a time series of 100 ms long NI-FECG signals (input length s = 100 in this study). The first, second, and third convolutional layers contain 64, 128, and 256 filters, respectively. Each filter generates one feature map. Inspired by some convolutional neural network models (Simonyan and Zisserman 2014), the filter length used in this study is set at 3. A filter of length 3 achieves satisfactory results in this study, and we have not found significant performance improvement when using a larger filter. The size of all the max-poolings is set at 2. We use the stochastic gradient descent optimizer with learning rate 10⁻² to train the model. Overall, the CNN maps a sequence of NI-FECG samples to two sample classes (fetal QRS or not fetal QRS). In addition, the data are normalized before the raw NI-FECG signal is fed into the CNN: min-max normalization is used to transform the data into a common range $[0, 1]$. We use the validation set to evaluate the model, and the best model is saved during the optimization process.
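
The min-max normalization step can be sketched as follows. Whether the minimum and maximum are taken per frame, per channel or per recording is not stated above; the sketch assumes per-frame normalization, and the handling of a constant frame is likewise an assumption.

```python
def min_max_normalize(frame):
    """Rescale a frame linearly so that its values span [0, 1]."""
    lo, hi = min(frame), max(frame)
    if hi == lo:
        # Constant frame: the range is zero, so map every sample to 0 (assumed).
        return [0.0 for _ in frame]
    return [(v - lo) / (hi - lo) for v in frame]


normalized = min_max_normalize([2.0, 4.0, 6.0])
```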

In the proposed method, the features of single-channel NI-FECG signals are used for fetal QRS complex detection. In the case of multi-channel signals, the channel with the best quality is recommended for the detection task. In addition, in order to learn more possible patterns present in the data, the channels with relatively good quality (SampEn values less than 1.5) are used to train the CNN network in this study.

3. Result

3.1. Evaluation metrics

Precision, recall, and F-measure are typically used as the evaluation metrics in pattern recognition and information retrieval. We use these evaluation metrics to assess the performance of fetal QRS complex detection. Precision, recall, and F-measure are defined as

$$\mathrm{Precision} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}} \tag{7}$$

$$\mathrm{Recall} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}} \tag{8}$$

$$F_1\ \mathrm{measure} = 2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{9}$$

where TP, FP, and FN are the numbers of true positive (correctly detected fetal QRS complexes), false positive (wrongly detected fetal QRS complexes) and false negative (missed fetal QRS complexes) detections, respectively. Precision represents how good the algorithm is at detecting true fetal QRS complexes out of all the detections it makes. Recall represents how good the algorithm is at finding the true fetal QRS complexes. The F-measure, also called the F₁ measure, is the harmonic mean of precision and recall and weights the two equally. The performance of the classification processes is computed in terms of accuracy, where TN is the number of true negatives (frames correctly classified as not containing a fetal QRS complex):

$$\mathrm{Accuracy} = \frac{\mathrm{TP} + \mathrm{TN}}{\mathrm{TP} + \mathrm{TN} + \mathrm{FP} + \mathrm{FN}}. \tag{10}$$
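
Equations (7) to (10) translate directly into code. The sketch below is illustrative; the function name and the convention of returning 0 when a denominator is zero are assumptions, and the counts in the example are made up rather than taken from the experiments.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return precision, recall, F1 measure and accuracy from confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    return precision, recall, f1, accuracy


# Hypothetical counts, for illustration only.
precision, recall, f1, accuracy = classification_metrics(tp=8, fp=2, fn=2, tn=8)
```

Note that accuracy depends on the true negatives, whereas precision, recall and the F₁ measure do not, so the two kinds of metric can diverge on imbalanced data.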

3.2. Comparison of CNN with other classifiers

To confirm the effectiveness of the CNN for fetal QRS complex detection based only on NI-FECG fingerprinting, without canceling MECG signals, we compare our approach with three other state-of-the-art classifiers, namely k-nearest neighbors (KNN), naive Bayes (NB) and support vector machine (SVM), on the testing set.

In our study, experiments with KNN, NB and SVM classifiers are performed on the MATLAB platform. NB is used with default parameters. A Euclidean metric is used for distance calculation in the KNN classifier. For the SVM classifier, we adopt a Gaussian radial basis function as the kernel function. The parameters used in KNN, NB and SVM classifiers are all optimized on the validation set.

Figure 4 and table 1 demonstrate the experiment results for the fetal QRS complex detections of CNN versus the KNN, NB and SVM classifiers.

Table 1. Accuracies of four classifiers. The best result is highlighted in bold.

| Methods | KNN | NB | SVM | CNN |
|---|---|---|---|---|
| Accuracies (%) | 68.76 | 55.52 | 70.65 | **77.38** |

We develop an algorithm that exceeds the classification performance of three other state-of-the-art classifiers in detecting fetal QRS complexes from raw data of NI-FECG signals. Considering precision, recall, F₁ measure and accuracy, CNN outperforms KNN, NB and SVM classifiers, demonstrating that the CNN network has better classification capability to effectively detect the fetal QRS complexes when compared with the three other classifiers.

4. Discussion

4.1. Signal quality assessment

In this study, the relationship between classification performance and the signal quality assessment step is investigated. We use SampEn to assess the signal quality of the NI-FECG, so that channels of poor quality are excluded from the experiment. To test the effectiveness of signal quality assessment, experiments are run with and without it, and the results are shown in table 2.

Table 2. Accuracies of CNN with/without signal quality assessment. The best result is highlighted in bold.

| | With signal quality assessment | Without signal quality assessment |
|---|---|---|
| Accuracies (%) | **77.38** | 76.52 |

Table 2 compares the accuracy of the two processes (with/without signal quality assessment). The CNN classifier performs better when the SampEn-based signal quality assessment procedure is applied to the real AECG signals. It should also be noted from equation (1) that the SampEn values depend mainly on the setting of the threshold r. In this study, only a small proportion of the channels with poor signal quality is excluded by setting the threshold to 1.5. In addition, some improved entropy methods with better algorithm stability and consistency have been proposed (Liu et al 2013). By combining fuzzy theory and entropy, these methods provide a new insight into assessing the signal quality of the NI-FECG, and it is worth evaluating their effectiveness for fetal QRS complex detection in the future.

4.2. Removing baseline and power line interference

The NI-FECG is inevitably contaminated by baseline wander and power-line interference. In this fetal QRS complex detection task, we further test the effectiveness of removing both. Firstly, a notch filter is used to remove power-line interference. Secondly, we apply a band-pass Butterworth filter between 0.5 Hz and 100 Hz to remove baseline wander. The band-pass filter is formed by cascading a second-order low-pass and a second-order high-pass filter, and we use zero-phase digital filtering to minimize start-up and ending transients in the NI-FECG signals.
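
The filtering chain can be sketched in plain Python as below. Several details are assumptions rather than settings reported above: the biquad coefficients use the standard bilinear-transform "audio EQ cookbook" formulas (with Q = 1/√2 giving a second-order Butterworth response), the mains frequency is taken as 50 Hz, the notch quality factor as 30, and zero-phase filtering is done by a simple forward and backward pass without edge padding. A library routine such as a forward-backward IIR filter from a signal processing package would normally be preferred in practice.

```python
import math

BUTTERWORTH_Q = 1 / math.sqrt(2)


def biquad(kind, f0, fs, q=BUTTERWORTH_Q):
    """Second-order section coefficients (b, a) with a[0] normalized to 1."""
    w0 = 2 * math.pi * f0 / fs
    cos_w0 = math.cos(w0)
    alpha = math.sin(w0) / (2 * q)
    if kind == "lowpass":
        b = [(1 - cos_w0) / 2, 1 - cos_w0, (1 - cos_w0) / 2]
    elif kind == "highpass":
        b = [(1 + cos_w0) / 2, -(1 + cos_w0), (1 + cos_w0) / 2]
    elif kind == "notch":
        b = [1.0, -2 * cos_w0, 1.0]
    else:
        raise ValueError(kind)
    a0 = 1 + alpha
    return [v / a0 for v in b], [1.0, -2 * cos_w0 / a0, (1 - alpha) / a0]


def lfilter(b, a, x):
    """Causal second-order IIR filter (direct form I, zero initial state)."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def zero_phase(b, a, x):
    """Filter forward, then backward, so that the phase shifts cancel."""
    forward = lfilter(b, a, x)
    return lfilter(b, a, forward[::-1])[::-1]


def remove_interference(x, fs=1000, mains_hz=50.0, notch_q=30.0):
    """Notch at the mains frequency, then 0.5 Hz high-pass and 100 Hz low-pass."""
    stages = (("notch", mains_hz, notch_q),
              ("highpass", 0.5, BUTTERWORTH_Q),
              ("lowpass", 100.0, BUTTERWORTH_Q))
    for kind, f0, q in stages:
        b, a = biquad(kind, f0, fs, q)
        x = zero_phase(b, a, x)
    return x
```

Without edge padding, the first and last second or so of the output still contain filter transients, so only the interior of a recording should be trusted with this simplified version.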

The classification performance of the classifiers with and without removing baseline and power-line interference is tested in this study, and the results are shown in figure 5.

Figure 5 compares the accuracy of the two processes (with/without removing baseline and power-line interference). Removing the baseline and power-line interference slightly improves the recognition performance of the CNN classifier, and very similar results are obtained with the KNN and SVM classifiers. The fetal QRS complex detection task in this study can be considered a kind of morphological analysis, and the morphology of the fetal ECG is enhanced by removing baseline and power-line interference. In addition, the morphological analysis of abdominal signals should be adapted to the established goal. For example, to detect the fetal QRS, filters should be designed to enhance this complex and to attenuate other waves such as the T wave.

4.3. Activation functions

The features of morphology in NI-FECG signals are captured step by step in the CNN, and the performance of the CNN network is affected by activation functions.

In this study, we further test the effect of activation functions on the fetal QRS complex detection task. Three kinds of activation functions are tested: sigmoid, hyperbolic tangent (Tanh), and rectified linear unit (ReLU). The sigmoid activation function is defined as

$$f_s(v) = \frac{1}{1 + \exp(-v)} \tag{11}$$

and the Tanh activation function is represented by

$$f_t(v) = \frac{\exp(v) - \exp(-v)}{\exp(v) + \exp(-v)}. \tag{12}$$
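
For reference, the three activation functions in equations (5), (11) and (12) can be written as the short sketch below. The function names are hypothetical; the sigmoid is evaluated in a numerically stable form, and Tanh uses the standard library implementation, which is mathematically equal to equation (12).

```python
import math


def sigmoid(v):
    """Equation (11); output in (0, 1)."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)              # avoids overflow of exp(-v) for very negative v
    return e / (1.0 + e)


def tanh_act(v):
    """Equation (12); output in (-1, 1)."""
    return math.tanh(v)


def relu(v):
    """Equation (5); output in [0, infinity)."""
    return max(0.0, v)


values = [(v, sigmoid(v), tanh_act(v), relu(v)) for v in (-2.0, 0.0, 2.0)]
```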

The results for the different activation functions are shown in table 3. ReLU outperforms the other two activation functions in this study and is therefore highly recommended for this particular task.

Table 3. A comparison of the performance of three activation functions in accuracy. The best result is highlighted in bold.

| | Sigmoid | Tanh | ReLU |
|---|---|---|---|
| Accuracies (%) | 52.37 | 71.92 | **77.38** |

5. Conclusion

This study proposes a deep learning model for detecting fetal QRS complexes based only on NI-FECG fingerprinting, without canceling MECG signals. Classification accuracy and three evaluation metrics (precision, recall and F₁ measure) are used for comparison with three other state-of-the-art classifiers, namely KNN, NB and SVM, and the results show that the CNN exceeds all three in precision, recall, F₁ measure and accuracy. The fetal QRS complexes can be effectively recognized with more than 77% F₁ measure. Key to this performance is a deep convolutional network that can map a sequence of NI-FECG samples to a sequence of sample annotations, along with a relatively large dataset (PCDB). In addition, the performance of several advanced deep learning algorithms is evaluated and compared in this study. For better classification performance, the rectified linear unit (ReLU) is highly recommended for this particular task. Future work should investigate the potential of the proposed method at different noise levels by using synthetic data.

Acknowledgments

This work was supported by the National Natural Science Foundation of the People's Republic of China under Grant Nos. 61772574 and 61375080, and the Key Program of Natural Science Foundation of Guangdong, China under Grant No. 2015A030311049.
