1 Introduction

The term intrauterine growth restriction (IUGR) describes a decrease in fetal growth rate that prevents a fetus from obtaining his or her complete growth potential. The incidence of IUGR is estimated to be approximately 5–7% [5]. A large number of etiologies are not yet identified and the known associations involve fetal, placental and/or maternal factors. The fetus is at risk of hypoxia and this condition is often associated with increased perinatal mortality and morbidity [13]. The recent imaging ultrasound technology permits to assess with high resolution the fetal biometry (i.e., abdominal and cranial circumferences, femur length, estimation of gestational age, fetal weight, etc.). The comparison with population standards can thus identify the small-for-gestational-age (SGA) fetuses, characterized by biometric dimensions <10th percentile. Unfortunately, this group includes healthy fetuses of small size as well. A crucial problem in fetal monitoring is therefore to decide if the small dimensions are physiological or due to a pathological condition [20]. Dating accurately the fetal growth, early in pregnancy, is essential for a diagnosis of IUGR. The interpretation of clinical data, although supported by new and advanced technologies, is often very difficult in the very preterm period and evidence-based guidelines do not exist [1]. However, the early identification of a pathological state is fundamental to predict possible complications and to take appropriate decisions.

Since there are no effective therapies to reverse fetal growth restriction, perinatal management is aimed primarily at determining the ideal timing and mode of delivery [19]. This decision depends on several variables: gestational age of the fetus, maternal health, severity of the IUGR and fetal condition. On one side, anticipating the delivery time and removing the fetus from a suboptimal environment can prevent risk of significant morbidities. On the other hand, the risk of intrauterine compromise has to be weighted against potential risks from iatrogenic premature delivery, which are typically higher before the 32nd–34th gestational week. Beginning at the gestational age of 27 weeks, the probability of survival is increased by 1% for every further day of permanence of the fetus in uterus. Thus, it is crucial to accurately determine the real condition of the fetus in order to avoid a cesarean section if not strictly necessary.

The main goal of this work is to propose a new method to identify the IUGRs in the antepartum period on the basis of FHR variability analysis. To this purpose, we considered FHR time series recorded in the early periods of gestation through a cardiotocograph. The cardiotocography (CTG) is the most common antepartum monitoring technique, which permits detecting fetal heartbeats by Doppler ultrasounds and autocorrelation method [26].

As reported in several review studies, the indices commonly adopted in CTG analysis are poorly suitable for a clinical directive [11, 15]. Thus, more advanced analysis techniques, already used for HRV analysis in adults, were adopted and an attempt of a multiparametric approach was performed. Our analyses included the computation of Lempel Ziv complexity (LZC) index and the multiscale entropy (MSE) estimation [8]. In addition, a preliminary k-mean cluster analysis was carried out to show the separability of IUGRs from physiological fetuses on the basis of the computed indices.

2 Methods

2.1 Data collection

We analyzed FHR signals belonging to fetuses, whose gestational age ranged from the 27th to the 34th week of gestation at the recording time.

A Hewlett Packard M1351A CTG fetal monitor was used to collect FHR tracings. This system is based on ultrasound echo-Doppler technology to detect fetal heartbeats, and includes an autocorrelation technique to compare the demodulated Doppler signal of a heartbeat with the next one. A peak detection firmware, embedded in the CTG monitor, then determines the heart period (the equivalent of RR period) from the autocorrelation function. Through a peak position interpolation algorithm, the effective resolution is better than 2 ms [26]. The resulting heart period is then converted into a heart frequency as soon as a new heart event is detected and accepted. Due to historical reasons, almost all commercially available fetal CTG monitors display only the fetal heart rate expressed in the number of beats per minute (bpm) and do not offer the series of interbeat intervals, usually employed in HRV analysis. The HP 1351A produces a new FHR value in bpm every 250 ms and stores it in a digital buffer. In the commercially available system, a computer reads through a serial interface (RS-422) ten consecutive values of the buffer every 2.5 s and determines the actual FHR as the average of the ten values (equivalent to a sampling frequency of 0.4 Hz). We modified the software in order to read the FHR from the CTG device every 0.5 s without averaging. This allows increasing the “equivalent” FHR sampling frequency up to 2 Hz.

The signals were selected from a database of recordings collected during periodic ambulatory checkups in a university hospital in Rome. The recording length was at least 40 min. In order to set homogenous groups without spurious data, we excluded the recordings belonging to gestations complicated by other pathologies such as diabetes, hypertension, etc. The fetuses were selected by considering three groups: normal, severe IUGR and not severe IUGR. The normal group includes 17 fetuses without pathologies, delivered at term by spontaneous labor and having a good Apgar score. The severe IUGR group comprises 23 small fetuses, prematurely delivered by a cesarean section (before the 34th gestational week) because of the appearance of life-threatening conditions. The not severe IUGR group includes 19 small fetuses delivered after the 34th gestational week and classified as IUGR at delivery by clinicians. For this group, we supposed that they were only small-for-gestational-age fetuses. The description of the subjects is reported in Table 1. The severe IUGR fetuses were prematurely delivered (before the 34th gestational week); therefore, for all fetuses, we limited the analysis to the recordings performed at the gestational ages ranging from the 27th to the 34th week. This choice was motivated by the fact that CTG traces belonging to very different gestational period are not comparable, as the patterns of HRV signal undergo physiological changes during gestation, according with increasing fetal parasympathetic nervous system development. Moreover, the choice of this gestational time window was supported by the fact that the potential risks associated with iatrogenic premature delivery are typically higher before the 32nd–34th gestational week [20].

Table 1 Details about data collection used in the analysis

Signal loss and signal subsets with insufficient quality have been excluded from the analysis, as already reported in [26].

2.2 Multiscale entropy

The approximate entropy is an index introduced by Pincus SM [22] that measures, with a tolerance r, the regularity of patterns comparing them to a given pattern of length m (m and r are fixed values; m is the detail level at which the signal is analyzed and r is a threshold that filters out irregularities).

Recently, Richman and Moorman developed a modification of this algorithm in order to remove what they considered the defects of ApEn: the name of this new statistic is sample entropy (SampEn) [17, 24].

In order to capture signal fluctuations on different time scales, a new method was introduced with the goal of computing the entropy at different degrees of resolution, i.e., in a multiscale manner [8, 9]. The idea is that the signal may contain information at different time scales, and therefore methods working only on a single scale are unsuitable. The procedure to compute the multiscale entropy (MSE) consists of setting up of consecutive coarse-grained time series y (τ), as a function of the factor τ:

$$ y^{(\tau)}_j=\sum_{i=(j-1)\tau+1}^{j\tau}x_i, \quad 1 \le j \le N/\tau $$
(1)

where y(1) is the original time series, while the length of each coarse-grained time series is equal to N/τ. For each sequence y (τ), an entropy measure is then calculated and it is plotted as function of the scale factor τ. The rationale beyond this procedure is an enhancement of time series repetitive patterns as a function of different scales. The distribution of the MSE values at various time scales could help understand the time series in terms of regularity and structure (i.e., short versus long range). In fact, as it was obtained in previous works [10, 12], different MSE trends can be associated with different dynamical systems and/or different pathological conditions.

Moreover, as previous results show [10], not only the singular entropy values can be a pathology marker but also their distribution along the different scale factors. In fact, the trend of entropy values along τ can provide important hints on the signal structure. For this reason, in this work we decided to interpolate the single MSE values at different τ and to consider the slope of the resulting curve as an additional index.

In particular, we selected from each FHR signal a 5000 point length subset by removing the first minute of recording, which corresponds to about 40 min of recording. The parameters adopted for the computation of ApEn and SampEn were: m = 1 and r = 0.1, m = 2 and r = 0.15 and 0.2 and the scale factor τ ranged from 1 to 15 (the parameter r is a percentage of the standard deviation SD of the original time series). The slope α of the curve interpolating MSE values corresponds to the first coefficient of a linear equation that fits the entropy values in a least squares sense. Different ranges of scale factors were considered.

2.3 Lempel Ziv complexity analysis

The measure of complexity introduced by Lempel and Ziv assess the so-called algorithmic complexity, which is defined according to the Information Theory as the minimum quantity of information needed to define a binary string [18]. For a sequence of binary values dynamically presented to a device, the LZC quantifies the rate of new patterns arising with the time evolution of the incoming string. A detailed description of the algorithm can be found in [11, 30]. When dealing with strings different in length, it was suggested to adopt a normalized LZC measure [16], i.e., the measure of complexity normalized by a factor depending on the sequence length.

A crucial point in the application of the LZC analysis to a biological time series is the process needed to transform the signals into a symbolic sequence of a finite alphabet. One can consider several methods to convert a real values time series into a symbolic string (e.g., by setting thresholds). Nonetheless, LZC analysis has found large application in biosignals analysis, such as event detection (epileptic seizure [23], onset of ventricular tachycardia or fibrillation [31] and modifications from sleep to awake state in-depth anesthesia [30]), and also in characterizing neural spike trains [27] or in studying DNA sequences [21]. Those attempts to apply the LZC method in event detection did not produce any improvement with respect to the application of the entropy estimators (ApEn, SampEn at single scale). The explanation can be found in the coding procedure because they mostly adopted methods based on moving thresholds to code signal values. This approach reproduces exactly what the entropy estimators do and does not take into account changes in signal slope.

The coding criteria based on a moving average threshold adopt the following rules. For a given time series {x n }, the signal average is computed on a time window of N samples and then the current sample x n is encoded with 1 if x n  > avg, with 0 if x n  ≤ avg. In case of a ternary coding (alphabet composed of three symbols), the minimum and maximum values of the signal are estimated on a time window of N samples and then the range [min, max] is divided in order to obtain three levels of the same amplitude A. The current sample is then encoded as follows:

$$ Y_{n+1} = \left\{\begin{array}{ll} 0, & \hbox{ if } x_n \le {\rm min}+A\\ 1, & \hbox{ if } {\rm min}+A < x_n \le {\rm max}-A\\ 2, & \hbox{ if } x_n > {\rm max}-A \end{array}\right. $$
(2)

Our work proposes an encoding approach based on the change in the direction of the signal (sign of the slope). As the biological systems are often driven by nonlinear mechanisms and they are intrinsically noisy systems, the simple increase or decrease of the signal was adopted as the coding criterion, as suggested in [29]. Therefore, for a given time series {x n }, the most straightforward procedure is to assign 1 to an increase in the the signal (x n+1 > x n ) and 0 to a decrease (x n+1 ≤ x n ). In case of a ternary coding, the equivalent procedure should be to denote with 1 an increase in the the signal (x n+1 > x n ), with 0 a decrease (x n+1 < x n ) and with 2 a stationary state (x n+1 = x n ).

Unfortunately, this procedure could produce a dependence of the encoded string on the level of quantization by which the signal was obtained. For this reason, a factor p was introduced, which represents a minimum quantization level for a symbol change in the coded string. The introduction of this parameter limits the effect of additive noise as well. Then, given a signal {x n }, the encoding rule adopted for the binary alphabet is the following:

$$ Y_{n+1} = \left\{\begin{array}{ll} 0, & \hbox{ if } x_{n+1}\le x_n+p\cdot x_n\\ 1, & \hbox{ if } x_{n+1} > x_n+p\cdot x_n \end{array}\right. $$
(3)

The rule for the ternary alphabet is instead:

$$ Y_{n+1} = \left\{\begin{array}{ll} 0, & \hbox{ if } x_{n+1} < x_n-p\cdot x_n\\ 1, & \hbox{ if } x_{n+1} > x_n+p\cdot x_n\\ 2, & \hbox{ if } x_n-p\cdot x_n \le x_{n+1} \le x_n+p\cdot x_n \end{array}\right. $$
(4)

The current value x n+1 is therefore classified as stationary, denoted by 2, if it lies in a p range around the previous sample. Figure 1 shows the results of the outlined encoding procedures.

Fig. 1
figure 1

Example of two different encoding procedures. The upper panel refers to the proposed coding criterion, which is simply the increase or decrease of the signal. The lower panel refers to the moving threshold approach: the current sample is encoded with 1 or 0 if it is greater or less than the signal average, respectively. Signal average is computed over a time window of 25 sample and indicated by the horizontal line

The LZC index was computed over 50% overlapping 360 point-long FHR sequences (3 min), by adopting both the new coding criterion and a moving average threshold. The evaluated index is the mean value of parameters computed on the intervals. For these analyses, we considered the encoding parameter p = 0.5, 1, 2 and 0% as well. The moving average threshold was computed on time windows of length N = 360, i.e., the length of the subsequence.

2.4 Frequency domain parameters

The HRV signals were evaluated by a spectral analysis as well. The power spectral density (PSD) estimation was carried out on the RR sequences obtained by the CTG signal through RR(i) = 60,000/ctg(i) (where ctg(i) gives the instantaneous frequency in beats per minute). The computation was performed on short subsequences (3 min) and the considered indices were the mean value of parameters computed on the intervals [11, 26]. Both the spectral power and the normalized spectral power at different frequency bands were estimated. The normalized spectral components represent the relative value of each power component in proportion to the total power minus the very low frequency (VLF) component.

Different from what one can observe in a PSD of an adult subject [28], in the FHR spectrum we can identify four contributions: the very low frequency (VLF: 0–0.03 Hz-eq) is related to long period and nonlinear contributions, the low frequency (LF: 0.03–0.15 Hz-eq) is mainly correlated with neural sympathetic activity, whereas high frequency (HF: 0.5-fNyquist Hz-eq) marks the presence of fetal breathing. The movement frequency (MF: 0.15–0.5 Hz-eq) is typical of the FHR spectrum. Evidence of this component in the presence of fetal movements (basically of the trunk) is also reported in [4, 25]. It depends also on maternal breathing, in fact, the analysis between the maternal and fetal heart rate signals showed a high correlation between the fetal MF component and the maternal respiratory frequency. This would confirm the complex interaction between the mechanical influence of the maternal respiratory activity and the fetal neural reflexes [6]. The LF/(HF + MF) ratio was estimated as well and it quantifies the autonomic balance between neural control mechanisms from different origin (in accordance with the LF/HF ratio normally calculated in adults [28]).

2.5 Multiparametric approach

The indices evaluation was completed by a multiparametric analysis. The adopted method was the k-mean cluster analysis, which treats each observation (measured indices) as an object having a location in space. Computationally, this method can be thought as an analysis of variance (ANOVA) “in reverse”. The algorithm starts with k random clusters, and then iteratively move objects among those clusters with the goal to (i) minimize variability within clusters and (ii) maximize variability among clusters. Each cluster in the partition is defined by its member objects and by its centroid, or center. The centroid for each cluster is the point to which the sum of distances from all objects in that cluster is minimized. The analogy to “ANOVA in reverse” is in the sense that the significance test in ANOVA evaluates the between-group variability against the within-group variability when computing the significance test for the hypothesis that the means in the groups are different from each other. In k-means clustering, the algorithm tries to move objects in and out of groups (clusters) to get the most significant ANOVA results.

3 Results

The entropy values computed for scale factors τ > 3 were able to significantly separate the severe IUGRs from both the not severe IUGRs and the normal fetuses (ANOVA test and Kruskall–Wallis test were performed, then post hoc comparisons were made by Scheffé test, p-value ANOVA <1% and p-value Scheffé test <5%). As shown in Fig. 2, the entropy values of severe IUGRs are lower than the values of the other groups, whereas the entropy values of not severe IUGRs and healthy subjects are very similar.

Fig. 2
figure 2

MSE analysis. Entropy values of MSE analysis plotted versus the scale factor τ (from 1 to 15). Upper panels refer to the values obtained by ApEn(1,0.1) and SampEn(1,0.1). Lower panels refer to the values obtained by ApEn(2,0.15) and SampEn(2,0.15)

In addition, the α slopes were able to significantly separate the three groups. In particular, the slope values associated with pathological fetuses were smaller than in the healthy condition for all the entropy estimators adopted in this study. Table 2 shows the slope values obtained with SampEn(2,0.2).

As regards LZC analysis, the proposed encoding procedure demonstrated a significant separation of the groups. (ANOVA test and Kruskall–Wallis test were performed among the three patient groups, then post hoc comparisons were made by Scheffé test, p-value ANOVA <1% and p-value Scheffé test <5%). Moreover, the discriminating ability of the ternary LZC seems to be weakly dependent on the choice of the encoding parameter p. Actually, significant results were obtained with different encoding factors: p = 0, 0.5 and 1%. On the contrary, the LZC values computed by the moving threshold encoding procedure did not produce significant results. In addition, the LZC values obtained from this procedure are much lower than those computed with the proposed coding approach, as Table 3 shows. This confirms that the index in case of thresholds encoding mainly captures the seemingly stationary oscillations of the FHR signal, but not the inherent dynamics.

Table 2 Slope values estimated on the MSE curves obtained by SampEn(2,0.2)
Table 3 LZC(k, p) values (avg ± std), where k refers to the number of symbols used and p is the encoding parameter (p = 0, 0.5, 1, 2%)

The spectral components were not sufficient to separate the severe from the not severe IUGRs. As Table 4 shows, the severe IUGRs provided significant lower values in most spectral bands (LF, MF, HF) with respect to the healthy fetuses (ANOVA test was performed among the three patients groups, p-value <5%). However, the not severe IUGRs present values whose range overlaps with the values of healthy and severe IUGRs. Actually, the post hoc comparisons shows that these indices could not separate the three groups.

Table 4 Values of power of spectral components (avg ± std)

Finally, the analysis was completed by a multiparametric approach. The method adopted was the k-mean cluster analysis and the distance measure was the squared Euclidean distance. The most interesting results were obtained by considering both the ternary LZC and the slope α1-2 assessed from MSE analysis as the intercept of scale factors τ = 1 and 2. The k-mean cluster analysis applied to these indices produced the following results: the severe IUGRs were gathered and separated from both the not severe IUGRs and normal fetuses, which were instead included in the same cluster. In particular, the best results were achieved by choosing the parameters LZC(3,0) and the α1-2 slope, computed on the multiscale SampEn(2,0.15) and on the multiscale SampEn(2,0.2). The discrimination between the two clusters has sensitivity (Se) = 72.2% and accuracy (Ac) = 80.4% in the first case, whereas Se = 77.8% and Ac = 82.4% in the second case (see Fig. 3). Notice that the same indices separately analyzed would have produced a worse performance in term of accuracy. In fact, the discrimination based on the LZC(3,0) obtains Se = 88.9% and Ac = 70.6%. The classification based on α1-2 slope, computed on the multiscale SampEn(2,0.15) and SampEn(2,0.2), provides Se = 61.1% Ac = 74.4% and Se = 77.8% Ac = 70.6%, respectively.

Fig. 3
figure 3

Results from the k-mean cluster analysis. The numbers inside the markers refer to the clusters (1 severe IUGR, 2 physiological condition). The α1-2 slope is obtained from the multiscale SampEn(2,0.2). Notice that almost all the severe IUGRs were grouped in the same cluster 1 in the left upper corner of the graph

4 Discussion and conclusion

The results show that the LZC index computed with the proposed coding procedure and the MSE could be promising indices to identify the actual IUGRs and to separate them from the not severe IUGRs and the normals.

However, one must carefully choose the encoding procedure for the computation of the LZC index. As a matter of fact, LZC values, obtained with the encoding procedure based on moving threshold, did not produce good results. The main question could be: why this procedure, commonly adopted in HRV analysis seems to provide less information than the proposed coding procedure? As highlighted in the Appendix, the moving threshold is mainly responsive to large changes in signal amplitude, so as to be less suitable for the analysis of the FHR signal, which, particularly in the early period of development, presents very small amplitude variations. On the contrary, a coding procedure based on the increase/decrease in the signal enhances information on the small oscillations of the signal.

Concerning the potential application to clinical diagnosis in antenatal medicine, our study provides interesting hints on IUGR pathological state. In fact, the IUGR condition was characterized by higher LZC values, very close to 1 (namely the theoretical value associated with a completely random string) advancing the hypothesis that the HR belonging to the severe IUGRs goes up and down more randomly than in physiological conditions. These are only preliminary results, which should be further investigated and validated in relation to different pathologies affecting the mechanisms of HR regulation.

Furthermore, the MSE analysis confirms that the entropy estimators computed on single scale are not sufficient to discriminate the pathological fetuses: actually for the scale factor 1, none of the entropy indices adopted in this study is able to separate the groups. As already demonstrated in studies dealing with HR variability in adults [10], pathological condition can affect the regularity of signal at different time scales.

The k-mean cluster analysis has shown a better discriminant ability when the LZC and the α1-2 slope are considered together, with respect to each single index. This fact reinforces the opinion that a multiparametric approach generally improves the identification of risky conditions in clinical applications. In particular, in antenatal monitoring, the indices related to HRV can reflect different developmental stages of HR regulation mechanisms and thus they can assume different weights along the gestational period. Thus, it appears extremely helpful to face the diagnostic problem in fetal medicine with multiparametric analysis.