1 Introduction

In this paper, we present a robust method for image registration in temporal series of in-utero blood oxygenation level dependent (BOLD) MRI. BOLD MRI is a promising imaging tool for studying functional dynamics of the placenta and fetal brain [13]. It has been shown that changes in fetal and placental oxygenation levels with maternal hyperoxygenation can be used to detect and characterize placental dysfunction, and therefore hold promise for monitoring maternal and fetal well-being [4]. Investigating hemodynamics of the placenta and fetal organs necessitates robust estimation of correspondences and motion correction across different volumes in the dynamic MRI series. Temporal MRI data suffers from serious motion artifacts due to maternal respiration, unpredictable fetal movements and signal non-uniformities [5], as illustrated in Fig. 1. Our approach exploits the temporal nature of the data to achieve robust registration in this challenging setup.

Fig. 1.
figure 1

Example twin pregnancy case from the study. The same cross-section from frames \(J_1\), \(J_2\), \(J_{74}\), and \(J_{75}\) is shown. Arrows indicate areas of substantial motion of the placenta (red), fetal head (green), and fetal body (yellow). (Color figure online)

Prior work in in-utero MRI has focused on the fetal brain and demonstrated that rigid transformations capture brain motion accurately [6, 7]. The rigid model, however, fails to fully account for movement and deformation of the placenta. Recently, B-spline transformations have been employed for tracking of regions-of-interest (ROIs) in placental images by registering all volumes to a reference frame [8]. This approach ignores the temporal nature of the data and yields a substantial number of outlier volumes that fail registration due to significant motion. In this paper, we demonstrate that a temporal model of movement improves the quality of alignment.

Beyond the specific application to in-utero MRI time series, the problem of temporal alignment has been investigated in longitudinal [9, 10], cardiac [1114] and lung imaging [1418]. Longitudinal studies often involve subtle changes, and the algorithms are fine-tuned to detect small deformations [9]. Both cardiac and lung motion patterns are somewhat regular and smooth across time, and lend themselves to biomechanical modeling [12, 13, 15]. In contrast to these applications, in-utero volumetric MRI time series contain a combination of subtle non-rigid deformation of the placenta and large-scale unpredictable motion of the fetus. At the same time, consecutive frames in a series are quite close to each other in time, which is the property we exploit in our modeling.

Existing methods for temporal registration in image series can be categorized into three distinct groups. The first group applies a variant of groupwise registration to this problem. This approach relies on a group template – estimated or selected from the image set – that yields acceptable registration results for all frames [10, 14, 16, 18]. Unfortunately, large motion present in in-utero MRI makes some frames to be substantially different from the template, leading to registration failures. The second group aligns consecutive frames and concatenates resulting deformations to estimate alignment of all frames in the series [17]. In application to long image series (BOLD MRI series contain hundreds of volumes), this approach leads to substantial errors after several concatenation steps. The third approach formulates the objective function in terms of pairwise differences between consecutive frames, leading to algorithms that perform pairwise registration of consecutive frames iteratively until the entire series comes into alignment [14]. Our method is also related to filtering approaches in respiratory motion modeling [15]. Since in-utero motion is much more complex than respiratory motion, we do not attempt to explicitly model motion but rather capture it through deformations of the latent template image.

In this paper, we construct the so called filtered estimates of the deformations by making a Markov assumption on the temporal series. We derive a sequential procedure to determine the non-rigid transformation of the template to each frame in the series. This work represents a first step towards efficient and robust temporal alignment in this challenging novel application, and provides a flexible framework that can be augmented in the future with clinically relevant estimation of MRI intensity dynamics in organs of interest. We demonstrate the method on real in-utero MRI time series, and report robust improvements in alignment of placenta, fetal brains, and fetal livers.

2 HMM and Filtered Estimates

In this section, we briefly review inference in HMMs [19, 20], and introduce our notation in the context of temporal registration. We assume existence of a latent (hidden) state whose temporal dynamics is governed by a Markov structure, i.e., the future state depends on the history only through the current state. In our application, we assume that template I deforms at each time point to describe the anatomical arrangement at that time. Deformation \(\varphi _n\) defines the latent state at time \(n\in \{1, ..., N\}\), where N is the number of images in the series. The observed image \(J_n\) at time n is generated by applying \(\varphi _n\) to template I independently of all other time points.

We aim to estimate the latent variables \(\{\varphi _n\}\) from observations \(\{J_n\}\). Formally, we construct and then maximize posterior distribution \(p(\varphi _n|{J_{1:{n}}}; I)\), where we use \(J_{k:m}\) to denote sub-series \(\{J_k, J_{k+1}, ..., J_m\}\) of the volumetric time series \(\{J_1, J_2, ..., J_N\}\). This distribution, referred to as a filtered estimate of the state, can be efficiently constructed using forward message passing [19, 20], also known as sequential estimation. The message \(m_{{(n-1)}\rightarrow {(n)}}(\varphi _n)\) from node \(n-1\) to node n is determined through forward recursion that integrates a previous message \(m_{(n-2)\rightarrow {(n-1)}}(\varphi _{n-1})\) with the data likelihood \(p(J_n|\varphi _{n}; I)\) and dynamics \(p({\varphi _n}|\varphi _{n-1})\):

$$\begin{aligned} m_{{(n-1)}\rightarrow {(n)}}(\varphi _n)&\triangleq p({\varphi _n}|J_{1:n}; I) \propto p(J_n|\varphi _{n}; I) \, p({\varphi _n}|J_{1:{n-1}}; I) \end{aligned}$$
(1)
$$\begin{aligned}&= p(J_n|\varphi _{n}; I) \!\! \int \limits _{{{\varphi }_{{n-1}}}} \!\!\! p({\varphi _n}|\varphi _{n-1}) \, m_{(n-2)\rightarrow {(n-1)}}(\varphi _{n-1}) \, d{\varphi _{n-1}}, \end{aligned}$$
(2)

where \(m_{0\rightarrow 1}(\varphi _1) = p(\varphi _1)\) and \(n=\{1, ..., N\}\). The forward pass produces the posterior distribution \(p(\varphi _n|J_{1:n}; I)\) for each time point n in the number of steps that is linear with n. Similarly, efficient backward pass enables computation of posterior distribution \(p(\varphi _n|J_{1:N}; I)\) based on all data, often referred to as smoothing. In this paper, we investigate advantages of the temporal model in the context of filtering and leave the development of a smoothing algorithm for future work.

3 Modeling Temporal Deformations Using HMM

The likelihood term \(p(J_n|\varphi _n; I)\) in Eq. (2) is determined by the model of image noise:

$$\begin{aligned} p(J_n|\varphi _n; I) \propto \exp \left( -\text {Dist}\left( J_n, I \left( \varphi ^{-1}_n\right) \right) \right) , \end{aligned}$$
(3)

where \(\text {Dist}(\cdot , \cdot )\) is a measure of dissimilarity between images. The transition probability \(p({\varphi _n}|\varphi _{n-1})\) encourages temporal and spatial smoothness:

$$\begin{aligned} p({\varphi _n}|\varphi _{n-1}) \propto \exp \left( -\lambda _1 \text {Reg}{(\varphi _n)}- \lambda _2 {||\varphi _n\circ \varphi _{n-1}^{-1} ||}^2\right) , \end{aligned}$$
(4)

where \(\text {Reg}{(\cdot )}\) is the regularization term that encourages spatial smoothness of the deformation, \(||\cdot ||\) is the appropriate norm that encourages \(\varphi _n\) to be close to \(\varphi _{n-1}\), and \(\lambda _1\) and \(\lambda _2\) are regularization parameters.

Since the integration over all possible deformation fields is intractable, we resort to a commonly used approximation of evaluating the point estimate that maximizes the integrand. In particular, if \(\varphi _{n-1}^{*}\) is the best deformation estimated by the method for time point \(n-1\), the message passing can be viewed as passing the optimal deformation \(\varphi _{n-1}^{*}\) to node n:

$$\begin{aligned} m_{({n-1})\rightarrow ({n})}(\varphi _n)&\propto p(J_n|\varphi _{n}; I) \!\!\int \limits _{\varphi _{n-1}} \!\!\! p({\varphi _n}|\varphi _{n-1}) \, \mathbbm {1}\{\varphi _{n-1}=\varphi _{n-1}^{*}\} \, d{\varphi _{n-1}} \end{aligned}$$
(5)
$$\begin{aligned}&= p(J_{n}|\varphi _{n}; I) \, p({\varphi _{n}}|\varphi _{n-1}^{*}), \end{aligned}$$
(6)

and the estimate for time point n is recursively estimated as

$$\begin{aligned} \varphi _{n}^{*}=\arg \max _{\varphi _{n}}\ p(J_{n}|\varphi _{n}; I) \, p({\varphi _{n}}|\varphi _{n-1}^{*}). \end{aligned}$$
(7)

This estimate is then used to determine \(\varphi _{n+1}^{*}\), and so on until we reach the end of the series.

4 Implementation

In this work, we choose the first image \(J_1\) as the reference template I and focus on exploring the advantages of the Markov structure. The model can be readily augmented to include a model of a bias field and a latent reference template that is estimated jointly with the deformations, similar to prior work in groupwise registration [10, 14, 16, 18]. We manipulate Eq. (7) to obtain

$$\begin{aligned} \varphi _{n}^{*}&= \arg \max _{\varphi _{n}}\ p(J_n|\varphi _n; I) \, p(\varphi _n|\varphi _{n-1}^{*}) \end{aligned}$$
(8)
$$\begin{aligned}&= \arg \min _{\varphi _{n}}\ \text {Dist}\left( J_n, I\left( \varphi ^{-1}_n\right) \right) + \lambda _1 \text {Reg}{(\varphi _n)}+ \lambda _2 {||\varphi _n\circ {(\varphi _{n-1}^{*})}^{-1} ||}^2, \end{aligned}$$
(9)

and observe that this optimization problem reduces to pairwise image registration of the template I and the observed image \(J_n\). The algorithm proceeds as follow. Given the estimate \(\varphi _{n-1}^{*}\) of the template deformation to represent image \(J_{n-1}\), we apply the registration algorithm to I and \(J_n\) while using \(\varphi _{n-1}^{*}\) as an initialization, resulting in the estimate \(\varphi _{n}^{*}\).

We implemented our method using symmetric diffeomorphic registration with cross-correlation [21]. Diffeomorphic registration ensures that the estimated deformation is differentiable bijective with differentiable inverse. We employ cross-correlation to define the measure of image dissimilarity \(\text {Dist}(\cdot , \cdot )\), because cross-correlation adapts naturally to the image data with signal non-uniformities. We set the size of the local window for computing cross-correlation to be 5 voxels. We use the state-of-the-art implementation provided in the ANTS software package [21]. Following the common practice in the field, ANTS implements spatial regularization via Gaussian smoothing.

5 Experiments and Results

Data. Ten pregnant women were consented and scanned on a 3T Skyra Siemens scanner (single-shot GRE-EPI, \(3 \times 3 {\text {mm}}^2\) in-plane resolution, \(3 \text {mm}\) slice thickness, interleaved slice acquisition, TR \(=5.8-8 \text {s}\), TE \(=32-36 \text {ms}\), FA \(=90^o\)) using 18-channel body and 12-channel spine receive arrays. Each series contains around 300 volumes. To eliminate the effects of slice interleaving, we resampled odd and even slices of each volume onto a common isotropic 3 mm\(^3\) image grid. This study included three singleton pregnancies, six twin pregnancies, and one triplet pregnancy, between 28 and 37 weeks of gestational age. A hyperoxia task paradigm was used during the scans, comprising three consecutive ten-minute episodes of initial normoxic episode (\(21\,\% \text {O}_2\)), hyperoxic episode (\(100\,\% \text {O}_2\)), and a final normoxic episode (\(21\,\% \text {O}_2\)). To enable quantitative evaluation, we manually delineated the placentae (total of 10), fetal brains (total of 18), and fetal livers (total of 18), in the reference template \(I=J_1\) and in five additional randomly chosen volumes in each series.

Experiments. To evaluate the advantages of the temporal model, we compare it to a variant of our algorithm that does not assume temporal structure and instead aligns each image in the series to the reference frame using the same registration algorithm used by our method. Algorithmically, this corresponds to setting \(\lambda _2\) in Eq. (9) to be 0, and initializing the registration step with an identity transformation instead of the previously estimated transformation \(\varphi _{n-1}^*\). To quantify the accuracy of the alignment, we transform the manual segmentations in the reference template to the five segmented frames in each series using the estimated deformations. We employed Dice coefficient [22] to quantify volume overlap between the transferred and the manual segmentations. In our application, the goal is to study average temporal signals for each ROI, and therefore delineation of an ROI provides an appropriate evaluation target.

Fig. 2.
figure 2

Two example cases from the study. For each case, we display the reference frame \(J_1\) with manual segmentations, the reference frame \(J_1(\varphi _{75}^{-1})\) transformed into the coordinate system of frame \(J_{75}\), frame \(J_{75}\) with manual segmentations, and frame \(J_{75}\) with segmentations transferred from the reference frame \(J_1\) via \(\varphi _{75}\). Both cases are twin pregnancies. Segmentations of the placentae (pink), fetal brains (green), and fetal livers (yellow) are shown. Two-dimensional cross-sections are used for visualization purposes only; all computations are performed in 3D. (Color figure online)

Fig. 3.
figure 3

Volume overlap between transferred and manual segmentations: (a) placentae (b) fetal brains, and (c) fetal livers. The cases in the study are reported in the increasing order of placental volume overlap for our method. Duplicate case numbers correspond to twin and triplet pregnancies. Statistics are reported for our method (red), pairwise registration to the template frame (green), and no alignment (blue). (Color figure online)

Experimental Results. Fig. 2 illustrates results for two example cases from the study. We observe that the reference frame was warped accurately by the algorithm to represent a frame in the series that is substantially different in the regions of the placenta and the fetal liver. The delineations achieved by transferring manual segmentations from the reference frame to the coordinate system of the current frame (\(J_{75}\) in the figure) are in good alignment with the manual segmentations for the current frame. Figure 3 reports volume overlap statistics for the placentae, fetal brains, and fetal livers, for each case in the study. We observe that temporal alignment improves volume overlap in important ROIs and offers consistent improvement for all cases over pairwise registration to the reference frame. We also note that temporal alignment offers particularly substantial gains in cases with a lot of motion, i.e., low original volume overlap.

6 Conclusions

We presented a HMM-based registration method to align images in in-utero volumetric MRI time series. Forward message passing incorporates the temporal model of motion into the estimation procedure. The filtered estimates are therefore based on not only the present volume frame and the template, but also on the previous frames in the series. The experimental results demonstrate the promise of our approach in a novel, challenging application of in-utero BOLD MRI analysis. Future work will focus on obtaining robust estimates of the MRI signal time courses by augmenting the method with a backward pass and a model of ROI-specific intensity changes.