Medical Image Analysis

Volume 52, February 2019, Pages 128-143

A deep learning framework for unsupervised affine and deformable image registration

https://doi.org/10.1016/j.media.2018.11.010

Highlights

  • Unsupervised Deep Learning Image Registration (DLIR) is feasible for affine and deformable image registration.

  • The method is unsupervised; no registration examples are necessary to train a ConvNet for image registration.

  • Once a ConvNet is trained, image registration can be performed on unseen images in one shot.

  • Registration, including image resampling, is near real-time.

  • Unsupervised DLIR yields image registration performance similar to a conventional approach.

Abstract

Image registration, the process of aligning two or more images, is the core technique of many (semi-)automatic medical image analysis tasks. Recent studies have shown that deep learning methods, notably convolutional neural networks (ConvNets), can be used for image registration. Thus far, training of ConvNets for registration has been supervised using predefined example registrations. However, obtaining example registrations is not trivial. To circumvent the need for predefined examples, and thereby to increase convenience of training ConvNets for image registration, we propose the Deep Learning Image Registration (DLIR) framework for unsupervised affine and deformable image registration. In the DLIR framework ConvNets are trained for image registration by exploiting image similarity, analogous to conventional intensity-based image registration. After a ConvNet has been trained with the DLIR framework, it can be used to register pairs of unseen images in one shot. We propose flexible ConvNet designs for affine image registration and for deformable image registration. By stacking multiple of these ConvNets into a larger architecture, we are able to perform coarse-to-fine image registration. We show for registration of cardiac cine MRI and registration of chest CT that the performance of the DLIR framework is comparable to that of conventional image registration while being several orders of magnitude faster.

Introduction

Image registration is the process of aligning two or more images. It is a well-established technique in (semi-)automatic medical image analysis that is used to transfer information between images. Commonly used image registration approaches include intensity-based methods, and feature-based methods that use handcrafted image features (Sotiras, Davatzikos, Paragios, 2013, Viergever, Maintz, Klein, Murphy, Staring, Pluim, 2016). Recently, supervised and unsupervised deep learning techniques have been successfully employed for image registration (Jaderberg, Simonyan, Zisserman, Kavukcuoglu, 2015, Wu, Kim, Wang, Munsell, Shen, 2016, Miao, Wang, Liao, 2016, Liao, Miao, de Tournemire, Grbic, Kamen, Mansi, Comaniciu, 2017, Krebs, Mansi, Delingette, Zhang, Ghesu, Miao, Maier, Ayache, Liao, Kamen, 2017, Cao, Yang, Zhang, Nie, Kim, Wang, Shen, 2017, Sokooti, de Vos, Berendsen, Lelieveldt, Išgum, Staring, 2017, Yang, Kwitt, Styner, Niethammer, 2017, de Vos, Berendsen, Viergever, Staring, Išgum, 2017, Eppenhof, Lafarge, Moeskops, Veta, Pluim, 2018).

Deep learning techniques are well suited for image registration, because they automatically learn to aggregate the information of various complexities in images that is relevant for the task at hand. Additionally, the use of deep learning techniques potentially yields high robustness, because local optima may be of lesser concern in deep learning methods: zero gradients are often (if not always) at saddle points (Dauphin et al., 2014). Moreover, deep learning methods like convolutional neural networks are highly parallelizable, which makes implementation and execution on GPUs straightforward and fast. As a consequence, deep learning-enhanced registration methods are exceptionally fast, making them interesting for time-critical applications, e.g. emerging image-guided therapies like High Intensity Focused Ultrasound (HIFU), the MRI Linear Accelerator (MR-linac), and MRI-guided proton therapy.

Although not explicitly introduced as a method for image registration, the spatial transformer network (STN) proposed by Jaderberg et al. (2015) was one of the first methods that exploited deep learning for image alignment. The STN is designed as part of a neural network for classification. Its task is to spatially transform input images such that the classification task is simplified. Transformations might be performed using a global transformation model or a thin plate spline model. In the application of an STN, image registration is an implicit result; image alignment is not guaranteed and only performed when beneficial for the classification task at hand. STNs have been shown to aid classification of photographs of traffic signs, house numbers, and handwritten digits, but to the best of our knowledge they have not yet been used to aid classification of medical images.
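The STN's core sampling step can be sketched independently of any classifier: an affine matrix maps each output coordinate to a source coordinate, and the input image is sampled there by bilinear interpolation. A minimal 2D NumPy sketch of this idea (function names are illustrative, not from the STN paper):

```python
import numpy as np

def affine_grid(theta, h, w):
    # theta: 2x3 affine matrix mapping output coords to source coords.
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])  # 3 x N homogeneous
    src = theta @ coords                                         # 2 x N source (x, y)
    return src[0].reshape(h, w), src[1].reshape(h, w)

def bilinear_sample(img, sx, sy):
    # Differentiable-in-principle bilinear sampling at (sx, sy), with
    # border clamping at the image edges.
    h, w = img.shape
    x0 = np.clip(np.floor(sx).astype(int), 0, w - 2)
    y0 = np.clip(np.floor(sy).astype(int), 0, h - 2)
    dx = np.clip(sx - x0, 0.0, 1.0)
    dy = np.clip(sy - y0, 0.0, 1.0)
    return (img[y0, x0] * (1 - dx) * (1 - dy)
            + img[y0, x0 + 1] * dx * (1 - dy)
            + img[y0 + 1, x0] * (1 - dx) * dy
            + img[y0 + 1, x0 + 1] * dx * dy)

img = np.arange(16.0).reshape(4, 4)
identity = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
sx, sy = affine_grid(identity, 4, 4)
warped = bilinear_sample(img, sx, sy)  # identity transform: warped == img
```

In a full STN, a small sub-network would predict `theta` from the input, and the sampling step would be differentiated through during training.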

In other studies deep learning methods were explicitly trained for image registration (Liao, Miao, de Tournemire, Grbic, Kamen, Mansi, Comaniciu, 2017, Miao, Wang, Liao, 2016, Yang, Kwitt, Styner, Niethammer, 2017, Sokooti, de Vos, Berendsen, Lelieveldt, Išgum, Staring, 2017, Krebs, Mansi, Delingette, Zhang, Ghesu, Miao, Maier, Ayache, Liao, Kamen, 2017, Cao, Yang, Zhang, Nie, Kim, Wang, Shen, 2017, Hu, Modat, Gibson, Ghavami, Bonmati, Moore, Emberton, Noble, Barratt, Vercauteren, 2018, Hu, Modat, Gibson, Li, Ghavami, Bonmati, Wang, Bandula, Moore, Emberton, Ourselin, Noble, Barratt, Vercauteren, 2018). For example, convolutional neural networks (ConvNets) were trained with reinforcement learning to be agents that predicted small steps of transformations toward optimal alignment. Liao et al. (2017) applied these agents for affine registration of intra-patient cone-beam CT (CBCT) to CT and Krebs et al. (2017) applied agents for deformable image registration of inter-patient prostate MRI. Like intensity-based registration, image registration with agents is iterative. However, ConvNets can also be used to register images in one shot. For example, Miao et al. (2016) used a ConvNet to predict parameters in one shot for rigid registration of 2D CBCT to CT volumes. Similarly, ConvNets have been used to predict parameters of a thin plate spline model. Cao et al. (2017) used thin plate splines for deformable registration of brain MRI scans and Eppenhof et al. (2018) used thin plate splines for deformable registration of chest CT scans. Furthermore, in the work of Sokooti et al. (2017) it has been demonstrated that a ConvNet can be used to predict a dense displacement vector field (DVF) directly, without constraining it to a transformation model. Similarly, Yang et al. (2017) used a ConvNet to predict the momentum for registration with large deformation diffeomorphic metric mapping (Beg et al., 2005). Recently, Hu et al. (2018a) presented a method that employs segmentations to train ConvNets for global and local image registration. In this method a ConvNet takes fixed and moving image pairs as its inputs and it learns to align the segmentations. This was demonstrated on global and deformable registration of ultrasound and MR images using prostate segmentation.

While the aforementioned deep learning-based registration methods show accurate registration performance, the methods are all supervised, i.e. they rely on example registrations for training or require manual segmentations, unlike conventional image registration methods that are typically unsupervised. Training examples for registration have been generated by synthesizing transformation parameters for affine image registration (Miao et al., 2016) and deformable image registration (Sokooti, de Vos, Berendsen, Lelieveldt, Išgum, Staring, 2017, Eppenhof, Lafarge, Moeskops, Veta, Pluim, 2018), or require manual annotations (Hu, Modat, Gibson, Ghavami, Bonmati, Moore, Emberton, Noble, Barratt, Vercauteren, 2018, Hu, Modat, Gibson, Li, Ghavami, Bonmati, Wang, Bandula, Moore, Emberton, Ourselin, Noble, Barratt, Vercauteren, 2018). However, generating synthetic data may not be trivial, as it is problem specific. Training examples can also be obtained by using conventional image registration methods (Liao, Miao, de Tournemire, Grbic, Kamen, Mansi, Comaniciu, 2017, Krebs, Mansi, Delingette, Zhang, Ghesu, Miao, Maier, Ayache, Liao, Kamen, 2017, Cao, Yang, Zhang, Nie, Kim, Wang, Shen, 2017, Yang, Kwitt, Styner, Niethammer, 2017). Alternatively, unsupervised deep learning methods could be employed. Wu et al. (2016) exploited unsupervised deep learning by employing a convolutional stacked auto-encoder (CAE) that extracted features from fixed and moving images. It improved registration with Demons (Vercauteren et al., 2009) and HAMMER (Shen and Davatzikos, 2002) on three different brain MRI datasets. However, while the CAE is unsupervised, the extracted features are optimized for image reconstruction and not for image registration. Thus, there is no guarantee that the extracted features are optimal for the specific image registration task.

Unsupervised deep learning has been used to estimate optical flow (Yu, Harley, Derpanis, 2016, Dosovitskiy, Fischer, Ilg, Hausser, Hazirbas, Golkov, van der Smagt, Cremers, Brox, 2015, Ilg, Mayer, Saikia, Keuper, Dosovitskiy, Brox, 2017) or to estimate depth (Garg et al., 2016) in video sequences. Such methods are related to medical image registration, but typically address different problems. They focus on deformations among frames in video sequences. These video sequences are in 2D, contain relatively low levels of noise, have high contrast due to RGB information, and have relatively small deformations between adjacent frames. In contrast, medical images are often 3D, may contain large amounts of noise, may have relatively low contrast and aligning them typically requires larger deformations.

We propose a Deep Learning Image Registration (DLIR) framework: an unsupervised technique to train ConvNets for medical image registration tasks. In the DLIR framework, a ConvNet is trained for image registration by exploiting image similarity between fixed and moving image pairs, thereby circumventing the need for registration examples. The DLIR framework bears similarity with a conventional iterative image registration framework, as shown in Fig. 1. However, in contrast to conventional image registration, the transformation parameters are not optimized directly, but indirectly, by optimizing the ConvNet's parameters. In the DLIR framework the task of a ConvNet is to learn to predict transformation parameters by analyzing fixed and moving image pairs. The predicted transformation parameters are used to generate a dense displacement vector field (DVF). The DVF is used to resample the moving image into a warped image that mimics the fixed image. During training, the ConvNet learns the underlying patterns of image registration by optimizing image similarity between the fixed and warped moving images. Once a ConvNet is trained, it has learned the image registration task and is able to perform registration on pairs of unseen fixed and moving images in one shot, i.e. non-iteratively.
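This forward pass can be illustrated with a toy NumPy sketch. Everything here is a simplifying assumption rather than the paper's implementation: `stub_convnet` is an untrained linear stand-in for the ConvNet, the predicted transformation is a single global 2D translation, the warp uses nearest-neighbour resampling, and the loss is negative normalized cross-correlation:

```python
import numpy as np

rng = np.random.default_rng(0)

def stub_convnet(fixed, moving):
    # Stand-in for the ConvNet: maps a fixed/moving pair to transformation
    # parameters (here one 2D translation; the weights are untrained).
    feat = np.concatenate([fixed.ravel(), moving.ravel()])
    weights = rng.normal(0.0, 1e-3, (2, feat.size))  # hypothetical weights
    return weights @ feat

def warp(img, translation):
    # Nearest-neighbour resampling of the moving image under a global shift.
    dx, dy = translation
    ys, xs = np.indices(img.shape)
    sx = np.clip(np.round(xs + dx).astype(int), 0, img.shape[1] - 1)
    sy = np.clip(np.round(ys + dy).astype(int), 0, img.shape[0] - 1)
    return img[sy, sx]

def similarity_loss(fixed, warped):
    # Negative normalized cross-correlation, as in intensity-based registration.
    f = fixed - fixed.mean()
    w = warped - warped.mean()
    return -np.sum(f * w) / (np.linalg.norm(f) * np.linalg.norm(w) + 1e-8)

fixed = rng.random((8, 8))
moving = np.roll(fixed, 1, axis=1)  # moving image: fixed shifted by one voxel
params = stub_convnet(fixed, moving)
loss = similarity_loss(fixed, warp(moving, params))
```

In the actual framework the loss is backpropagated through a differentiable warp into the network weights, so the transformation parameters themselves are never optimized directly.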

The current paper extends our preliminary study of unsupervised deformable image registration (de Vos et al., 2017) in several ways. First, we extend the analysis from 2D to 3D images. Second, we perform B-spline registration with transposed convolutions, which results in high registration speeds, reduces the memory footprint, and allows simple implementation of B-spline registration in existing deep learning frameworks. Third, borrowed from conventional image registration, where regularization is often an integral part of transformation models (Sotiras et al., 2013), we include a bending energy penalty term that encourages smooth displacements. Fourth, we present ConvNet designs for affine as well as deformable registration. Fifth, we introduce multi-stage ConvNets for coarse-to-fine registration at multiple levels and multiple image resolutions by stacking ConvNets for affine and deformable image registration. Such a multi-stage ConvNet can perform registration tasks on fixed and moving image pairs of different sizes, similar to conventional iterative intensity-based registration strategies. Finally, in addition to evaluation on intra-patient registration of cardiac cine MR images, we conduct experiments on a diverse set of low-dose chest CTs for inter-patient registration, and we evaluate the method on the publicly available DIR-Lab dataset for image registration (Castillo, Castillo, Guerra, Johnson, McPhail, Garg, Guerrero, 2009, Castillo, Castillo, Martinez, Shenoy, Guerrero, 2010).
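The B-spline-as-transposed-convolution idea can be made concrete: with control points on a grid of spacing s, upsampling to a dense displacement field is a transposed convolution with stride s whose kernel is a sampled cubic B-spline. A standalone 1D NumPy sketch (the paper implements this as a network layer; the names here are illustrative):

```python
import numpy as np

def cubic_bspline(t):
    # Cubic B-spline basis function, nonzero on |t| < 2.
    t = np.abs(t)
    out = np.zeros_like(t)
    m1 = t < 1
    m2 = (t >= 1) & (t < 2)
    out[m1] = (4 - 6 * t[m1] ** 2 + 3 * t[m1] ** 3) / 6
    out[m2] = (2 - t[m2]) ** 3 / 6
    return out

def transposed_conv_1d(control, kernel, stride):
    # Transposed convolution: deposit each control value, scaled by the
    # kernel, at stride-spaced positions and sum the overlaps.
    n = (len(control) - 1) * stride + len(kernel)
    dense = np.zeros(n)
    for i, c in enumerate(control):
        dense[i * stride : i * stride + len(kernel)] += c * kernel
    return dense

stride = 4  # control-point spacing in voxels
ts = (np.arange(4 * stride + 1) - 2 * stride) / stride   # sample support [-2, 2]
kernel = cubic_bspline(ts)
control = np.array([0.0, 1.0, 0.0, -1.0, 0.0])           # control-point displacements
dvf_1d = transposed_conv_1d(control, kernel, stride)     # dense displacement field
```

Because the shifted B-spline kernels form a partition of unity, constant control-point displacements produce a constant dense field away from the borders, which is a convenient correctness check for such a layer.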

Section snippets

Method

In image registration the aim is to find a coordinate transformation T: I_F → I_M that aligns a fixed image I_F and a moving image I_M. In conventional image registration, similarity between the images is optimized by minimizing a dissimilarity metric L:

\hat{\mu} = \arg\min_{\mu} \left\{ \mathcal{L}(T_{\mu}; I_F, I_M) + \mathcal{R}(T_{\mu}) \right\},

where T_μ is parameterized by transformation parameters μ and R is an optional regularization term to encourage smoothness of the transformation T_μ. Several dissimilarity metrics might be used, e.g. mean squared
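A minimal NumPy sketch of such an objective, using mean squared error as a stand-in dissimilarity metric and a finite-difference bending energy as the regularization term R (the weight `alpha` is an illustrative choice, not the paper's):

```python
import numpy as np

def bending_energy_2d(dvf):
    # dvf: (2, H, W) displacement field. The bending energy penalizes the
    # mean squared second spatial derivatives, encouraging smoothness.
    energy = 0.0
    for comp in dvf:
        dyy = np.diff(comp, n=2, axis=0)                 # d2u/dy2
        dxx = np.diff(comp, n=2, axis=1)                 # d2u/dx2
        dxy = np.diff(np.diff(comp, axis=0), axis=1)     # d2u/dxdy
        energy += (dyy ** 2).mean() + (dxx ** 2).mean() + 2 * (dxy ** 2).mean()
    return energy

def objective(fixed, warped, dvf, alpha=0.1):
    # L(T; IF, IM) + R(T): dissimilarity plus weighted smoothness penalty.
    return ((fixed - warped) ** 2).mean() + alpha * bending_energy_2d(dvf)
```

Note that any affine displacement field has zero bending energy, so the penalty constrains only the non-affine (deformable) part of the transformation.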

Data

Like most deep learning approaches, the DLIR framework requires large sets of training data. Publicly available datasets that are specifically provided to evaluate registration algorithms contain insufficient training data for our approach. Therefore, we made use of large datasets of cardiac cine MRIs for intra-patient registration experiments, and low-dose chest CTs from the National Lung Screening Trial (NLST) for inter-patient registration experiments. We used manually delineated anatomical

Evaluation

The DLIR framework was evaluated with intra-patient as well as inter-patient registration experiments. As image folding is anatomically implausible, especially in intra-patient image registration, we quantitatively evaluated the topology of the obtained DVFs after registration. For this we determined the Jacobian determinant (also known as the Jacobian) for every point p(i, j, k) in the DVF:

\det\big(J(i,j,k)\big) =
\begin{vmatrix}
\frac{\partial x}{\partial i} & \frac{\partial x}{\partial j} & \frac{\partial x}{\partial k} \\
\frac{\partial y}{\partial i} & \frac{\partial y}{\partial j} & \frac{\partial y}{\partial k} \\
\frac{\partial z}{\partial i} & \frac{\partial z}{\partial j} & \frac{\partial z}{\partial k}
\end{vmatrix}

A Jacobian of 1 indicates that no volume change has
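This check can be sketched in NumPy: for a displacement field u, the transformation is x → x + u(x), so the Jacobian matrix at each voxel is the identity plus the spatial gradients of u (a sketch under that convention, not the paper's implementation):

```python
import numpy as np

def jacobian_determinant(dvf):
    # dvf: (3, D, H, W) displacement field u; the mapping is x -> x + u(x),
    # so the Jacobian matrix at each voxel is I + du/dx.
    grads = np.stack([np.stack(np.gradient(c)) for c in dvf])  # (3, 3, D, H, W)
    jac = np.moveaxis(grads, (0, 1), (-2, -1)) + np.eye(3)     # (D, H, W, 3, 3)
    return np.linalg.det(jac)                                   # (D, H, W)

identity = np.zeros((3, 4, 4, 4))                               # zero displacement
expansion = 0.1 * np.stack(np.indices((4, 4, 4)), 0).astype(float)  # u(x) = 0.1 x
```

A zero displacement field yields a determinant of 1 everywhere, while the uniform 10% expansion yields 1.1^3 ≈ 1.331, i.e. a uniform volume increase.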

DLIR framework

All ConvNets were trained with the DLIR framework using the loss function provided in Section 2.4. The ConvNets were initialized with Glorot’s uniform distribution (Glorot and Bengio, 2010) and optimized with Adam (Kingma and Ba, 2015).

Rectified linear units were used for activation in all ConvNets, except in the output layers. The outputs of the deformable ConvNets were unconstrained to enable prediction of negative B-spline displacement vectors. The outputs of affine ConvNets were constrained

Intra-patient registration of cardiac cine MRI

Intra-patient registration experiments were conducted using cardiac cine MRIs. The task was to register volumes (i.e. 3D images) within the 4D scans. Experiments were performed with 3-fold cross-validation. In each fold 30 images were used for training and 15 for evaluation. Given that each scan has 20 timepoints, 11,400 different permutations of image pairs were available per fold for training. Performance was evaluated using registration between images at ED and ES by label propagation of
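The stated pair count follows from ordered within-scan combinations, assuming every ordered (fixed, moving) pair of distinct timepoints is used:

```python
# Training pairs per fold: 30 training scans, 20 timepoints each,
# counting ordered (fixed, moving) pairs of distinct timepoints per scan.
scans, timepoints = 30, 20
pairs_per_fold = scans * timepoints * (timepoints - 1)
print(pairs_per_fold)  # 11400, matching the number reported above
```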

Inter-patient registration of low-dose chest CT

Inter-patient registration was performed with chest CT scans of different subjects from the NLST. In this set large variations in the field of view were present, which were caused by differences in scanning protocol and by the different CT-scanners that were used. Because of these variations, and the variations in anatomy among subjects, affine registration was necessary for initial alignment. Therefore, multi-stage image registration was performed with sequential affine and deformable image

Intra-patient registration of 4D chest CT

Current registration benchmark datasets unfortunately do not provide sufficient scans to train a ConvNet using the DLIR framework. Nevertheless, to give further insight into the method's performance and especially to enable reproduction of our results, we performed experiments using the publicly available DIR-Lab data. The dataset consists of ten 4D chest CTs that encompass a full breathing cycle in 10 timepoints. For each scan, 300 manually identified anatomical landmarks in the lungs in two

Discussion

We have presented a new framework for unsupervised training of ConvNets for 3D image registration: the Deep Learning Image Registration (DLIR) framework. The DLIR framework exploits image similarity between fixed and moving image pairs to train a ConvNet for image registration. Labeled training data, in the form of example registrations, are not required. The DLIR framework can train ConvNets for hierarchical multi-resolution and multi-level image registration and it can achieve accurate

Conclusion

We presented the Deep Learning Image Registration framework for unsupervised affine and deformable image registration with convolutional neural networks. We demonstrated that the DLIR framework is able to train ConvNets without training examples for accurate affine and deformable image registration within very short execution times.

Acknowledgment

This work is part of the research programme ImaGene with project number 12726, which is partly financed by the Netherlands Organisation for Scientific Research (NWO). The authors thank the National Cancer Institute for access to NCI’s data collected by the National Lung Screening Trial. The statements contained herein are solely those of the authors and do not represent or imply concurrence or endorsement by NCI.

References (38)

  • K. Eppenhof et al.

    Deformable image registration using convolutional neural networks

    Proc. SPIE

    (2018)
  • R. Garg et al.

    Unsupervised CNN for single view depth estimation: geometry to the rescue

    Computer Vision - ECCV 2016 - 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part VIII

    (2016)
  • X. Glorot et al.

    Understanding the difficulty of training deep feedforward neural networks

  • Y. Hu et al.

    Label-driven weakly-supervised learning for multimodal deformable image registration

    2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018)

    (2018)
  • E. Ilg et al.

    FlowNet 2.0: evolution of optical flow estimation with deep networks

    IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

    (2017)
  • M. Jaderberg et al.

    Spatial transformer networks

  • D. Kingma et al.

    Adam: A method for stochastic optimization

    The International Conference on Learning Representations (ICLR)

    (2015)
  • S. Klein et al.

    Elastix: a toolbox for intensity-based medical image registration

    IEEE Trans. Med. Imag.

    (2010)
  • J. Krebs et al.

    Robust non-rigid registration through agent-based action learning
