A deep learning framework for unsupervised affine and deformable image registration
Introduction
Image registration is the process of aligning two or more images. It is a well-established technique in (semi-)automatic medical image analysis that is used to transfer information between images. Commonly used image registration approaches include intensity-based methods and feature-based methods that use handcrafted image features (Sotiras et al., 2013; Viergever et al., 2016). Recently, supervised and unsupervised deep learning techniques have been successfully employed for image registration (Jaderberg et al., 2015; Wu et al., 2016; Miao et al., 2016; Liao et al., 2017; Krebs et al., 2017; Cao et al., 2017; Sokooti et al., 2017; Yang et al., 2017; de Vos et al., 2017; Eppenhof et al., 2018).
Deep learning techniques are well suited for image registration because they automatically learn to aggregate the information of various complexities in images that is relevant for the task at hand. Additionally, the use of deep learning techniques potentially yields high robustness, because local optima may be of lesser concern in deep learning methods, i.e. zero gradients are often (if not always) at saddle points (Dauphin et al., 2014). Moreover, deep learning methods like convolutional neural networks are highly parallelizable, which makes implementation and execution on GPUs straightforward and fast. As a consequence, deep-learning-based registration methods are exceptionally fast, making them interesting for time-critical applications, e.g. emerging image-guided therapies like High Intensity Focused Ultrasound (HIFU), the MRI Linear Accelerator (MR-linac), and MRI-guided proton therapy.
Although not explicitly introduced as a method for image registration, the spatial transformer network (STN) proposed by Jaderberg et al. (2015) was one of the first methods that exploited deep learning for image alignment. The STN is designed as part of a neural network for classification. Its task is to spatially transform input images such that the classification task is simplified. Transformations might be performed using a global transformation model or a thin plate spline model. In the application of an STN, image registration is an implicit result; image alignment is not guaranteed and only performed when beneficial for the classification task at hand. STNs have been shown to aid classification of photographs of traffic signs, house numbers, and handwritten digits, but to the best of our knowledge they have not yet been used to aid classification of medical images.
In other studies deep learning methods were explicitly trained for image registration (Miao et al., 2016; Liao et al., 2017; Krebs et al., 2017; Cao et al., 2017; Sokooti et al., 2017; Yang et al., 2017; Hu et al., 2018a; Hu et al., 2018b). For example, convolutional neural networks (ConvNets) were trained with reinforcement learning to be agents that predicted small steps of transformations toward optimal alignment. Liao et al. (2017) applied these agents for affine registration of intra-patient cone-beam CT (CBCT) to CT, and Krebs et al. (2017) applied agents for deformable image registration of inter-patient prostate MRI. Like intensity-based registration, image registration with agents is iterative. However, ConvNets can also be used to register images in one shot. For example, Miao et al. (2016) used a ConvNet to predict parameters in one shot for rigid registration of 2D CBCT to CT volumes. Similarly, ConvNets have been used to predict the parameters of a thin plate spline model: Cao et al. (2017) for deformable registration of brain MRI scans and Eppenhof et al. (2018) for deformable registration of chest CT scans. Furthermore, Sokooti et al. (2017) demonstrated that a ConvNet can predict a dense displacement vector field (DVF) directly, without constraining it to a transformation model. Similarly, Yang et al. (2017) used a ConvNet to predict the momentum for registration with large deformation diffeomorphic metric mapping (Beg et al., 2005). Recently, Hu et al. (2018a) presented a method that employs segmentations to train ConvNets for global and local image registration. In this method a ConvNet takes fixed and moving image pairs as its inputs and learns to align the segmentations. This was demonstrated on global and deformable registration of ultrasound and MR images using prostate segmentations.
While the aforementioned deep learning-based registration methods show accurate registration performance, they are all supervised, i.e. they rely on example registrations for training or require manual segmentations, unlike conventional image registration methods, which are typically unsupervised. Training examples have been generated by synthesizing transformation parameters for affine image registration (Miao et al., 2016) and deformable image registration (Sokooti et al., 2017; Eppenhof et al., 2018), or require manual annotations (Hu et al., 2018a; Hu et al., 2018b). However, generating synthetic data may not be trivial, as it is problem specific. Alternatively, training examples can be obtained with conventional image registration methods (Liao et al., 2017; Krebs et al., 2017; Cao et al., 2017; Yang et al., 2017), or unsupervised deep learning methods could be employed. Wu et al. (2016) exploited unsupervised deep learning by employing a convolutional stacked auto-encoder (CAE) that extracted features from fixed and moving images. It improved registration with Demons (Vercauteren et al., 2009) and HAMMER (Shen and Davatzikos, 2002) on three different brain MRI datasets. However, while the CAE is unsupervised, the extracted features are optimized for image reconstruction and not for image registration. Thus, there is no guarantee that the extracted features are optimal for the specific image registration task.
Unsupervised deep learning has also been used to estimate optical flow (Dosovitskiy et al., 2015; Yu et al., 2016; Ilg et al., 2017) or depth (Garg et al., 2016) in video sequences. Such methods are related to medical image registration, but typically address different problems: they focus on deformations among frames in video sequences. These video sequences are 2D, contain relatively low levels of noise, have high contrast due to RGB information, and have relatively small deformations between adjacent frames. In contrast, medical images are often 3D, may contain large amounts of noise, may have relatively low contrast, and aligning them typically requires larger deformations.
We propose a Deep Learning Image Registration (DLIR) framework: an unsupervised technique to train ConvNets for medical image registration tasks. In the DLIR framework, a ConvNet is trained for image registration by exploiting image similarity between fixed and moving image pairs, thereby circumventing the need for registration examples. The DLIR framework bears similarity to a conventional iterative image registration framework, as shown in Fig. 1. However, in contrast to conventional image registration, the transformation parameters are not optimized directly, but indirectly, by optimizing the ConvNet's parameters. In the DLIR framework the task of a ConvNet is to learn to predict transformation parameters by analyzing fixed and moving image pairs. The predicted transformation parameters are used to make a dense displacement vector field (DVF). The DVF is used to resample the moving image into a warped image that mimics the fixed image. During training, the ConvNet learns the underlying patterns of image registration by optimizing image similarity between the fixed and warped moving images. Once a ConvNet is trained, it has learned the image registration task and it is able to perform registration on pairs of unseen fixed and moving images in one shot, thus non-iteratively.
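The computation repeated at every training step can be sketched in a few lines of NumPy. This is a toy 2D illustration, not the paper's implementation: the helper names `warp` and `ncc` are ours, nearest-neighbour resampling stands in for the differentiable linear interpolation, and in the actual framework the DVF is predicted by the ConvNet rather than given.

```python
import numpy as np

def warp(moving, dvf):
    """Resample a 2D moving image at positions displaced by the DVF
    (nearest-neighbour sampling for brevity; the framework uses linear
    interpolation so that the operation stays differentiable)."""
    h, w = moving.shape
    ii, jj = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    src_i = np.clip(np.round(ii + dvf[0]).astype(int), 0, h - 1)
    src_j = np.clip(np.round(jj + dvf[1]).astype(int), 0, w - 1)
    return moving[src_i, src_j]

def ncc(a, b):
    """Normalized cross correlation: an image similarity that training can maximize."""
    a = (a - a.mean()) / (a.std() + 1e-8)
    b = (b - b.mean()) / (b.std() + 1e-8)
    return float((a * b).mean())

# Toy example: the moving image is the fixed image shifted down by one row;
# a DVF that displaces every sampling position by one row undoes the shift.
rng = np.random.default_rng(0)
fixed = rng.random((16, 16))
moving = np.roll(fixed, shift=1, axis=0)
dvf = np.stack([np.ones((16, 16)), np.zeros((16, 16))])
warped = warp(moving, dvf)  # matches `fixed` everywhere except the clipped border row
```

During training, the gradient of the (negated) similarity flows back through the resampling into the network, so it is the ConvNet's weights, not the DVF itself, that are optimized.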
The current paper extends our preliminary study of unsupervised deformable image registration (de Vos et al., 2017) in several ways. First, we extend the analysis from 2D to 3D images. Second, we perform B-spline registration with transposed convolutions, which results in high registration speeds, reduces the memory footprint, and allows simple implementation of B-spline registration on existing deep learning frameworks. Third, borrowing from conventional image registration, where regularization is often an integral part of the transformation model (Sotiras et al., 2013), we include a bending energy penalty term that encourages smooth displacements. Fourth, we present ConvNet designs for affine as well as deformable registration. Fifth, we introduce multi-stage ConvNets for coarse-to-fine registration over multiple levels and multiple image resolutions by stacking ConvNets for affine and deformable image registration. Such a multi-stage ConvNet can perform registration tasks on fixed and moving pairs of different sizes, similar to conventional iterative intensity-based registration strategies. Finally, in addition to evaluation on intra-patient registration of cardiac cine MR images, we conduct experiments on a diverse set of low-dose chest CTs for inter-patient registration, and we evaluate the method on the publicly available DIR-Lab dataset for image registration (Castillo et al., 2009; Castillo et al., 2010).
Method
In image registration the aim is to find a coordinate transformation T: IF → IM that aligns a fixed image IF and a moving image IM. In conventional image registration, similarity between the images is optimized by minimizing a dissimilarity metric L:

$$\hat{\mu} = \arg\min_{\mu}\; \mathcal{L}\left(I_F, I_M \circ T_{\mu}\right) + \mathcal{R}\left(T_{\mu}\right),$$

where Tμ is parameterized by transformation parameters μ and R is an optional regularization term to encourage smoothness of the transformation Tμ. Several dissimilarity metrics might be used, e.g. mean squared
Data
Like most deep learning approaches, the DLIR framework requires large sets of training data. Publicly available datasets that are specifically provided to evaluate registration algorithms contain insufficient training data for our approach. Therefore, we made use of large datasets of cardiac cine MRIs for intra-patient registration experiments, and low-dose chest CTs from the National Lung Screening Trial (NLST) for inter-patient registration experiments. We used manually delineated anatomical
Evaluation
The DLIR framework was evaluated with intra-patient as well as inter-patient registration experiments. As image folding is anatomically implausible, especially in intra-patient image registration, we quantitatively evaluated the topology of the obtained DVFs after registration. For this we determined the Jacobian determinant (also known as the Jacobian) for every point p(i, j, k) in the DVF:

$$\det J(p) = \begin{vmatrix} \frac{\partial T_x(p)}{\partial x} & \frac{\partial T_x(p)}{\partial y} & \frac{\partial T_x(p)}{\partial z} \\ \frac{\partial T_y(p)}{\partial x} & \frac{\partial T_y(p)}{\partial y} & \frac{\partial T_y(p)}{\partial z} \\ \frac{\partial T_z(p)}{\partial x} & \frac{\partial T_z(p)}{\partial y} & \frac{\partial T_z(p)}{\partial z} \end{vmatrix}$$

A Jacobian of 1 indicates that no volume change has
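The Jacobian check described above can be sketched with NumPy finite differences (our illustrative implementation, not the paper's evaluation code): a determinant of 1 indicates volume preservation, values below 0 indicate folding.

```python
import numpy as np

def jacobian_determinant(dvf):
    """Jacobian determinant of the transformation T(x) = x + u(x) for a
    displacement field u of shape (3, D, H, W), via central finite differences."""
    # grads[c, a] = d u_c / d x_a, each of shape (D, H, W)
    grads = np.stack([np.stack(np.gradient(dvf[c]), axis=0) for c in range(3)])
    jac = grads + np.eye(3)[:, :, None, None, None]  # Jacobian of T is I + grad u
    # Move the two matrix axes last so det broadcasts over all voxels.
    return np.linalg.det(np.moveaxis(jac, (0, 1), (-2, -1)))

# Identity transform (zero displacement): determinant is 1 at every voxel.
dets = jacobian_determinant(np.zeros((3, 8, 8, 8)))
```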
DLIR framework
All ConvNets were trained with the DLIR framework using the loss function provided in Section 2.4. The ConvNets were initialized with Glorot’s uniform distribution (Glorot and Bengio, 2010) and optimized with Adam (Kingma and Ba, 2015).
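For reference, Glorot's uniform initialization (Glorot and Bengio, 2010) draws weights from U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out)). A minimal sketch for a fully connected layer is shown below (the function name is ours; for convolutional layers the same rule applies with fan-in/fan-out computed from the kernel size and channel counts):

```python
import numpy as np

def glorot_uniform(fan_in, fan_out, rng):
    """Glorot/Xavier uniform initialization: keeps activation and gradient
    variances roughly constant across layers at the start of training."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

weights = glorot_uniform(64, 32, np.random.default_rng(0))
```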
Rectified linear units were used for activation in all ConvNets, except in the output layers. The outputs of the deformable ConvNets were unconstrained to enable prediction of negative B-spline displacement vectors. The outputs of the affine ConvNets were constrained
Intra-patient registration of cardiac cine MRI
Intra-patient registration experiments were conducted using cardiac cine MRIs. The task was to register volumes (i.e. 3D images) within the 4D scans. Experiments were performed with 3-fold cross-validation. In each fold 30 images were used for training and 15 for evaluation. Given that each scan has 20 timepoints, 11,400 different permutations of image pairs were available per fold for training. Performance was evaluated using registration between images at ED and ES by label propagation of
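The pair count follows directly from the fold layout; every ordered pair of distinct timepoints within a training scan can serve as a (fixed, moving) example:

```python
# 30 training scans per fold, each with 20 cardiac timepoints; every ordered
# pair of distinct timepoints within one scan is a (fixed, moving) training pair.
scans, timepoints = 30, 20
pairs_per_scan = timepoints * (timepoints - 1)  # 20 * 19 = 380 ordered pairs
total_pairs = scans * pairs_per_scan
assert total_pairs == 11_400
```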
Inter-patient registration of low-dose chest CT
Inter-patient registration was performed with chest CT scans of different subjects from the NLST. In this set, large variations in the field of view were present, caused by differences in scanning protocol and by the different CT scanners that were used. Because of these variations, and the variations in anatomy among subjects, affine registration was necessary for initial alignment. Therefore, multi-stage image registration was performed with sequential affine and deformable image
Intra-patient registration of 4D chest CT
Current registration benchmark datasets unfortunately do not provide sufficient scans to train a ConvNet using the DLIR framework. Nevertheless, to give further insight into the method's performance, and especially to enable reproduction of our results, we performed experiments using the publicly available DIR-Lab data. The dataset consists of ten 4D chest CTs that encompass a full breathing cycle in 10 timepoints. For each scan, 300 manually identified anatomical landmarks in the lungs in two
Discussion
We have presented a new framework for unsupervised training of ConvNets for 3D image registration: the Deep Learning Image Registration (DLIR) framework. The DLIR framework exploits image similarity between fixed and moving image pairs to train a ConvNet for image registration. Labeled training data, in the form of example registrations, are not required. The DLIR framework can train ConvNets for hierarchical multi-resolution and multi-level image registration and it can achieve accurate
Conclusion
We presented the Deep Learning Image Registration framework for unsupervised affine and deformable image registration with convolutional neural networks. We demonstrated that the DLIR framework is able to train ConvNets without training examples for accurate affine and deformable image registration within very short execution times.
Acknowledgment
This work is part of the research programme ImaGene with project number 12726, which is partly financed by the Netherlands Organisation for Scientific Research (NWO). The authors thank the National Cancer Institute for access to NCI’s data collected by the National Lung Screening Trial. The statements contained herein are solely those of the authors and do not represent or imply concurrence or endorsement by NCI.
References (38)
- Hu et al. Weakly-supervised convolutional neural networks for multimodal image registration. Med. Image Anal. (2018)
- Viergever et al. A survey of medical image registration – under review. Med. Image Anal. (2016)
- Yu et al. Back to basics: unsupervised learning of optical flow via brightness constancy and motion smoothness. Computer Vision – ECCV 2016 Workshops, Part 3 (2016)
- Beg et al. Computing large deformation metric mappings via geodesic flows of diffeomorphisms. Int. J. Comput. Vis. (2005)
- Registration of organs with sliding interfaces and changing topologies. Proc. SPIE (2014)
- Cao et al. Deformable image registration based on similarity-steered CNN regression (2017)
- Castillo et al. Four-dimensional deformable image registration using trajectory modeling. Phys. Med. Biol. (2010)
- Castillo et al. A framework for evaluation of deformable image registration spatial accuracy using large landmark point sets. Phys. Med. Biol. (2009)
- Dauphin et al. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization (2014)
- Dosovitskiy et al. FlowNet: learning optical flow with convolutional networks. IEEE International Conference on Computer Vision (ICCV) (2015)
- Eppenhof et al. Deformable image registration using convolutional neural networks. Proc. SPIE (2018)
- Garg et al. Unsupervised CNN for single view depth estimation: geometry to the rescue. Computer Vision – ECCV 2016, Part VIII (2016)
- Glorot and Bengio. Understanding the difficulty of training deep feedforward neural networks (2010)
- Hu et al. Label-driven weakly-supervised learning for multimodal deformable image registration. IEEE 15th International Symposium on Biomedical Imaging (ISBI) (2018)
- Ilg et al. FlowNet 2.0: evolution of optical flow estimation with deep networks. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017)
- Jaderberg et al. Spatial transformer networks (2015)
- Kingma and Ba. Adam: a method for stochastic optimization. International Conference on Learning Representations (ICLR) (2015)
- Klein et al. elastix: a toolbox for intensity-based medical image registration. IEEE Trans. Med. Imag. (2010)
- Krebs et al. Robust non-rigid registration through agent-based action learning (2017)