Medical Image Analysis

Volume 52, February 2019, Pages 128-143

A deep learning framework for unsupervised affine and deformable image registration

https://doi.org/10.1016/j.media.2018.11.010

Highlights

  • Unsupervised Deep Learning Image Registration (DLIR) is feasible for affine and deformable image registration.

  • The method is unsupervised; no registration examples are necessary to train a ConvNet for image registration.

  • Once a ConvNet is trained, image registration can be performed on unseen images in one shot.

  • Registration, including image resampling, is near real-time.

  • Unsupervised DLIR yields image registration performance similar to a conventional approach.

Abstract

Image registration, the process of aligning two or more images, is the core technique of many (semi-)automatic medical image analysis tasks. Recent studies have shown that deep learning methods, notably convolutional neural networks (ConvNets), can be used for image registration. Thus far, training of ConvNets for registration has been supervised using predefined example registrations. However, obtaining example registrations is not trivial. To circumvent the need for predefined examples, and thereby to increase convenience of training ConvNets for image registration, we propose the Deep Learning Image Registration (DLIR) framework for unsupervised affine and deformable image registration. In the DLIR framework ConvNets are trained for image registration by exploiting image similarity, analogous to conventional intensity-based image registration. After a ConvNet has been trained with the DLIR framework, it can be used to register pairs of unseen images in one shot. We propose flexible ConvNet designs for affine image registration and for deformable image registration. By stacking multiple of these ConvNets into a larger architecture, we are able to perform coarse-to-fine image registration. We show for registration of cardiac cine MRI and registration of chest CT that the performance of the DLIR framework is comparable to that of conventional image registration while being several orders of magnitude faster.

Introduction

Image registration is the process of aligning two or more images. It is a well-established technique in (semi-)automatic medical image analysis that is used to transfer information between images. Commonly used image registration approaches include intensity-based methods, and feature-based methods that use handcrafted image features (Sotiras, Davatzikos, Paragios, 2013, Viergever, Maintz, Klein, Murphy, Staring, Pluim, 2016). Recently, supervised and unsupervised deep learning techniques have been successfully employed for image registration (Jaderberg, Simonyan, Zisserman, Kavukcuoglu, 2015, Wu, Kim, Wang, Munsell, Shen, 2016, Miao, Wang, Liao, 2016, Liao, Miao, de Tournemire, Grbic, Kamen, Mansi, Comaniciu, 2017, Krebs, Mansi, Delingette, Zhang, Ghesu, Miao, Maier, Ayache, Liao, Kamen, 2017, Cao, Yang, Zhang, Nie, Kim, Wang, Shen, 2017, Sokooti, de Vos, Berendsen, Lelieveldt, Išgum, Staring, 2017, Yang, Kwitt, Styner, Niethammer, 2017, de Vos, Berendsen, Viergever, Staring, Išgum, 2017, Eppenhof, Lafarge, Moeskops, Veta, Pluim, 2018).

Deep learning techniques are well suited for image registration, because they automatically learn to aggregate the information of various complexities in images that is relevant for the task at hand. Additionally, the use of deep learning techniques potentially yields high robustness, because local optima may be of lesser concern in deep learning methods: zero gradients are often (if not always) at saddle points (Dauphin et al., 2014). Moreover, deep learning methods like convolutional neural networks are highly parallelizable, which makes implementation and execution on GPUs straightforward and fast. As a consequence, deep learning-enhanced registration methods are exceptionally fast, making them interesting for time-critical applications, e.g. emerging image-guided therapies like High Intensity Focused Ultrasound (HIFU), the MRI Linear Accelerator (MR-linac), and MRI-guided proton therapy.

Although not explicitly introduced as a method for image registration, the spatial transformer network (STN) proposed by Jaderberg et al. (2015) was one of the first methods that exploited deep learning for image alignment. The STN is designed as part of a neural network for classification. Its task is to spatially transform input images such that the classification task is simplified. Transformations might be performed using a global transformation model or a thin plate spline model. In the application of an STN, image registration is an implicit result; image alignment is not guaranteed and only performed when beneficial for the classification task at hand. STNs have been shown to aid classification of photographs of traffic signs, house numbers, and handwritten digits, but to the best of our knowledge they have not yet been used to aid classification of medical images.
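The STN's core sampling step can be sketched independently of any classifier: an affine matrix maps each output coordinate to a source coordinate, and the input image is sampled there by bilinear interpolation. A minimal 2D NumPy sketch of this idea (function names are illustrative, not from the STN paper):

```python
import numpy as np

def affine_grid(theta, h, w):
    # theta: 2x3 affine matrix mapping output coords to source coords.
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])  # 3 x N homogeneous
    src = theta @ coords                                         # 2 x N source (x, y)
    return src[0].reshape(h, w), src[1].reshape(h, w)

def bilinear_sample(img, sx, sy):
    # Differentiable-in-principle bilinear sampling at (sx, sy), with
    # border clamping at the image edges.
    h, w = img.shape
    x0 = np.clip(np.floor(sx).astype(int), 0, w - 2)
    y0 = np.clip(np.floor(sy).astype(int), 0, h - 2)
    dx = np.clip(sx - x0, 0.0, 1.0)
    dy = np.clip(sy - y0, 0.0, 1.0)
    return (img[y0, x0] * (1 - dx) * (1 - dy)
            + img[y0, x0 + 1] * dx * (1 - dy)
            + img[y0 + 1, x0] * (1 - dx) * dy
            + img[y0 + 1, x0 + 1] * dx * dy)

img = np.arange(16.0).reshape(4, 4)
identity = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
sx, sy = affine_grid(identity, 4, 4)
warped = bilinear_sample(img, sx, sy)  # identity transform: warped == img
```

In a full STN, a small sub-network would predict `theta` from the input, and the sampling step would be differentiated through during training.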

In other studies deep learning methods were explicitly trained for image registration (Liao, Miao, de Tournemire, Grbic, Kamen, Mansi, Comaniciu, 2017, Miao, Wang, Liao, 2016, Yang, Kwitt, Styner, Niethammer, 2017, Sokooti, de Vos, Berendsen, Lelieveldt, Išgum, Staring, 2017, Krebs, Mansi, Delingette, Zhang, Ghesu, Miao, Maier, Ayache, Liao, Kamen, 2017, Cao, Yang, Zhang, Nie, Kim, Wang, Shen, 2017, Hu, Modat, Gibson, Ghavami, Bonmati, Moore, Emberton, Noble, Barratt, Vercauteren, 2018, Hu, Modat, Gibson, Li, Ghavami, Bonmati, Wang, Bandula, Moore, Emberton, Ourselin, Noble, Barratt, Vercauteren, 2018). For example, convolutional neural networks (ConvNets) were trained with reinforcement learning to be agents that predicted small steps of transformations toward optimal alignment. Liao et al. (2017) applied these agents for affine registration of intra-patient cone-beam CT (CBCT) to CT and Krebs et al. (2017) applied agents for deformable image registration of inter-patient prostate MRI. Like intensity-based registration, image registration with agents is iterative. However, ConvNets can also be used to register images in one shot. For example, Miao et al. (2016) used a ConvNet to predict parameters in one shot for rigid registration of 2D CBCT to CT volumes. Similarly, ConvNets have been used to predict parameters of a thin plate spline model. Cao et al. (2017) used thin plate splines for deformable registration of brain MRI scans and Eppenhof et al. (2018) used thin plate splines for deformable registration of chest CT scans. Furthermore, in the work of Sokooti et al. (2017) it has been demonstrated that a ConvNet can be used to predict a dense displacement vector field (DVF) directly, without constraining it to a transformation model. Similarly, Yang et al. (2017) used a ConvNet to predict the momentum for registration with large deformation diffeomorphic metric mapping (Beg et al., 2005). Recently, Hu et al. (2018a) presented a method that employs segmentations to train ConvNets for global and local image registration. In this method a ConvNet takes fixed and moving image pairs as its inputs and it learns to align the segmentations. This was demonstrated on global and deformable registration of ultrasound and MR images using prostate segmentation.

While the aforementioned deep learning-based registration methods show accurate registration performance, the methods are all supervised, i.e. they rely on example registrations for training or require manual segmentations, unlike conventional image registration methods that are typically unsupervised. Training examples for registration have been generated by synthesizing transformation parameters for affine image registration (Miao et al., 2016) and deformable image registration (Sokooti, de Vos, Berendsen, Lelieveldt, Išgum, Staring, 2017, Eppenhof, Lafarge, Moeskops, Veta, Pluim, 2018), or require manual annotations (Hu, Modat, Gibson, Ghavami, Bonmati, Moore, Emberton, Noble, Barratt, Vercauteren, 2018, Hu, Modat, Gibson, Li, Ghavami, Bonmati, Wang, Bandula, Moore, Emberton, Ourselin, Noble, Barratt, Vercauteren, 2018). However, generating synthetic data may not be trivial, as it is problem specific. Training examples can also be obtained by using conventional image registration methods (Liao, Miao, de Tournemire, Grbic, Kamen, Mansi, Comaniciu, 2017, Krebs, Mansi, Delingette, Zhang, Ghesu, Miao, Maier, Ayache, Liao, Kamen, 2017, Cao, Yang, Zhang, Nie, Kim, Wang, Shen, 2017, Yang, Kwitt, Styner, Niethammer, 2017). Alternatively, unsupervised deep learning methods could be employed. Wu et al. (2016) exploited unsupervised deep learning by employing a convolutional stacked auto-encoder (CAE) that extracted features from fixed and moving images. It improved registration with Demons (Vercauteren et al., 2009) and HAMMER (Shen and Davatzikos, 2002) on three different brain MRI datasets. However, while the CAE is unsupervised, the extracted features are optimized for image reconstruction and not for image registration. Thus, there is no guarantee that the extracted features are optimal for the specific image registration task.

Unsupervised deep learning has been used to estimate optical flow (Yu, Harley, Derpanis, 2016, Dosovitskiy, Fischer, Ilg, Hausser, Hazirbas, Golkov, van der Smagt, Cremers, Brox, 2015, Ilg, Mayer, Saikia, Keuper, Dosovitskiy, Brox, 2017) or to estimate depth (Garg et al., 2016) in video sequences. Such methods are related to medical image registration, but typically address different problems. They focus on deformations among frames in video sequences. These video sequences are in 2D, contain relatively low levels of noise, have high contrast due to RGB information, and have relatively small deformations between adjacent frames. In contrast, medical images are often 3D, may contain large amounts of noise, may have relatively low contrast and aligning them typically requires larger deformations.

We propose a Deep Learning Image Registration (DLIR) framework: an unsupervised technique to train ConvNets for medical image registration tasks. In the DLIR framework, a ConvNet is trained for image registration by exploiting image similarity between fixed and moving image pairs, thereby circumventing the need for registration examples. The DLIR framework bears similarity with a conventional iterative image registration framework, as shown in Fig. 1. However, in contrast to conventional image registration, the transformation parameters are not optimized directly, but indirectly, by optimizing the ConvNet's parameters. In the DLIR framework the task of a ConvNet is to learn to predict transformation parameters by analyzing fixed and moving image pairs. The predicted transformation parameters are used to generate a dense displacement vector field (DVF). The DVF is used to resample the moving image into a warped image that mimics the fixed image. During training, the ConvNet learns the underlying patterns of image registration by optimizing image similarity between the fixed and warped moving images. Once a ConvNet is trained, it has learned the image registration task and is able to perform registration on pairs of unseen fixed and moving images in one shot, i.e. non-iteratively.
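This forward pass can be illustrated with a toy NumPy sketch. Everything here is a simplifying assumption rather than the paper's implementation: `stub_convnet` is an untrained linear stand-in for the ConvNet, the predicted transformation is a single global 2D translation, the warp uses nearest-neighbour resampling, and the loss is negative normalized cross-correlation:

```python
import numpy as np

rng = np.random.default_rng(0)

def stub_convnet(fixed, moving):
    # Stand-in for the ConvNet: maps a fixed/moving pair to transformation
    # parameters (here one 2D translation; the weights are untrained).
    feat = np.concatenate([fixed.ravel(), moving.ravel()])
    weights = rng.normal(0.0, 1e-3, (2, feat.size))  # hypothetical weights
    return weights @ feat

def warp(img, translation):
    # Nearest-neighbour resampling of the moving image under a global shift.
    dx, dy = translation
    ys, xs = np.indices(img.shape)
    sx = np.clip(np.round(xs + dx).astype(int), 0, img.shape[1] - 1)
    sy = np.clip(np.round(ys + dy).astype(int), 0, img.shape[0] - 1)
    return img[sy, sx]

def similarity_loss(fixed, warped):
    # Negative normalized cross-correlation, as in intensity-based registration.
    f = fixed - fixed.mean()
    w = warped - warped.mean()
    return -np.sum(f * w) / (np.linalg.norm(f) * np.linalg.norm(w) + 1e-8)

fixed = rng.random((8, 8))
moving = np.roll(fixed, 1, axis=1)  # moving image: fixed shifted by one voxel
params = stub_convnet(fixed, moving)
loss = similarity_loss(fixed, warp(moving, params))
```

In the actual framework the loss is backpropagated through a differentiable warp into the network weights, so the transformation parameters themselves are never optimized directly.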

The current paper extends our preliminary study of unsupervised deformable image registration (de Vos et al., 2017) in several ways. First, we extend the analysis from 2D to 3D images. Second, we perform B-spline registration with transposed convolutions, which results in high registration speeds, reduces the memory footprint, and allows simple implementation of B-spline registration in existing deep learning frameworks. Third, borrowed from conventional image registration, where regularization is often an integral part of transformation models (Sotiras et al., 2013), we include a bending energy penalty term that encourages smooth displacements. Fourth, we present ConvNet designs for affine as well as deformable registration. Fifth, we introduce multi-stage ConvNets for coarse-to-fine registration at multiple levels and multiple image resolutions by stacking ConvNets for affine and deformable image registration. Such a multi-stage ConvNet can perform registration tasks on fixed and moving image pairs of different sizes, similar to conventional iterative intensity-based registration strategies. Finally, in addition to evaluation on intra-patient registration of cardiac cine MR images, we conduct experiments on a diverse set of low-dose chest CTs for inter-patient registration, and we evaluate the method on the publicly available DIR-Lab dataset for image registration (Castillo, Castillo, Guerra, Johnson, McPhail, Garg, Guerrero, 2009, Castillo, Castillo, Martinez, Shenoy, Guerrero, 2010).
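The B-spline-as-transposed-convolution idea can be made concrete: with control points on a grid of spacing s, upsampling to a dense displacement field is a transposed convolution with stride s whose kernel is a sampled cubic B-spline. A standalone 1D NumPy sketch (the paper implements this as a network layer; the names here are illustrative):

```python
import numpy as np

def cubic_bspline(t):
    # Cubic B-spline basis function, nonzero on |t| < 2.
    t = np.abs(t)
    out = np.zeros_like(t)
    m1 = t < 1
    m2 = (t >= 1) & (t < 2)
    out[m1] = (4 - 6 * t[m1] ** 2 + 3 * t[m1] ** 3) / 6
    out[m2] = (2 - t[m2]) ** 3 / 6
    return out

def transposed_conv_1d(control, kernel, stride):
    # Transposed convolution: deposit each control value, scaled by the
    # kernel, at stride-spaced positions and sum the overlaps.
    n = (len(control) - 1) * stride + len(kernel)
    dense = np.zeros(n)
    for i, c in enumerate(control):
        dense[i * stride : i * stride + len(kernel)] += c * kernel
    return dense

stride = 4  # control-point spacing in voxels
ts = (np.arange(4 * stride + 1) - 2 * stride) / stride   # sample support [-2, 2]
kernel = cubic_bspline(ts)
control = np.array([0.0, 1.0, 0.0, -1.0, 0.0])           # control-point displacements
dvf_1d = transposed_conv_1d(control, kernel, stride)     # dense displacement field
```

Because the shifted B-spline kernels form a partition of unity, constant control-point displacements produce a constant dense field away from the borders, which is a convenient correctness check for such a layer.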

Section snippets

Method

In image registration the aim is to find a coordinate transformation T: I_F → I_M that aligns a fixed image I_F and a moving image I_M. In conventional image registration, similarity between the images is optimized by minimizing a dissimilarity metric L:

\hat{\mu} = \arg\min_{\mu} \left\{ \mathcal{L}(T_{\mu}; I_F, I_M) + \mathcal{R}(T_{\mu}) \right\},

where T_μ is parameterized by transformation parameters μ and R is an optional regularization term to encourage smoothness of the transformation T_μ. Several dissimilarity metrics might be used, e.g. mean squared
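A minimal NumPy sketch of such an objective, using mean squared error as a stand-in dissimilarity metric and a finite-difference bending energy as the regularization term R (the weight `alpha` is an illustrative choice, not the paper's):

```python
import numpy as np

def bending_energy_2d(dvf):
    # dvf: (2, H, W) displacement field. The bending energy penalizes the
    # mean squared second spatial derivatives, encouraging smoothness.
    energy = 0.0
    for comp in dvf:
        dyy = np.diff(comp, n=2, axis=0)                 # d2u/dy2
        dxx = np.diff(comp, n=2, axis=1)                 # d2u/dx2
        dxy = np.diff(np.diff(comp, axis=0), axis=1)     # d2u/dxdy
        energy += (dyy ** 2).mean() + (dxx ** 2).mean() + 2 * (dxy ** 2).mean()
    return energy

def objective(fixed, warped, dvf, alpha=0.1):
    # L(T; IF, IM) + R(T): dissimilarity plus weighted smoothness penalty.
    return ((fixed - warped) ** 2).mean() + alpha * bending_energy_2d(dvf)
```

Note that any affine displacement field has zero bending energy, so the penalty constrains only the non-affine (deformable) part of the transformation.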

Data

Like most deep learning approaches, the DLIR framework requires large sets of training data. Publicly available datasets that are specifically provided to evaluate registration algorithms contain insufficient training data for our approach. Therefore, we made use of large datasets of cardiac cine MRIs for intra-patient registration experiments, and low-dose chest CTs from the National Lung Screening Trial (NLST) for inter-patient registration experiments. We used manually delineated anatomical

Evaluation

The DLIR framework was evaluated with intra-patient as well as inter-patient registration experiments. As image folding is anatomically implausible, especially in intra-patient image registration, we quantitatively evaluated the topology of the obtained DVFs after registration. For this we determined the Jacobian determinant (also known as the Jacobian) for every point p(i, j, k) in the DVF:

\det\big(J(i,j,k)\big) =
\begin{vmatrix}
\frac{\partial x}{\partial i} & \frac{\partial x}{\partial j} & \frac{\partial x}{\partial k} \\
\frac{\partial y}{\partial i} & \frac{\partial y}{\partial j} & \frac{\partial y}{\partial k} \\
\frac{\partial z}{\partial i} & \frac{\partial z}{\partial j} & \frac{\partial z}{\partial k}
\end{vmatrix}

A Jacobian of 1 indicates that no volume change has
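This check can be sketched in NumPy: for a displacement field u, the transformation is x → x + u(x), so the Jacobian matrix at each voxel is the identity plus the spatial gradients of u (a sketch under that convention, not the paper's implementation):

```python
import numpy as np

def jacobian_determinant(dvf):
    # dvf: (3, D, H, W) displacement field u; the mapping is x -> x + u(x),
    # so the Jacobian matrix at each voxel is I + du/dx.
    grads = np.stack([np.stack(np.gradient(c)) for c in dvf])  # (3, 3, D, H, W)
    jac = np.moveaxis(grads, (0, 1), (-2, -1)) + np.eye(3)     # (D, H, W, 3, 3)
    return np.linalg.det(jac)                                   # (D, H, W)

identity = np.zeros((3, 4, 4, 4))                               # zero displacement
expansion = 0.1 * np.stack(np.indices((4, 4, 4)), 0).astype(float)  # u(x) = 0.1 x
```

A zero displacement field yields a determinant of 1 everywhere, while the uniform 10% expansion yields 1.1^3 ≈ 1.331, i.e. a uniform volume increase.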

DLIR framework

All ConvNets were trained with the DLIR framework using the loss function provided in Section 2.4. The ConvNets were initialized with Glorot’s uniform distribution (Glorot and Bengio, 2010) and optimized with Adam (Kingma and Ba, 2015).

Rectified linear units were used for activation in all ConvNets, except in the output layers. The outputs of the deformable ConvNets were unconstrained to enable prediction of negative B-spline displacement vectors. The outputs of affine ConvNets were constrained

Intra-patient registration of cardiac cine MRI

Intra-patient registration experiments were conducted using cardiac cine MRIs. The task was to register volumes (i.e. 3D images) within the 4D scans. Experiments were performed with 3-fold cross-validation. In each fold 30 images were used for training and 15 for evaluation. Given that each scan has 20 timepoints, 11,400 different permutations of image pairs were available per fold for training. Performance was evaluated using registration between images at ED and ES by label propagation of
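The stated pair count follows from ordered within-scan combinations, assuming every ordered (fixed, moving) pair of distinct timepoints is used:

```python
# Training pairs per fold: 30 training scans, 20 timepoints each,
# counting ordered (fixed, moving) pairs of distinct timepoints per scan.
scans, timepoints = 30, 20
pairs_per_fold = scans * timepoints * (timepoints - 1)
print(pairs_per_fold)  # 11400, matching the number reported above
```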

Inter-patient registration of low-dose chest CT

Inter-patient registration was performed with chest CT scans of different subjects from the NLST. In this set large variations in the field of view were present, which were caused by differences in scanning protocol and by the different CT-scanners that were used. Because of these variations, and the variations in anatomy among subjects, affine registration was necessary for initial alignment. Therefore, multi-stage image registration was performed with sequential affine and deformable image

Intra-patient registration of 4D chest CT

Current registration benchmark datasets unfortunately do not provide sufficient scans to train a ConvNet using the DLIR framework. Nevertheless, to give further insight into the method's performance and especially to enable reproduction of our results, we performed experiments using the publicly available DIR-Lab data. The dataset consists of ten 4D chest CTs that encompass a full breathing cycle in 10 timepoints. For each scan, 300 manually identified anatomical landmarks in the lungs in two

Discussion

We have presented a new framework for unsupervised training of ConvNets for 3D image registration: the Deep Learning Image Registration (DLIR) framework. The DLIR framework exploits image similarity between fixed and moving image pairs to train a ConvNet for image registration. Labeled training data, in the form of example registrations, are not required. The DLIR framework can train ConvNets for hierarchical multi-resolution and multi-level image registration and it can achieve accurate

Conclusion

We presented the Deep Learning Image Registration framework for unsupervised affine and deformable image registration with convolutional neural networks. We demonstrated that the DLIR framework is able to train ConvNets without training examples for accurate affine and deformable image registration within very short execution times.

Acknowledgment

This work is part of the research programme ImaGene with project number 12726, which is partly financed by the Netherlands Organisation for Scientific Research (NWO). The authors thank the National Cancer Institute for access to NCI’s data collected by the National Lung Screening Trial. The statements contained herein are solely those of the authors and do not represent or imply concurrence or endorsement by NCI.

References (38)

  • K. Eppenhof et al.

    Deformable image registration using convolutional neural networks

    Proc. SPIE

    (2018)
  • R. Garg et al.

    Unsupervised CNN for single view depth estimation: geometry to the rescue

    Computer Vision - ECCV 2016 - 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part VIII

    (2016)
  • X. Glorot et al.

    Understanding the difficulty of training deep feedforward neural networks

  • Y. Hu et al.

    Label-driven weakly-supervised learning for multimodal deformable image registration

    2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018)

    (2018)
  • E. Ilg et al.

    FlowNet 2.0: evolution of optical flow estimation with deep networks

    IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

    (2017)
  • M. Jaderberg et al.

    Spatial transformer networks

  • D. Kingma et al.

    Adam: A method for stochastic optimization

    The International Conference on Learning Representations (ICLR)

    (2015)
  • S. Klein et al.

    Elastix: a toolbox for intensity-based medical image registration

    IEEE Trans. Med. Imag.

    (2010)
  • J. Krebs et al.

    Robust non-rigid registration through agent-based action learning
