Medical Image Analysis

Volume 54, May 2019, Pages 10-19

Medical image classification using synergic deep learning

https://doi.org/10.1016/j.media.2019.02.010

Highlights

  • Propose the synergic deep learning (SDL) model for medical image classification.

  • Use synergic networks to enable multiple DCNN components to learn from each other.

  • Learn from image pairs, including similar inter-class and dissimilar intra-class pairs.

  • Achieve state-of-the-art performance on four medical image classification datasets.

Abstract

The classification of medical images is an essential task in computer-aided diagnosis, medical image retrieval and mining. Although deep learning has proven advantages over traditional methods that rely on handcrafted features, it remains challenging due to the significant intra-class variation and inter-class similarity caused by the diversity of imaging modalities and clinical pathologies. In this paper, we propose a synergic deep learning (SDL) model to address this issue by using multiple deep convolutional neural networks (DCNNs) simultaneously and enabling them to mutually learn from each other. Each pair of DCNNs has their learned image representations concatenated as the input of a synergic network, which has a fully connected structure and predicts whether the pair of input images belongs to the same class. Thus, if one DCNN makes a correct classification, a mistake made by the other DCNN leads to a synergic error that serves as an extra force to update the model. The SDL model can be trained end-to-end under the supervision of the classification errors from the DCNNs and the synergic errors from each pair of DCNNs. Our experimental results on the ImageCLEF-2015, ImageCLEF-2016, ISIC-2016, and ISIC-2017 datasets indicate that the proposed SDL model achieves state-of-the-art performance on these medical image classification tasks.

Introduction

The significance of digital medical imaging in modern healthcare has made medical image analysis indispensable to clinical practice (Ghosh et al., 2011; de Bruijne, 2016; Kalpathy-Cramer et al., 2015). Medical image classification, a fundamental step in medical image analysis, aims to distinguish medical images according to a certain criterion, such as clinical pathology or imaging modality. A reliable medical image classification system can assist doctors in the fast and accurate interpretation of medical images.

Medical image classification has been studied extensively over the past decades, with a large number of solutions in the literature (Baloch and Krim, 2007; Song et al., 2013; Koitka and Friedrich, 2016), most of which are based on handcrafted features. Despite the success of these methods, it is usually difficult to design handcrafted features that are optimal for a specific classification task. In recent years, deep learning techniques (Simonyan and Zisserman, 2015; Szegedy et al., 2015; He et al., 2016; Chen et al., 2017; Li et al., 2017), especially deep convolutional neural networks (DCNNs), have led to significant breakthroughs in medical image classification (Koitka and Friedrich, 2016; Xu et al., 2014; Shen et al., 2017; Esteva et al., 2017; Personnaz et al., 1986; Kumar et al., 2017; Yu et al., 2017b) and medical image segmentation (Dong et al., 2017; Soltaninejad et al., 2017). However, although these methods are more accurate than handcrafted feature-based approaches, they have not achieved the same success in medical image classification (Sirinukunwattana et al., 2016; Xie et al., 2017) as in the ImageNet Challenge (Deng et al., 2009; Krizhevsky et al., 2012). The suboptimal performance is attributed mainly to two reasons.

First, deep models may overfit the training data, which is often far from adequate, since datasets in medical image analysis are usually small owing to the work required to acquire and then annotate the images (Weese and Lorenz, 2016). To address this issue, pre-trained deep models have been adopted, since it has been widely recognized that the image representation ability learned from large-scale datasets such as ImageNet (Deng et al., 2009) can be efficiently transferred to generic visual recognition tasks where the training data is limited (Zhou et al., 2017; Ravishankar et al., 2017; Oquab et al., 2014; Mettes et al., 2016).
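
As a concrete illustration of this transfer-learning recipe, the following is a minimal sketch (our own example in TensorFlow/Keras, not the authors' configuration): an ImageNet-pretrained ResNet-50 is reused as a frozen feature extractor, and only a small classification head is trained on the limited target dataset.

```python
# Minimal transfer-learning sketch (illustrative, not the authors' setup):
# reuse an ImageNet-pretrained ResNet-50 as a fixed feature extractor and
# train only a small classification head on the small medical dataset.
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 30  # example value; set to the number of classes in your task

backbone = tf.keras.applications.ResNet50(
    weights="imagenet",       # representation learned on a large-scale dataset
    include_top=False,        # drop the 1000-way ImageNet classifier
    pooling="avg",            # global average pooling -> 2048-d feature vector
    input_shape=(224, 224, 3),
)
backbone.trainable = False    # freeze to avoid overfitting the small dataset

model = models.Sequential([
    backbone,
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```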

Second, and more significantly, intra-class variation and inter-class similarity pose an even greater challenge to the classification of medical images (Song et al., 2015). As the example in Fig. 1(a)–(d) shows, separating images from computed tomography (CT) and magnetic resonance (MR) scanners is difficult because: (1) both CT and MR images provide anatomical information about the imaged body parts, and hence share many visual similarities, so that even non-professionals may have difficulty telling them apart (see Fig. 1(a) vs (c), or Fig. 1(b) vs (d)); and (2) images from the same modality differ depending on the anatomical location and individual variability (see Fig. 1(a) vs (b), or Fig. 1(c) vs (d)). Another example, shown in Fig. 1(e)–(h), is the separation of malignant skin lesions from benign ones. There is a large visual difference between the benign skin lesions (e) and (f), and between the malignant ones (g) and (h); nevertheless, the benign lesions (e) and (f) are more similar in shape and color to the malignant lesions (g) and (h), respectively. To address this challenge, human observers focus on the ambiguity caused by hard cases, which may provide more discriminative information than easy ones (Bengio et al., 2009). The pair-wise learning strategy is an effective technique that learns from pairs of samples and captures additional information that helps distinguish hard cases.

Handcrafted feature-based medical image classification: Descriptors for color, texture, and shape, as well as combined descriptors, have been widely used in medical image classification. Baloch and Krim (2007) proposed a flexible skew-symmetric shape model to capture shape variability within a certain neighborhood and account for all potential variability. Song et al. (2013) designed a novel texture descriptor that represents rich texture features by integrating multi-scale Gabor filters and local binary pattern (LBP) histograms for lung tissue classification. Koitka and Friedrich (2016) extracted up to 11 handcrafted visual descriptors and used them jointly for modality-based medical image classification. Compared with these handcrafted feature-based methods, the proposed SDL model learns a discriminative feature representation from the data adaptively and effectively.
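
To make the flavor of such descriptors concrete, here is a minimal sketch (our own, using scikit-image) of a local binary pattern histogram; it illustrates the class of texture features discussed above, not the exact multi-scale Gabor-plus-LBP descriptor of Song et al. (2013).

```python
# Illustrative handcrafted texture descriptor: a uniform-LBP histogram.
# This shows the general class of features discussed above, not the exact
# multi-scale Gabor + LBP descriptor of Song et al. (2013).
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_histogram(gray_image, n_points=8, radius=1):
    """Return a normalized histogram of uniform LBP codes for a 2-D image."""
    codes = local_binary_pattern(gray_image, n_points, radius, method="uniform")
    n_bins = n_points + 2  # uniform patterns + one bin for non-uniform codes
    hist, _ = np.histogram(codes, bins=n_bins, range=(0, n_bins), density=True)
    return hist  # fixed-length feature vector, ready for an SVM or k-NN
```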

Deep learning-based medical image classification: DCNN models provide a unified feature extraction and classification framework that frees human users from troublesome handcrafted feature extraction for medical image classification. Xu et al. (2014) adopted a DCNN to minimize manual annotation and produce good feature representations for histopathological colon cancer image classification. Shen et al. (2017) proposed a multi-crop pooling strategy and applied it to a DCNN to capture object salient information for lung nodule classification on chest CT images. Esteva et al. (2017) trained a DCNN on 129,450 clinical images to diagnose the most common and deadliest skin cancers, and achieved performance matching that of 21 board-certified dermatologists. Koitka and Friedrich (2016) extracted the output of the last fully connected layer of a pre-trained ResNet-152 model and used it to train a custom network layer with the pseudo-inverse method (Personnaz et al., 1986). Kumar et al. (2017) integrated two different pre-trained DCNN architectures and combined them into a stronger classifier. Yu et al. (2017b) presented an ensemble of multiple pre-trained ResNet-50 and VGGNet-16 models and multiple fully-trained DCNNs, obtained by calculating a weighted sum of the predicted probabilities. In our previous work (Zhang et al., 2018a; Xie et al., 2018), we jointly used deep and handcrafted visual features for medical image classification and found that handcrafted features can complement the image representation learned by DCNNs on small training datasets. Different from these networks, the proposed SDL model simultaneously takes multiple images as input, and thus enables multiple DCNN components to mutually improve each other and learn a better discriminative representation.
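
For instance, the ensemble step of Yu et al. (2017b) can be pictured as a weighted sum over per-model class probabilities; the sketch below is our own illustration, with arbitrary example weights rather than the values used in their work.

```python
# Illustrative ensemble by weighted sum of predicted class probabilities,
# in the spirit of Yu et al. (2017b); the weights are arbitrary examples.
import numpy as np

def weighted_ensemble(prob_list, weights):
    """prob_list: list of (num_samples, num_classes) probability arrays."""
    combined = sum(w * p for w, p in zip(weights, prob_list))
    return combined.argmax(axis=1)  # final predicted class per sample
```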

Pair-wise learning: In the past decade, the pair-wise learning strategy has been applied to various perception tasks, such as signature verification (Bromley et al., 1994), face verification (Chopra et al., 2005), speech analysis (Kamper et al., 2015; Kamper et al., 2016; Renshaw et al., 2015), and natural language processing (Mueller and Thyagarajan, 2016). Bromley et al. (1994) described a Siamese neural network for the verification of signatures written on a pen-input tablet, which compares the cosine of the angle between an extracted feature vector and a stored feature vector. Chopra et al. (2005) presented a general discriminative method for learning a similarity metric from data pairs, minimizing a discriminative loss function that decreases the metric for pairs of faces from the same person and increases it for pairs from different persons. Recent years have witnessed widespread applications of pair-wise learning in unsupervised speech feature learning. Kamper et al. (2015) proposed an unsupervised deep auto-encoder feature extractor for zero-resource speech processing that uses weak top-down supervision from word pairs obtained by an unsupervised term discovery system. Kamper et al. (2016) also used word pairs to train a Siamese DCNN that takes a pair of speech segments as input and uses a hinge loss to classify same-word and different-word pairs. Renshaw et al. (2015) claimed that guiding representation learning with word pairs provides a major benefit over standard unsupervised methods. Pair-wise learning has also been applied to natural language processing: Mueller and Thyagarajan (2016) presented a Siamese recurrent architecture trained on paired examples to learn a highly structured space of sentence representations that captures rich semantics for learning sentence similarity. Different from traditional pair-wise learning, the SDL model avoids the handcrafted design of tricky distance-based loss functions and instead learns automatically, via a cross-entropy loss, whether image pairs belong to the same category. Moreover, the SDL model supports the simultaneous learning of multiple image pairs with multiple DCNN components that do not share parameters, so the model can also benefit from an ensemble of multiple networks.
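
For reference, a minimal sketch of the classic pair-wise training signal discussed above: a contrastive-style hinge loss on the distance between two embeddings, in the spirit of Chopra et al. (2005) and Kamper et al. (2016); the margin, distance, and coefficients are illustrative choices of ours.

```python
# Sketch of a classic pair-wise (contrastive/hinge) loss: pull same-class
# embeddings together, push different-class embeddings at least `margin`
# apart. Values and distance choice are illustrative.
import numpy as np

def contrastive_loss(emb_a, emb_b, same_pair, margin=1.0):
    """same_pair: 1 if the two inputs share a label, else 0."""
    dist = np.linalg.norm(emb_a - emb_b)          # Euclidean distance
    if same_pair:
        return 0.5 * dist ** 2                    # shrink distance for same pairs
    return 0.5 * max(0.0, margin - dist) ** 2     # hinge for different pairs
```

The SDL model replaces this kind of hand-designed margin loss with a learned synergic network trained under a cross-entropy loss.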

In this paper, we propose a synergic deep learning (SDL) model that learns a discriminative representation simultaneously from pairs of images, including both similar images from different categories and dissimilar images from the same category, for medical image classification. The SDL model consists of n pre-trained DCNNs and C(n,2) = n(n-1)/2 synergic networks. Each DCNN learns image representation and classification, and each pair of DCNNs has their learned image representations concatenated as the input of a synergic network, which has a fully connected structure, to predict whether the pair of input images belongs to the same class. Thus, the SDL model can be trained in an end-to-end fashion under the supervision of both the classification error from each DCNN and the synergic error from each pair of DCNNs. We evaluated the proposed model on the 2015/2016 Image Cross Language Evaluation Forum (ImageCLEF) subfigure classification challenge datasets, and on the 2016/2017 International Skin Imaging Collaboration (ISIC) skin lesion classification challenge datasets. Our results suggest that the SDL model achieves state-of-the-art performance on these four medical image classification tasks.
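
The following minimal sketch shows how the architecture described above can be wired together; it is our own illustration of the idea (a tiny generic CNN stands in for each pre-trained DCNN component), not the authors' released implementation.

```python
# Minimal SDL^n sketch (an illustration of the idea, not the authors' code).
# Each of the n DCNN components classifies its own input image; every pair
# of components feeds a fully connected "synergic" head that predicts
# whether the two input images belong to the same class.
from itertools import combinations
import tensorflow as tf
from tensorflow.keras import layers, Model

def make_backbone(i):
    """Stand-in DCNN component i (weights are not shared across components)."""
    return tf.keras.Sequential([
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPool2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.GlobalAveragePooling2D(),
    ], name=f"dcnn_{i}")

def build_sdl(n=2, num_classes=30, lam=5.0, input_shape=(224, 224, 3)):
    inputs, feats, outputs, losses, weights = [], [], [], [], []
    for i in range(n):
        inp = layers.Input(input_shape, name=f"image_{i}")
        feat = make_backbone(i)(inp)
        outputs.append(layers.Dense(num_classes, activation="softmax",
                                    name=f"cls_{i}")(feat))
        inputs.append(inp)
        feats.append(feat)
        losses.append("categorical_crossentropy")  # per-DCNN classification
        weights.append(1.0)
    for i, j in combinations(range(n), 2):          # C(n,2) synergic networks
        pair = layers.Concatenate()([feats[i], feats[j]])
        hidden = layers.Dense(128, activation="relu")(pair)
        outputs.append(layers.Dense(1, activation="sigmoid",
                                    name=f"syn_{i}_{j}")(hidden))
        losses.append("binary_crossentropy")        # same class or not
        weights.append(lam)                          # lambda weights synergic error
    model = Model(inputs, outputs)
    model.compile(optimizer="adam", loss=losses, loss_weights=weights)
    return model
```

During training, each component receives its own stream of images, and the synergic label for a pair of inputs is 1 when the two images share a class label and 0 otherwise.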

The main contributions of this paper are three-fold. First, we propose the SDL model, which learns a discriminative feature representation from multiple images simultaneously, including both similar inter-class images and dissimilar intra-class images. Second, we enable each pair of DCNNs in the SDL model to mutually facilitate each other during the learning process: if one DCNN makes a correct decision, a mistake made by the other DCNN leads to a synergic error that serves as an extra force for learning the discriminative representation. Finally, we achieve state-of-the-art performance on the ImageCLEF-2015 and ImageCLEF-2016 subfigure classification datasets and the ISIC-2016 and ISIC-2017 skin lesion classification datasets.

A pilot version of this work was presented at MICCAI 2018 (Zhang et al., 2018b). In this paper, we have substantially revised and extended the conference paper. The main extensions are: (1) the SDL model was generalized from SDL^2, a special version with only two DCNN components, to SDL^n with n DCNNs, and this generalization leads to improved performance in medical image classification; and (2) the proposed model was evaluated not only on pathology-based image classification datasets (i.e. the ISIC-2016 and ISIC-2017 datasets), but also on modality-based image classification datasets (i.e. the ImageCLEF-2015 and ImageCLEF-2016 datasets).


Datasets

For this study, we use four medical image classification datasets: two modality-based datasets, i.e. the ImageCLEF-2015 (de Herrera et al., 2015) and ImageCLEF-2016 (de Herrera et al., 2016) datasets, and two pathology-based datasets, i.e. the ISIC-2016 (Gutman et al., 2016) and ISIC-2017 (Codella et al., 2018) datasets.

ImageCLEF-2015, ImageCLEF-2016: Recognizing the increasing complexity of images in the biomedical literature, ImageCLEF

Experimental settings

To alleviate the overfitting of deep models, we employed two data augmentation (DA) strategies to enlarge the training dataset. The first strategy (DA1) uses the ImageDataGenerator toolbox (Chollet et al., 2015) to apply geometric transformations to the training images, including random rotation ([-10°, +10°]), shifts (0 ∼ 10% of total width and height), shear (0 ∼ 0.1 radians in the counter-clockwise direction), zoom (90% ∼ 110% of width and height), and horizontal and vertical flips. The
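
A configuration along these lines, expressed with the Keras ImageDataGenerator API; the argument values below are our reading of the ranges quoted above, not a quotation of the authors' script.

```python
# DA1 as described above, expressed with the Keras ImageDataGenerator
# (argument values follow our reading of the quoted ranges).
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rotation_range=10,        # random rotation in [-10, +10] degrees
    width_shift_range=0.10,   # horizontal shift, up to 10% of width
    height_shift_range=0.10,  # vertical shift, up to 10% of height
    shear_range=0.1,          # shear intensity 0.1 (the paper quotes 0~0.1 rad;
                              # note the units vary across Keras versions)
    zoom_range=0.10,          # zoom between 90% and 110%
    horizontal_flip=True,
    vertical_flip=True,
)
# augmented = datagen.flow(train_images, train_labels, batch_size=32)
```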

Stability interval of hyper-parameter λ

We used the SDL^2 model as a case study to evaluate the impact of the hyper-parameter λ on classification performance. Fig. 5 shows the accuracy obtained by applying the SDL^2 model with different values of λ to the four datasets. It reveals that, when λ takes a value in the range [3, 8], the SDL^2 model achieves good accuracy on all datasets and its performance is relatively robust to the value of λ. Therefore, we suggest taking the value of λ from [3, 8].
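
For orientation, λ is the weight that balances the synergic errors against the classification errors in the overall training objective; in our notation (a reconstruction consistent with the description in the introduction, not a quotation of the paper's equation), the total loss has the form:

```latex
% Total SDL^n objective: n classification losses plus lambda-weighted
% synergic losses over all C(n,2) component pairs (our notation).
\mathcal{L}(\theta) = \sum_{i=1}^{n} \mathcal{L}_{\mathrm{cls}}^{(i)}
  + \lambda \sum_{1 \le i < j \le n} \mathcal{L}_{\mathrm{syn}}^{(i,j)}
```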

Performance without data augmentation

There are several commonly used data

Conclusion

In this paper, we propose the SDL model to address the challenge caused by intra-class variation and inter-class similarity in medical image classification. This model simultaneously uses multiple DCNNs with synergic networks, enabling those DCNNs to mutually learn from each other. Our results on the ImageCLEF-2015, ImageCLEF-2016, ISIC-2016, and ISIC-2017 datasets show that the proposed SDL model achieves state-of-the-art performance on these medical image classification tasks. In the

Acknowledgment

This work was supported in part by the National Natural Science Foundation of China under Grants 61771397 and 61471297.

References (56)

  • J. Bromley et al. Signature verification using a "Siamese" time delay neural network. Proceedings of the 6th International Conference on Neural Information Processing Systems (NIPS), 1994.
  • F. Chollet et al. Keras. GitHub repository, 2015.
  • S. Chopra et al. Learning a similarity metric discriminatively, with application to face verification. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2005.
  • P. Cirujeda et al. Medical image classification via 2D color feature based covariance descriptors. Proceedings of CLEF (Working Notes), 2015.
  • N.C. Codella et al. Skin lesion analysis toward melanoma detection: a challenge at the 2017 International Symposium on Biomedical Imaging (ISBI), hosted by the International Skin Imaging Collaboration (ISIC). Proceedings of the IEEE 15th International Symposium on Biomedical Imaging (ISBI), 2018.
  • J. Deng et al. ImageNet: a large-scale hierarchical image database. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009.
  • T. DeVries, D. Ramachandram. Skin lesion classification using deep multi-scale convolutional neural networks, 2017. ...
  • I.G. Díaz. Incorporating the knowledge of dermatologists to convolutional neural networks for the diagnosis of..., 2017.
  • H. Dong et al. Automatic brain tumor detection and segmentation using U-Net based fully convolutional networks. Annual Conference on Medical Image Understanding and Analysis, 2017.
  • A. Esteva et al. Corrigendum: dermatologist-level classification of skin cancer with deep neural networks. Nature, 2017.
  • P. Ghosh et al. Review of medical image retrieval systems and future directions. Proceedings of the 24th International Symposium on Computer-Based Medical Systems (CBMS), 2011.
  • D. Gutman, N.C. Codella, E. Celebi, B. Helba, M. Marchetti, N. Mishra, A. Halpern. Skin lesion analysis..., 2016.
  • K. He et al. Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
  • A.G.S. de Herrera et al. Overview of the ImageCLEF 2013 medical tasks. Proceedings of CLEF (Working Notes), 2013.
  • A.G.S. de Herrera et al. Overview of the ImageCLEF 2015 medical classification task. CLEF (Working Notes), 2015.
  • A.G.S. de Herrera et al. Overview of the ImageCLEF 2016 medical classification task. CLEF (Working Notes), 2016.
  • H. Kamper et al. Unsupervised neural network based feature extraction using weak top-down constraints. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2015.
  • H. Kamper et al. Deep convolutional acoustic word embeddings using word-pair side information. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016.

Conflicts of Interest Statement: The authors declare no conflict of interest.
