Medical image classification using synergic deep learning
Introduction
The significance of digital medical imaging in modern healthcare has made medical image analysis an indispensable part of clinical practice (Ghosh et al., 2011; de Bruijne, 2016; Kalpathy-Cramer et al., 2015). Medical image classification, a fundamental step in medical image analysis, aims to distinguish medical images according to a certain criterion, such as clinical pathology or imaging modality. A reliable medical image classification system can assist doctors in the fast and accurate interpretation of medical images.
Medical image classification has been studied thoroughly over the past decades, and a large number of solutions exist in the literature (Baloch and Krim, 2007; Song et al., 2013; Koitka and Friedrich, 2016), most of which are based on handcrafted features. Despite the success of these methods, it is usually difficult to design handcrafted features that are optimal for a specific classification task. In recent years, deep learning techniques (Simonyan and Zisserman, 2015; Szegedy et al., 2015; He et al., 2016; Chen et al., 2017; Li et al., 2017), especially deep convolutional neural networks (DCNNs), have led to significant breakthroughs in medical image classification (Koitka and Friedrich, 2016; Xu et al., 2014; Shen et al., 2017; Esteva et al., 2017; Personnaz et al., 1986; Kumar et al., 2017; Yu et al., 2017b) and medical image segmentation (Dong et al., 2017; Soltaninejad et al., 2017). However, although these methods are more accurate than handcrafted feature-based approaches, they have not achieved the same success in medical image classification (Sirinukunwattana et al., 2016; Xie et al., 2017) as they have in the ImageNet Challenge (Deng et al., 2009; Krizhevsky et al., 2012). The suboptimal performance is attributed mainly to two reasons.
First, deep models may overfit the training data, which are often far from adequate: datasets in medical image analysis are usually small owing to the work required to acquire and then annotate the images (Weese and Lorenz, 2016). To address this issue, pre-trained deep models have been adopted, since it is widely recognized that the image representation ability learned from large-scale datasets such as ImageNet (Deng et al., 2009) can be efficiently transferred to generic visual recognition tasks where training data are limited (Zhou et al., 2017; Ravishankar et al., 2017; Oquab et al., 2014; Mettes et al., 2016).
Second, and more significantly, intra-class variation and inter-class similarity pose an even greater challenge to the classification of medical images (Song et al., 2015). As the example in Fig. 1(a)-(d) shows, separating images from computed tomography (CT) and magnetic resonance (MR) imaging scanners is difficult because: (1) both CT and MR images provide anatomical information about the imaged body parts, and hence share many visual similarities, so that non-professionals may have difficulty telling them apart (see Fig. 1(a) vs (c), or Fig. 1(b) vs (d)); and (2) images from the same modality differ depending on the anatomical location and individual variability (see Fig. 1(a) vs (b), or Fig. 1(c) vs (d)). Another example, shown in Fig. 1(e)-(h), is the separation of malignant skin lesions from benign ones. There is a large visual difference between the benign skin lesions (e) and (f), and between the malignant ones (g) and (h). Nevertheless, the benign lesions (e) and (f) are more similar in shape and color to the malignant lesions (g) and (h), respectively. To address this challenge, human observers focus more on the ambiguity caused by hard cases, which may provide more discriminative information than easy ones (Bengio et al., 2009). The pair-wise learning strategy is an effective technique that learns from pairs of samples and thus captures more of the information needed to distinguish hard cases.
Handcrafted feature-based medical image classification: Descriptors for color, texture, and shape, as well as combined descriptors, have been widely used in medical image classification. Baloch and Krim (2007) proposed a flexible skew-symmetric shape model to capture shape variability within a certain neighborhood and account for all potential variability. Song et al. (2013) designed a novel texture descriptor that represents rich texture features by integrating multi-scale Gabor filters with local binary pattern (LBP) histograms for lung tissue classification. Koitka and Friedrich (2016) extracted up to 11 handcrafted visual descriptors and used them jointly for modality-based medical image classification. Compared with these handcrafted feature-based methods, the proposed SDL model learns discriminative feature representations from the data adaptively and effectively.
Deep learning-based medical image classification: DCNN models provide a unified feature extraction-classification framework that frees human users from troublesome handcrafted feature extraction for medical image classification. Xu et al. (2014) adopted a DCNN to minimize manual annotation and produce good feature representations for histopathological colon cancer image classification. Shen et al. (2017) proposed a multi-crop pooling strategy and applied it in a DCNN to capture salient object information for lung nodule classification on chest CT images. Esteva et al. (2017) trained a DCNN on 129,450 clinical images to diagnose the most common and deadliest skin cancers, achieving performance that matches that of 21 board-certified dermatologists. Koitka and Friedrich (2016) extracted the output of the last fully connected layer of a pre-trained ResNet-152 model and used it to train a custom network layer with the pseudo-inverse method (Personnaz et al., 1986). Kumar et al. (2017) combined two different pre-trained DCNN architectures into a stronger classifier. Yu et al. (2017b) presented an ensemble of multiple pre-trained ResNet-50 and VGGNet-16 models and multiple fully trained DCNNs, formed by computing the weighted sum of their predicted probabilities. In our previous work (Zhang et al., 2018a; Xie et al., 2018), we jointly used deep and handcrafted visual features for medical image classification and found that handcrafted features complement the image representation learned by DCNNs on small training datasets. Different from these networks, the proposed SDL model takes multiple images as input simultaneously, enabling its multiple DCNN components to mutually improve each other and learn better discriminative representations.
Pair-wise learning: Over the past decade, the pair-wise learning strategy has been applied to various perception tasks, such as signature verification (Bromley et al., 1994), face verification (Chopra et al., 2005), speech analysis (Kamper et al., 2015; Kamper et al., 2016; Renshaw et al., 2015), and natural language processing (Mueller and Thyagarajan, 2016). Bromley et al. (1994) described a Siamese neural network that verifies signatures written on a pen-input tablet by comparing the cosine distance between an extracted feature vector and a stored one. Chopra et al. (2005) presented a general discriminative method for learning a similarity metric from data pairs by minimizing a loss function that decreases the metric for pairs of faces from the same person and enlarges it for pairs from different persons. Recent years have witnessed widespread application of pair-wise learning to unsupervised speech feature learning. Kamper et al. (2015) proposed an unsupervised deep auto-encoder feature extractor for zero-resource speech processing that uses weak top-down supervision from word pairs obtained by an unsupervised term discovery system. Kamper et al. (2016) also used word pairs to train a Siamese DCNN that takes a pair of speech segments as input and uses a hinge loss to classify same-word and different-word pairs. Renshaw et al. (2015) claimed that guiding representation learning with word pairs provides a major benefit over standard unsupervised methods. Pair-wise learning has also been applied to natural language processing: Mueller and Thyagarajan (2016) presented a Siamese recurrent architecture trained on paired examples to learn a highly structured space of sentence representations that captures rich semantics for learning sentence similarity.
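The similarity-metric idea behind these Siamese approaches can be illustrated with the now-standard contrastive loss. The sketch below is an illustration of the general pair-wise objective, not the exact energy function of Chopra et al.; the margin value is an arbitrary assumption.

```python
import numpy as np

def contrastive_loss(f1, f2, same, margin=1.0):
    """Pair-wise metric loss over two embedding vectors.

    same=True  : pull embeddings of a matching pair together.
    same=False : push a non-matching pair apart, up to `margin`.
    The margin of 1.0 is an illustrative choice, not a reported setting.
    """
    d = np.linalg.norm(f1 - f2)          # Euclidean distance in embedding space
    if same:
        return 0.5 * d ** 2              # penalize any separation of a matching pair
    return 0.5 * max(0.0, margin - d) ** 2  # hinge: no penalty once pairs are margin apart
```

Hand-designing such a metric loss (distance function, margin) is exactly the step the SDL model later replaces with a learned same-category prediction.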
Different from traditional pair-wise learning, the SDL model avoids the handcrafted design of tricky distance-metric loss functions and instead automatically learns whether an image pair belongs to the same category by using a cross-entropy loss. In addition, the SDL model supports the simultaneous learning of multiple image pairs: it works with multiple DCNN components that do not share parameters, so the model can benefit from an ensemble of multiple networks.
In this paper, we propose a synergic deep learning (SDL) model for medical image classification that learns discriminative representations simultaneously from pairs of images, including similar images from different categories and dissimilar images from the same category. The SDL model consists of n pre-trained DCNNs and a set of synergic networks. Each DCNN learns image representation and classification, and for each pair of DCNNs the learned image representations are concatenated as the input of a fully connected synergic network that predicts whether the pair of input images belongs to the same class. Thus, the SDL model can be trained end-to-end under the supervision of both the classification error from each DCNN and the synergic error from each pair of DCNNs. We evaluated the proposed model on the 2015/2016 Image Cross Language Evaluation Forum (ImageCLEF) subfigure classification challenge datasets and the 2016/2017 International Skin Imaging Collaboration (ISIC) skin lesion classification challenge datasets. Our results suggest that the SDL model achieves state-of-the-art performance on these four medical image classification tasks.
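The training objective described above can be sketched as follows. This is a minimal illustration under our own naming, assuming the per-DCNN classification (cross-entropy) losses and the per-pair synergic (binary cross-entropy) losses have been computed elsewhere; the λ value is an illustrative midpoint of the stability interval [3, 8] reported later, not a prescribed setting.

```python
from itertools import combinations

def synergic_label(y_a, y_b):
    # Target for a synergic network: 1 if the two input images
    # share a class label, 0 otherwise.
    return 1.0 if y_a == y_b else 0.0

def pair_indices(n):
    # One synergic network per unordered pair of the n DCNN components,
    # i.e. n*(n-1)/2 pairs in total.
    return list(combinations(range(n), 2))

def sdl_total_loss(cls_losses, syn_losses, lam=6.0):
    # End-to-end SDL objective: the classification loss of each DCNN
    # component plus a lambda-weighted sum of the synergic losses.
    return sum(cls_losses) + lam * sum(syn_losses)
```

A mistake on a hard pair (e.g., two visually similar lesions with different labels) contributes both a classification error and a synergic error, which is the extra supervisory signal the model exploits.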
The main contributions of this paper are three-fold. First, we propose the SDL model, which learns discriminative feature representations from multiple images simultaneously, including both similar inter-class images and dissimilar intra-class images. Second, we enable each pair of DCNNs in the SDL model to facilitate each other during learning: if one DCNN makes a correct decision, a mistake made by the other produces a synergic error that serves as an extra force driving the learning of discriminative representations. Finally, we achieve state-of-the-art performance on the ImageCLEF-2015 and ImageCLEF-2016 subfigure classification datasets and the ISIC-2016 and ISIC-2017 skin lesion classification datasets.
A pilot study of this work was presented at MICCAI 2018 (Zhang et al., 2018b). In this paper, we have substantially revised and extended the conference paper. The main extensions are that (1) the SDL model is generalized from the special version SDL2, which has only two DCNN components, to the generalized version SDLn with n DCNNs, and this generalization improves medical image classification performance; and (2) the proposed model is evaluated not only on pathology-based image classification datasets (i.e., the ISIC-2016 and ISIC-2017 datasets), but also on modality-based image classification datasets (i.e., the ImageCLEF-2015 and ImageCLEF-2016 datasets).
Datasets
For this study, we use four medical image classification datasets: two modality-based datasets, ImageCLEF-2015 (de Herrera et al., 2015) and ImageCLEF-2016 (de Herrera et al., 2016), and two pathology-based datasets, ISIC-2016 (Gutman et al., 2016) and ISIC-2017 (Codella et al., 2018).
ImageCLEF-2015, ImageCLEF-2016: Recognizing the increasing complexity of images in the biomedical literature, ImageCLEF
Experimental settings
To alleviate overfitting of the deep models, we employed two data augmentation (DA) strategies to enlarge the training dataset. The first strategy (DA1) uses the ImageDataGenerator toolbox (Chollet et al., 2015) to apply geometric transformations to training images, including random rotation, shifts (0-10% of total width and height), shear (0-0.1 radians in the counter-clockwise direction), zoom (90%-110% of width and height), and horizontal and vertical flips. The
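For intuition, the shift and flip portion of DA1 can be sketched in plain NumPy (the paper itself uses the Keras ImageDataGenerator; this standalone sketch covers only two of the listed transforms, and the circular-shift behavior of `np.roll` is a simplification of true translation):

```python
import numpy as np

def augment(img, rng, max_shift=0.10):
    """Random shift (up to 10% of width/height, as in DA1) plus
    random horizontal/vertical flips. Rotation, shear, and zoom
    from DA1 are omitted for brevity."""
    h, w = img.shape[:2]
    dy = rng.integers(-int(max_shift * h), int(max_shift * h) + 1)
    dx = rng.integers(-int(max_shift * w), int(max_shift * w) + 1)
    out = np.roll(np.roll(img, dy, axis=0), dx, axis=1)  # circular shift
    if rng.random() < 0.5:
        out = out[:, ::-1]   # horizontal flip
    if rng.random() < 0.5:
        out = out[::-1, :]   # vertical flip
    return out
```

Each call yields a label-preserving variant of the input, which is what lets a small medical dataset be enlarged without new annotation effort.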
Stability interval of hyperparameter λ
We used the SDL2 model as a case study to evaluate the impact of the hyperparameter λ on classification performance. Fig. 5 shows the accuracy obtained by applying the SDL2 model with different values of λ to the four datasets. It reveals that, when λ takes a value in the range [3, 8], the SDL2 model achieves good accuracy on all datasets and its performance is relatively robust to the value of λ. Therefore, we suggest taking λ from [3, 8].
Performance without data augmentation
There are several commonly used data
Conclusion
In this paper, we propose the SDL model to address the challenge posed by intra-class variation and inter-class similarity in medical image classification. The model uses multiple DCNNs together with synergic networks so that the DCNNs can mutually learn from each other. Our results on the ImageCLEF-2015, ImageCLEF-2016, ISIC-2016, and ISIC-2017 datasets show that the proposed SDL model achieves state-of-the-art performance in these medical image classification tasks. In the
Acknowledgment
This work was supported in part by the National Natural Science Foundation of China under Grants 61771397 and 61471297.
References (56)
- Machine learning approaches in medical image analysis: from detection to diagnosis. Med. Image Anal. (2016)
- DCAN: deep contour-aware networks for object instance segmentation from histology images. Med. Image Anal. (2017)
- Evaluating performance of biomedical image retrieval systems - an overview of the medical image retrieval task at ImageCLEF 2004-2013. Comput. Med. Imaging Graphics (2015)
- Multi-crop convolutional neural networks for lung nodule malignancy suspiciousness classification. Pattern Recognit. (2017)
- Four challenges in medical image analysis from an industrial perspective. Med. Image Anal. (2016)
- Fusing texture, shape and deep model-learned information at decision level for automated classification of lung nodules on chest CT. Inf. Fusion (2018)
- Deep transfer learning for modality classification of medical images. Information (2017)
- Flexible skew-symmetric shape model for shape representation, classification, and sampling. IEEE Trans. Image Process. (2007)
- Curriculum learning. Proceedings of the 26th Annual International Conference on Machine Learning (ICML) (2009)
- Bi, L., Kim, J., Ahn, E., Feng, D., 2017. Automatic skin lesion analysis using large-scale dermoscopy images and deep...
- Signature verification using a "Siamese" time delay neural network. Proceedings of the 6th International Conference on Neural Information Processing Systems (NIPS)
- Learning a similarity metric discriminatively, with application to face verification. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- Medical image classification via 2D color feature based covariance descriptors. Proceedings of CLEF (Working Notes)
- Skin lesion analysis toward melanoma detection: a challenge at the 2017 International Symposium on Biomedical Imaging (ISBI), hosted by the International Skin Imaging Collaboration (ISIC). Proceedings of the IEEE 15th International Symposium on Biomedical Imaging (ISBI)
- ImageNet: a large-scale hierarchical image database. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- Automatic brain tumor detection and segmentation using U-Net based fully convolutional networks. Annual Conference on Medical Image Understanding and Analysis
- Corrigendum: dermatologist-level classification of skin cancer with deep neural networks. Nature
- Review of medical image retrieval systems and future directions. Proceedings of the 24th International Symposium on Computer-Based Medical Systems (CBMS) (2011)
- Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- Overview of the ImageCLEF 2013 medical tasks. Proceedings of CLEF (Working Notes)
- Overview of the ImageCLEF 2015 medical classification task. CLEF (Working Notes)
- Overview of the ImageCLEF 2016 medical classification task. CLEF (Working Notes)
- Unsupervised neural network based feature extraction using weak top-down constraints. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- Deep convolutional acoustic word embeddings using word-pair side information. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Conflicts of Interest Statement: The authors declare no conflict of interest.