Article

Melanoma and Nevus Skin Lesion Classification Using Handcraft and Deep Learning Feature Fusion via Mutual Information Measures

by Jose-Agustin Almaraz-Damian 1,†, Volodymyr Ponomaryov 1,*,†, Sergiy Sadovnychiy 2,† and Heydy Castillejos-Fernandez 3,†

1 Instituto Politecnico Nacional, Santa Ana Ave. # 1000, Mexico City 04430, Mexico
2 Instituto Mexicano del Petroleo, Lazaro Cardenas Ave. # 152, Mexico City 07730, Mexico
3 Academic Area of Computer and Electronics, Institute of Basic Sciences and Engineering, Universidad Autonoma del Estado de Hidalgo, Pachuca–Tulancingo Highway Km. 4.5, Mineral de la Reforma, Hidalgo 42083, Mexico
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Entropy 2020, 22(4), 484; https://doi.org/10.3390/e22040484
Submission received: 26 February 2020 / Revised: 16 April 2020 / Accepted: 20 April 2020 / Published: 23 April 2020

Abstract: In this paper, a new Computer-Aided Detection (CAD) system for the detection and classification of dangerous skin lesions (melanoma type) is presented. It relies on the fusion of handcraft features related to the medical ABCD rule (Asymmetry, Borders, Colors, Dermatoscopic structures) with deep learning features, performed by means of Mutual Information (MI) measurements. The steps of the CAD system can be summarized as preprocessing, feature extraction, feature fusion, and classification. During the preprocessing step, a lesion image is enhanced, filtered, and segmented in order to obtain the Region of Interest (ROI); feature extraction is performed in the next step. Handcraft features such as shape, color, and texture represent the ABCD rule, and deep learning features are extracted using a Convolutional Neural Network (CNN) architecture pre-trained on ImageNet (the ILSVRC classification task). The MI measurement is used as a fusion rule, gathering the most important information from both types of features. Finally, at the classification step, several methods are employed, such as Logistic Regression (LR), Support Vector Machines (SVMs), and Relevance Vector Machines (RVMs). The designed framework was tested on the public ISIC 2018 dataset. The proposed framework appears to demonstrate improved performance in comparison with other state-of-the-art methods in terms of the accuracy, specificity, and sensitivity obtained in the training and test stages. Additionally, we propose and justify a novel procedure for adjusting the evaluation metrics for the imbalanced datasets that are common for different kinds of skin lesions.

1. Introduction

Skin cancer has become one of the deadliest diseases for human beings. Globally, between two and three million non-melanoma (less aggressive) cases and over 130,000 melanoma (aggressive) cases are diagnosed each year [1].
Melanoma is the deadliest type of skin cancer, and Australia has the highest rates of skin cancer in the world. In 2018, melanoma accounted for about 22% of skin cancer diagnoses, and non-melanoma tumors accounted for about 78% [2]. Studies have shown that the disease is most often caused by exposure to UV radiation from sunlight or tanning beds, with skin type among the contributing risk factors. Physicians agree that early detection is the most effective defense against malignant skin lesions of any kind: the five-year survival rate rises to almost 99% if the disease is spotted in its early stages.
Dermoscopy, or Epiluminescence Microscopy (ELM), is a medical method that helps a physician recognize whether a skin lesion belongs to a benign or malignant type of the disease. The method uses a dermatoscope, a tool consisting of a light source and a magnifying lens that enhances the view of medical patterns such as ramifications, globules, pigmented networks, veils, and colors, among others.
With the development of image processing techniques, Computer-Aided Detection (CAD) systems and approaches to the classification [3,4,5,6,7] and segmentation [8] of Pigmented Skin Lesions (PSLs) have steadily improved, allowing patients to be diagnosed in the early stages of the disease without shocking or painful medical procedures.
In this work, we propose a novel approach for discriminating between melanoma- and nevus-type skin lesions. Handcraft features based on shape, color, and texture, which represent the ABCD rule (Asymmetry, Borders, Colors, Dermatoscopic structures), are combined with deep learning features; the latter are extracted with the transfer learning method, using a pre-trained network as a generic feature extractor. In the next step, the most important features according to the Mutual Information (MI) metric are selected by the fusion technique, aiming at the best performance by taking into account the influence of both sets of features on the binary classification result.
This paper is organized as follows: Section 2 presents a brief review of the methods used in CAD developments with fused features and their nature, Section 3 explains in detail the proposed method, the materials used, and the evaluation metrics employed, and Section 4 describes the experimental results and presents a brief discussion. The conclusions are detailed in Section 5.

2. Literature Survey

Medical detection algorithms are among the first tools used for determining whether a skin lesion is malignant or benign [9,10,11,12,13]. Nachbar et al. [9] developed a subjective method based on the visual perception of the lesion: the ABCD rule relies on the color, shape, and particular structures that appear on skin lesions. Due to its simplicity, it is one of the most widely practiced methods for evaluating a lesion by naked-eye examination or with a dermatoscope.
The ABCD medical algorithm is composed of the following parts:
  • Asymmetry (A): The lesion is bisected by two perpendicular axes positioned so as to yield the lowest possible asymmetry score; in other words, it is determined whether the lesion is symmetrical or not. One point is added for each axis along which asymmetry is found.
  • Borders (B): The lesion is divided by eight axes into segments, and each segment is examined for an abrupt border cut-off. One point is added for each segment with an abrupt border.
  • Colors (C): The lesion can contain one or more of the following colors: white, light brown, dark brown, black, blue, and red. These colors are generated by vessels and melanin concentrations, and one point is added for each color found.
  • Dermatoscopic Structures (D): The lesion is examined for the following structures: dots, globules, pigmented networks, and structureless areas. One point is added for each structure spotted on the lesion.
The described features are weighted as follows:
$TDS = 1.3A + 0.1B + 0.5C + 0.5D,$ (1)
where TDS is the Total Dermatoscopic Score: if TDS is less than 4.75, the lesion is concluded to be benign; if the score is between 4.75 and 5.45, the lesion is considered suspicious; and if it is more than 5.45, the lesion is considered malignant.
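For illustration only, the short Python sketch below (a hypothetical helper, not part of the proposed CAD system) computes the TDS from the four ABCD counts and applies the thresholds described above.

    def tds_score(asymmetry, borders, colors, structures):
        """Total Dermatoscopic Score under the ABCD weighting of Equation (1)."""
        tds = 1.3 * asymmetry + 0.1 * borders + 0.5 * colors + 0.5 * structures
        if tds < 4.75:
            label = "benign"
        elif tds <= 5.45:
            label = "suspicious"
        else:
            label = "malignant"
        return tds, label

    # Example: two asymmetric axes, five abrupt segments, three colors, two structures.
    print(tds_score(2, 5, 3, 2))  # -> (5.6, 'malignant')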
This algorithm has variants in which some elements, such as the dermatoscopic structures, are replaced by diameter or darkness [14,15]. Additionally, in [16,17], the addition of the so-called EFG properties has been suggested, where E stands for elevation or evolution, F for firmness, and G for growth; these features provide complementary information about a PSL. The modifications of the ABCD rule aim to simplify the evaluation of a skin lesion so that anyone can perform a self-examination and record any change in the lesion. If a lesion cannot be identified by these methods, physicians must resort to invasive procedures such as a biopsy to determine its type.
Adjed et al. [18] proposed a method whose aim is the fusion of structural features obtained with the Curvelet and Wavelet transforms, employing the Fast Digital Curvelet Transform (FDCT) wrapping method, together with statistical metrics and texture features such as the local binary pattern. They fused around 200 features via concatenation, using the PH2 dataset [19].
Hagerty et al. [20] developed a fusion method in which deep features are extracted from images using the transfer learning method based on the ResNET-50 Convolutional Neural Network (CNN) architecture; however, which handcraft features are used in their method is not entirely clear. Moreover, they used a feature selection algorithm, in this case the χ² method, for performance revision, employing two datasets: a private set and a modified version of the ISIC 2018 dataset [21].
Li et al. [22] used a deep learning approach with the fusion of clinical criteria representations, where a boosting tree-learning algorithm called LightGBM [23] serves as both the classifier and the fusion method. This method is applied to color properties (RGB and HSL features), texture properties (SIFT and LBP), and shape properties (solidity, circularity, image ratio, and area ratio). The deep learning features were obtained with the transfer learning method based on the ResNET-50 and DenseNET-201 CNN architectures. In total, 566 features were processed using the ISIC 2018 dataset [21].
Abbas and Celebi [24] proposed a CAD system in which the lesion is processed by a Stack-Based Auto-Encoder (SAE), extracting deep features from the pixels of a lesion while minimizing the information loss. The handcraft features are extracted for color (the Hill Climbing Algorithm (HCA)) and for texture (Speeded-Up Robust Features (SURF)). For feature fusion, they used Principal Component Analysis (PCA), and in the concluding stage, Recurrent Neural Networks (RNNs) and a softmax linear classifier were employed.
Most of the reviewed methods use handcraft features together with deep learning features obtained via transfer learning from well-known CNN architectures [25,26,27,28,29,30,31]. As one can see, the revised schemes attempt to fuse the information extracted from the lesion images, gathering data via the concatenation of feature vectors, classifiers, and feature selection. The main drawback of such methods is that they disregard the medical information, which is relevant to physicians, alongside the data extracted by image processing algorithms. The methods analyzed above explore several ways of fusing features, but most of them do not consider the importance of each extracted feature according to its nature, which is relevant to its pertinence class. Moreover, some of them omit features based on medical algorithms because they assume that perceptual handcraft features, rooted in the subjective human visual system, are weak; yet modern image processing techniques and machine learning approaches are able to learn such patterns and implement those features, as in a visual scoring system. Finally, some of the reviewed methods attempt multiclass classification, in which the system assigns a lesion image to a specific lesion category. However, a problem persists in multiclass classification: the data available for each class are limited, and some of the public databases are not balanced well enough to perform this classification, so a designed system can produce incorrect classifications. In summary, we consider it important to develop an intelligent system that can correctly classify melanoma disease by employing both types of features, where the medical features contribute to the classification with the aid of deep learning features, aiming for the best performance.
The novel method considers relevant information obtained from handcraft and deep learning features, improving the performance measured by the commonly used criteria: accuracy, specificity, and sensitivity. In contrast to other schemes, our framework encourages the use of the ABCD rule features, also known as perceptual features, together with a set of features of an equivalent or similar medical nature.

Principal Contributions

The principal contributions of our novel approach in the classification of dermoscopic lesions are summarized as follows:
  • A brief survey of computer-aided detection methods that employ fusion between handcraft and deep learning features is presented.
  • Despite the recent tendency to avoid the ABCD medical algorithm and its variations, we utilize descriptors based on it, such as shape, color, and texture, as a new aggregation, followed by the extraction of deep learning features.
  • A balancing method was employed due to the class imbalance of the ISIC database. The SMOTE oversampling technique was applied, which in this work demonstrates an improvement in performance in differentiating melanoma from benign lesion images.
  • A fusion method that employs the relevant mutual information obtained from handcraft and deep learning features was designed; it appears to demonstrate better performance in comparison with state-of-the-art CAD systems.

3. Materials and Methods

In this section, the proposed system is described; a brief conceptual block diagram is illustrated in Figure 1. As an initial step, the pigmented skin lesion is segmented from the surrounding normal skin tissue and from artefacts such as veils, hairs, and air bubbles, among others, by color space transformation, mean thresholding, and extraction of the Region of Interest (ROI). Subsequently, using the binary mask, the ROI image is obtained and a set of handcraft features based on shape, color, and texture is extracted. Thereafter, deep learning features are obtained using a selected CNN architecture that is pre-trained on the ImageNet classification task and employed as a feature extractor. All extracted features are concatenated into one vector, which is then fused according to the MI criterion. The selected classifier is trained on the ISIC dataset comprising both malignant and benign skin lesion images. Finally, the trained classifier model is used to predict whether each unseen pigmented skin lesion image is a benign or malignant lesion. The details of each stage of the proposed method are described in the remainder of this section.

3.1. Preprocessing

The analyzed image $I(x,y)$ can contain a lesion along with artefacts such as veils, hairs, and stamps, among others. In the first step, a preprocessing stage is applied, in which the image is enhanced [8,32,33].
A Gaussian filter is applied to blur the artefacts contained in the image, primarily hair, marks, and spots, among others, while maintaining the geometric shape of the lesion.
The Gaussian filter is denoted as follows:
$G(x,y) = \frac{1}{2\pi\sigma^{2}} \exp\!\left(-\frac{x^{2}+y^{2}}{2\sigma^{2}}\right),$ (2)
where $\sigma^{2}$ is the variance of the spatial kernel. This step is shown in Figure 2.
The CIEL*a*b* space is characterized as more closely approximating the human perceptual system. Its channels are L, which stands for lightness, and the chroma channels a* and b*, where a* is a parametric measure between magenta and green and b* is a parametric measure between blue and yellow. The L channel takes values in $[0, 100]$, and the a* and b* chroma channels take values in $[-30, 30]$. This transformation is used to avoid the correlation between channels while keeping the perceptual data intact, e.g., the fact that a pigmented skin lesion is darker than healthy skin tissue, as one of the sub-variants of the ABCD algorithm states [14].
The mean thresholding procedure is applied to each channel image $I_L$, $I_a$, $I_b$ of the CIEL*a*b* space. Such thresholding allows one to differentiate skin tissue from lesion tissue; in Figure 3, one can see how the CIEL*a*b* space is able to visually separate this information. The mean values are calculated as follows:
$\bar{I}_L = \frac{1}{mn}\sum_{x=1}^{m}\sum_{y=1}^{n} I_L(x,y),$ (3)
$\bar{I}_a = \frac{1}{mn}\sum_{x=1}^{m}\sum_{y=1}^{n} I_a(x,y),$ (4)
$\bar{I}_b = \frac{1}{mn}\sum_{x=1}^{m}\sum_{y=1}^{n} I_b(x,y),$ (5)
where $(x,y)$ are the spatial coordinates, $m, n$ are the image dimensions, and $\bar{I}_L$, $\bar{I}_a$, $\bar{I}_b$ denote the mean values. The thresholding operation is applied to each channel of the CIEL*a*b* space, forming the thresholded channel images as follows:
$I_{Th_L}(x,y) = \begin{cases} 1, & I_L(x,y) \geq \bar{I}_L, \\ 0, & \text{otherwise}, \end{cases}$ (6)
$I_{Th_a}(x,y) = \begin{cases} 1, & I_a(x,y) \geq \bar{I}_a, \\ 0, & \text{otherwise}, \end{cases}$ (7)
$I_{Th_b}(x,y) = \begin{cases} 1, & I_b(x,y) \geq \bar{I}_b, \\ 0, & \text{otherwise}. \end{cases}$ (8)
Afterwards, the following logical operation is applied to the binarized images $I_{Th_L}$, $I_{Th_a}$, and $I_{Th_b}$ to form the binary mask $I_{bin}(x,y)$ of the image:
$I_{bin}(x,y) = I_{Th_L} \wedge I_{Th_a} \wedge I_{Th_b}.$ (9)
Examples of the extracted binary mask images are given in Figure 4. Finally, a median filter with a 5 × 5 kernel is applied to $I_{bin}(x,y)$, removing the remaining artefacts that resist thresholding. Next, a bounding box algorithm is performed. The bounding box [34] is a method used to compute the smallest imaginary rectangle that completely encloses a given object; this rectangle is determined by the x- and y-axis coordinates of its upper-left and lower-right corners. The method is commonly used in object detection tasks because it estimates the coordinates of the ROI in an image.
Bissoto et al. [35] have shown the effect of the bias introduced by different types of image segmentation, where such biases can negatively affect the performance of classification models. They consider the usage of the bounding box algorithm to segment a lesion appropriate, because a CNN architecture can extract all the relevant features of a lesion and distinguish it from the surrounding healthy skin. Therefore, we adopt this solution to reduce the bias of the classification model before processing.
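As an illustration of this preprocessing pipeline, the sketch below (using OpenCV and NumPy, which the paper does not explicitly list, so the library choice is an assumption) smooths the image, thresholds each CIEL*a*b* channel at its mean, combines the channel masks, cleans the result with a 5 × 5 median filter, and crops the bounding box of the remaining foreground; the comparison direction and the logical AND follow the reconstruction of Equations (6)–(9).

    import cv2
    import numpy as np

    def extract_roi(rgb_image, sigma=2.0):
        """Segment the lesion and return the ROI crop and its binary mask (Section 3.1)."""
        blurred = cv2.GaussianBlur(rgb_image, (0, 0), sigmaX=sigma)   # Eq. (2): blur hair/marks
        lab = cv2.cvtColor(blurred, cv2.COLOR_RGB2LAB)                # decorrelate channels
        # Mean thresholding per channel (Eqs. (3)-(8)); the comparison direction may
        # need to be flipped per channel depending on the lesion/skin contrast.
        channel_masks = [lab[..., c] >= lab[..., c].mean() for c in range(3)]
        mask = np.logical_and.reduce(channel_masks)                   # Eq. (9)
        mask_u8 = np.where(mask, 255, 0).astype(np.uint8)
        mask = cv2.medianBlur(mask_u8, 5) > 0                         # 5 x 5 median clean-up
        ys, xs = np.nonzero(mask)                                     # bounding box of the foreground
        y0, y1, x0, x1 = ys.min(), ys.max(), xs.min(), xs.max()
        return rgb_image[y0:y1 + 1, x0:x1 + 1], mask[y0:y1 + 1, x0:x1 + 1]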

3.2. Handcraft Features

The ABCD rule represents a set of perceptual features based on the patterns found in PSLs. The method relies on features related mostly to shape, color, and texture; the features selected in this study represent these medical attributes through image processing algorithms.
Sirakov et al. [36] proposed a method to estimate the asymmetry of a lesion based on the binary mask $I_{bin}$ obtained in the previous thresholding step. The mask is rotated through 180° to form the symmetry mask $S_{I_{bin}}$, and the synthetic image $A$ is calculated as follows:
$A = I_{bin} \oplus S_{I_{bin}},$ (10)
where $A$ is the generated image containing the non-overlapping regions of the lesion, called the false symmetry $F_S$; the symmetry along the 0° axis of the binary image is then computed as
$Sym_{0°} = 1 - (F_S / A).$ (11)
In this study, a variation of this method is proposed: the generated image $A$ is computed with respect to the major and minor axes of the lesion by applying the same procedure, and the average symmetry value over the two axes is finally obtained as
$Symmetry = \frac{Sym_{0°} + Sym_{90°}}{2}.$ (12)
The symmetry values belong to the interval $[0, 1]$; the closer the index is to 1, the more symmetric the lesion. Figure 5 shows the extracted ROI images.
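A minimal sketch of this symmetry estimate is given below; it reflects the mask about the image's horizontal and vertical axes (the paper aligns these with the lesion's major and minor axes) and, since the exact normalization of the $F_S/A$ ratio is not fully specified here, normalizes the non-overlapping area by the union of the two masks as an assumption.

    import numpy as np

    def symmetry_index(binary_mask):
        """Average symmetry over two axes, in [0, 1] (higher means more symmetric)."""
        scores = []
        for reflected in (np.flipud(binary_mask), np.fliplr(binary_mask)):
            false_symmetry = np.logical_xor(binary_mask, reflected).sum()  # non-overlapping pixels F_S
            union = np.logical_or(binary_mask, reflected).sum()
            scores.append(1.0 - false_symmetry / union)
        return float(np.mean(scores))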

3.2.1. Shape Features

Shape, or geometric, features [34] describe an object or form with numerical values that reflect human perception.
For shape features, the following equations are employed:
$Area = \sum_{x=1}^{m}\sum_{y=1}^{n} I_{bin}(x,y),$ (13)
where $m, n$ are the image dimensions and $x, y$ are the spatial coordinates; the area is thus the number of pixels contained in the ROI of a lesion.
$Perimeter = \sum_{i=1}^{m} \sqrt{(x_i - x_{i-1})^{2} + (y_i - y_{i-1})^{2}},$ (14)
where $(x_i, y_i)$ are the spatial coordinates of the $i$-th pixel of the region contour; the perimeter thus counts the pixels along the boundary of the ROI of a lesion.
$Circularity = \frac{4\pi \cdot Area}{Perimeter^{2}},$ (15)
where the circularity shows the similarity between a shape and a circle.
$Diameter = \frac{1}{2}\left(\mu_{2,0} + \mu_{0,2}\right) \pm \frac{\sqrt{4\mu_{1,1}^{2} + (\mu_{2,0} - \mu_{0,2})^{2}}}{2},$ (16)
where the diameter is obtained from the lengths of the major and minor axes of the shape, computed from the second central moments; this measure connects two pairs of points on the perimeter of the shape.
$Eccentricity = \frac{(\mu_{0,2} - \mu_{2,0})^{2} + 4\mu_{1,1}}{A},$ (17)
which measures the aspect ratio of the length of the major axis to the length of the minor axis.
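As a sketch, these shape descriptors can be obtained from the binary mask with scikit-image region properties, which serve here as stand-ins for the moment-based definitions above (note that skimage's eccentricity definition differs from Equation (17)).

    import numpy as np
    from skimage import measure

    def shape_features(binary_mask):
        """Shape descriptors of the ROI mask (cf. Equations (13)-(17))."""
        props = measure.regionprops(binary_mask.astype(np.uint8))[0]
        area, perimeter = props.area, props.perimeter
        return {
            "area": area,
            "perimeter": perimeter,
            "circularity": 4.0 * np.pi * area / perimeter ** 2,
            "major_axis": props.major_axis_length,   # diameter-related lengths
            "minor_axis": props.minor_axis_length,
            "eccentricity": props.eccentricity,
        }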

3.2.2. Colour Features

Medical algorithms, and in particular the ABCD rule, characterize a PSL partly by the set of colors it contains. These features can therefore be represented by statistical characteristics obtained from color spaces. In this study, the following characteristics are used:
$Min_{ch} = \min\left[I_{ch}(x,y)\right],$ (18)
$Max_{ch} = \max\left[I_{ch}(x,y)\right],$ (19)
$Var_{ch} = \operatorname{var}\left[I_{ch}(x,y)\right],$ (20)
$Mean_{ch} = \overline{I_{ch}(x,y)},$ (21)
where $I_{ch}(x,y)$ is the chosen channel of the PSL image in the RGB and CIEL*a*b* color spaces.
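A possible implementation of these statistics is sketched below; computing them over the full ROI crop (rather than only the masked lesion pixels) is an assumption.

    import cv2
    import numpy as np

    def color_features(roi_rgb):
        """Per-channel min, max, variance, and mean in RGB and CIEL*a*b* (Equations (18)-(21))."""
        lab = cv2.cvtColor(roi_rgb, cv2.COLOR_RGB2LAB)
        features = []
        for image in (roi_rgb, lab):
            for ch in range(3):
                channel = image[..., ch].astype(np.float64)
                features.extend([channel.min(), channel.max(), channel.var(), channel.mean()])
        return np.array(features)  # 2 color spaces x 3 channels x 4 statistics = 24 values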

3.2.3. Texture Features

Haralick et al. [37] proposed the Gray Level Co-occurrence Matrix (GLCM), a method that analyzes the statistical texture features of an image. The texture features provide information about how the gray intensities of the PSL image are distributed. The GLCM records how often a gray level co-occurs with another at a fixed relative pixel position. Let $P_d(i,j)$ denote the $(i,j)$ element of the normalized GLCM, $N_g$ the number of gray levels, and $\sigma_x$, $\sigma_y$ and $\mu_x$, $\mu_y$ the standard deviations and mean values along the $i$ and $j$ axes of the GLCM, expressed as follows:
$\mu_x = \sum_{i=1}^{N_g}\sum_{j=1}^{N_g} i\, P_d(i,j),$ (22)
$\mu_y = \sum_{i=1}^{N_g}\sum_{j=1}^{N_g} j\, P_d(i,j),$ (23)
$\sigma_x = \sqrt{\sum_{i=1}^{N_g}\sum_{j=1}^{N_g} (i-\mu_x)^{2}\, P_d(i,j)},$ (24)
$\sigma_y = \sqrt{\sum_{i=1}^{N_g}\sum_{j=1}^{N_g} (j-\mu_y)^{2}\, P_d(i,j)}.$ (25)
The 13 features used in this study are as follows:
$ASM = \sum_{i=1}^{N_g}\sum_{j=1}^{N_g} P_d^{2}(i,j).$ (26)
The Angular Second Moment (ASM) measures the uniformity of the local gray values.
$Contrast = \sum_{i=1}^{N_g}\sum_{j=1}^{N_g} (i-j)^{2}\, P_d(i,j).$ (27)
The contrast is a second-order moment that measures the local intensity variations between neighboring pixels.
$Correlation = \sum_{i=1}^{N_g}\sum_{j=1}^{N_g} \frac{P_d(i,j)\,(i-\mu_x)(j-\mu_y)}{\sigma_x \sigma_y}.$ (28)
This is the linear dependency of the gray level values.  
$Variance = \sum_{i=1}^{N_g}\sum_{j=1}^{N_g} (i-\mu)^{2}\, P_d(i,j).$ (29)
This is the second moment. It shows the spread around the mean in the surrounding neighborhood.
$IDM = \sum_{i=1}^{N_g}\sum_{j=1}^{N_g} \frac{1}{1+(i-j)^{2}}\, P_d(i,j).$ (30)
This is the Inverse Difference Moment (IDM). It shows how close the elements of the GLCM are in their distribution.
$Entropy = -\sum_{i=1}^{N_g}\sum_{j=1}^{N_g} P_d(i,j) \cdot \ln\!\left[P_d(i,j)\right].$ (31)
This is the measure of randomness of the gray values in an image.
Additional texture features used in this study are based on the sum and difference statistics of the GLCM. The difference histogram $P_{x-y}(k)$ is written as follows:
$P_{x-y}(k) = \sum_{i=1}^{N_g}\sum_{j=1}^{N_g} P_d(i,j)\Big|_{|i-j|=k}, \qquad k = 0, 1, \ldots, N_g - 1,$ (32)
where $P_d(i,j)$ is the $(i,j)$-th element of the GLCM and $N_g$ is the number of gray levels; the sum histogram $P_{x+y}(k)$ is defined analogously over the pairs with $i+j = k$, $k = 2, \ldots, 2N_g$.
$SumVariance = \sum_{k=2}^{2N_g} (k - \mu_{x+y})^{2}\, P_{x+y}(k),$ (33)
$SumEntropy = -\sum_{k=2}^{2N_g} P_{x+y}(k)\, \log\!\left[P_{x+y}(k)\right],$ (34)
$DifferenceVariance = \sum_{k=0}^{N_g-1} (k - \mu_{x-y})^{2}\, P_{x-y}(k),$ (35)
$DifferenceEntropy = -\sum_{k=0}^{N_g-1} P_{x-y}(k)\, \log\!\left[P_{x-y}(k)\right],$ (36)
$IMCorr1 = \frac{H(XY) - H(XY1)}{\max\left[H(X), H(Y)\right]},$ (37)
$IMCorr2 = \sqrt{1 - \exp\left\{-2\left[H(XY2) - H(XY)\right]\right\}},$ (38)
where H ( X ) , H ( Y ) , H ( X Y ) , H ( X Y 1 ) and H ( X Y 2 ) are denoted as follows:
$H(X) = -\sum_{i=1}^{N_g} P_x(i) \cdot \log\!\left[P_x(i)\right],$ (39)
$H(Y) = -\sum_{i=1}^{N_g} P_y(i) \cdot \log\!\left[P_y(i)\right],$ (40)
$H(XY) = -\sum_{i=1}^{N_g}\sum_{j=1}^{N_g} P_d(i,j) \cdot \log\!\left[P_d(i,j)\right],$ (41)
$H(XY1) = -\sum_{i=1}^{N_g}\sum_{j=1}^{N_g} P_d(i,j) \cdot \log\!\left[P_x(i) \cdot P_y(j)\right],$ (42)
$H(XY2) = -\sum_{i=1}^{N_g}\sum_{j=1}^{N_g} P_x(i) \cdot P_y(j) \cdot \log\!\left[P_x(i) \cdot P_y(j)\right].$ (43)
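These descriptors can be computed, for instance, with the Mahotas library listed in the experimental setup; averaging the four GLCM directions, as in the sketch below, is an assumption.

    import cv2
    import mahotas
    import numpy as np

    def texture_features(roi_rgb):
        """13 Haralick/GLCM descriptors (cf. Equations (22)-(43)) on the gray-level ROI."""
        gray = cv2.cvtColor(roi_rgb, cv2.COLOR_RGB2GRAY).astype(np.uint8)
        # haralick() builds the co-occurrence matrices and returns ASM, contrast,
        # correlation, ..., and the two information measures of correlation.
        return mahotas.features.haralick(gray, return_mean=True)  # shape (13,)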

3.3. Deep Learning Features

The CNN [38,39] is one type of deep learning method [40] based on the discrete convolution operation,
$W(i,j) = (K \ast I)(i,j) = \sum_{m}\sum_{n} I(i-m, j-n)\, K(m,n),$
whose purpose is to extract the information of an image $I(i,j)$ using filters $K(m,n)$ that are trained in a neural network through feed-forward and back-propagation passes, according to
$y = z(W \ast X) + b,$
where $W$ are the computed values of the filters, $z$ is the activation function, $X$ is the input, and $b$ is the bias [40,41].
Designing a CNN architecture is a rather complex task because parameters such as the number and size of the filters and the depth of the network must be specified, some of which are task-dependent.

3.3.1. Transfer Learning

The main problem in using a deep learning approach is that a large amount of data is needed to train the network from scratch. Usually, the transfer learning method [42,43,44] is applied to overcome this problem.
Transfer learning is a technique that can be defined as the generalization of a target task based on knowledge extracted from one or more source tasks [43]. The idea originates from human learning: we do not learn from scratch exactly how to recognize a chair, a plane, or a book; we start by recognizing colors, shapes, and textures, and someone else then tells us how to differentiate a chair from an apple. This sharing of knowledge helps us understand the world as infants. Another view is the reuse of information collected for one task to solve related ones. Transfer learning can therefore be defined as follows:
Assume a domain D, which consists of two components:
$D = \{\chi, P(X)\},$
where $\chi$ is a feature space with a marginal probability distribution $P(X)$, $X = \{x_1, \ldots, x_n\}$, $x_i \in \chi$.
Given a task T with two components,
$T = \{\gamma, P(Y|X)\} = \{\gamma, \eta\}; \qquad Y = \{y_1, \ldots, y_n\},\ y_i \in \gamma,$
where $\gamma$ is a label space and $\eta$ is a predictive function trained from the pairs $(x_i, y_i)$, $x_i \in \chi$, $y_i \in \gamma$; for each feature vector in the domain $D$, $\eta$ predicts the corresponding label, $\eta(x_i) = y_i$.
The survey [43] distinguishes the following scenarios. Given a source domain $D_S$ and a target domain $D_T$, where $D = \{\chi, P(X)\}$, with their related tasks $T_S$ and $T_T$, where $T = \{\gamma, P(Y|X)\}$, the conditions can vary as follows:
  • $\chi_S \neq \chi_T$: the feature spaces are different.
  • $P(X_S) \neq P(X_T)$: the marginal probability distributions are different.
  • $\gamma_S \neq \gamma_T$: the label spaces are different.
  • $P(Y_S|X_S) \neq P(Y_T|X_T)$: the conditional probability distributions are different.
Transfer learning is defined in [44] as follows: given a source domain $D_S$ with a corresponding source task $T_S$ and a target domain $D_T$ with a corresponding task $T_T$, transfer learning is the process of improving the target predictive function $f_T(\cdot)$ using the related information from the source domain $D_S$ and the source task $T_S$.
A CNN architecture stacks filters that subsample the data contained in an image; the filter responses form hierarchical representations called feature maps, in which the filters learn the features present in the image data. These maps are connected to the last layer of the CNN architecture, a neural network classifier referred to in the literature as the fully connected layer [40].
Moreover, CNN architectures belong to the class of inductive learning algorithms, whose objective is to map input features to classes while seeking a generalization of the data.
Therefore, inductive learning can be transferred from an architecture trained on the source task to the target task by adjusting the model space and correcting the inductive bias. Commonly, this is performed by replacing the last layer of the model (the original classifier) with a lightweight classifier that is trained on the generalized features.
In this study, we employed the transfer learning method on architectures that are pre-trained on a related task, the ImageNet classification task [45], under the assumption that $D_S = D_T$, so that the CNN architecture can perform as a generic feature extractor.

3.3.2. Feature Extractor

In our case, the CNN architecture was used as a deep feature extractor, where the features are obtained as follows. Consider an image $I(x,y)$ of the domain $D_T$ that is mapped, or transformed, by
$W = \{w_1, \ldots, w_n\}; \qquad w_i \in \mathbb{R}^{M \times N \times h},$
where $W$ is the set of weights of the feature extractor and $M$, $N$, and $h$ are the filter dimensions of the CNN architecture. As a result, we obtain
$P = W(I(x,y)) = \{w_1(I(x,y)), \ldots, w_n(I(x,y))\} \in \mathbb{R}^{M \times N \times h},$
and the pooling transformations are based on
$Q = f(0, P) = \{f(0, w_1(I(x,y))), \ldots, f(0, w_n(I(x,y)))\},$
where $f(\cdot)$ is a mapping function.
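A minimal sketch of such an extractor is shown below using Keras with MobileNetV2 (one of the architectures evaluated in Section 4); removing the fully connected classifier with include_top=False and adding global average pooling is the standard way to obtain a fixed-length deep feature vector, and the 224 × 224 input size is an assumption.

    import cv2
    import numpy as np
    from tensorflow.keras.applications import MobileNetV2
    from tensorflow.keras.applications.mobilenet_v2 import preprocess_input

    # ImageNet-pretrained backbone without its classifier; global average pooling
    # turns the last feature maps into a single vector per image.
    extractor = MobileNetV2(weights="imagenet", include_top=False, pooling="avg",
                            input_shape=(224, 224, 3))

    def deep_features(roi_rgb):
        """Map an ROI image to a fixed-length deep feature vector (1280 values for MobileNetV2)."""
        resized = cv2.resize(roi_rgb, (224, 224)).astype(np.float32)
        batch = preprocess_input(resized[np.newaxis, ...])  # scale to the network's input range
        return extractor.predict(batch, verbose=0)[0]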

3.3.3. Deep Learning Architectures

In this work, the following architectures are used: VGG-16/VGG-19 [26], MobileNet [27], ResNet-50 [28], Inception v3 [29], Xception [30], and DenseNet-201 [31]. These architectures were selected because they have been shown to achieve the best Top-1 and Top-5 accuracy and error rates on the ImageNet classification task proposed in [45].
Table 1 lists the number of features extracted by each utilized architecture.
The features of each selected architecture are concatenated with the extracted handcraft features, forming a single feature vector:
$F = Deep\left[\#Features\ Extracted\right] \cup Hand\left[Asymmetry, Area, \ldots, Entropy, Contrast, Std, Max, Min, \ldots\right],$
where the size of the vector $F$ is equal to 43 handcraft features plus the number of features extracted by the CNN architecture used.

3.4. Algorithm Summary

The complete set of proposed procedures can be summarized as a feature extraction algorithm for PSL images. The proposed CAD system consists of four principal stages: (a) preprocessing, (b) handcraft features, (c) deep learning features, and (d) the fusion stage. In the first stage, artefacts are attenuated using a Gaussian filter, and a CIEL*a*b* color transformation is then employed; mean thresholding per channel is applied afterwards, and the ROI is extracted using the bounding box algorithm. In the second stage, the asymmetry is computed from Equations (10)–(12) and the shape features from Equations (13)–(17), Equations (18)–(21) are computed for the statistical color features, and the texture features are then calculated from Equations (22)–(43). The ROI image is finally processed by the chosen CNN architecture, whose features are concatenated in the following steps.
Algorithm 1 presents the details of the feature extraction process for PSL images.

3.5. Feature Selection

After the deep learning and handcraft features are extracted, the number of features must be reduced; high-dimensional data is a typical problem in machine learning algorithms because it increases the computing time of a prediction.
Feature selection is one method to resolve this problem. In [46], filtering methods are applied, and the features are selected based on various statistical tests, such as χ², ANOVA, and Linear Discriminant Analysis (LDA), among others [47].
The extracted data can be represented in the form of a high-dimensional matrix,
$X \in \mathbb{R}^{n \times p},$
where $X$ is the extracted data, $n$ is the number of instances or elements, and $p$ is the number of features extracted for each element.
The idea is to reduce the data as much as possible by selecting, for each element, the subset of features that is relevant to the pertinence category or label $y$. This subset is defined as
$X_S \in \mathbb{R}^{n \times k},$
where $X_S$ represents the reduced data, $n$ is the same number of instances as in the original data matrix, and $k$ is the number of selected features, with $k \ll p$.

Mutual Information

In this work, we propose the MI metric to reduce the set of extracted features. MI is a measure based on entropy:
$I(X;Y) = H(X) - H(X|Y) = \sum_{y \in Y}\sum_{x \in X} P(x,y)\, \log_2 \frac{P(x,y)}{P(x)\,P(y)},$
where $X = \{x_1, \ldots, x_n\}$ and $Y = \{y_1, \ldots, y_n\}$ in the multi-variable case, $H(X|Y)$ is the conditional entropy between two random variables, and $H(X)$ is the entropy of a random variable [48,49,50]:
$H(X) = -\sum_{i=1}^{n} P(x_i)\, \log_2 P(x_i).$
Algorithm 1 Algorithm summary
Require: PSL image $I$

(a) Preprocessing
Input: $I$
 1: Apply the Gaussian filter, Equation (2)
 2: Apply the RGB to CIEL*a*b* color transform
 3: Separate the CIEL*a*b* image $I_{Lab}$ into the channels $I_L$, $I_a$, $I_b$
 4: Calculate the mean values of $I_L$, $I_a$, $I_b$, Equations (3)–(5)
 5: for all $(x,y) \in I_L$ do: if $I_L(x,y) \geq \bar{I}_L$ then assign 1 to $I_{Th_L}(x,y)$ else assign 0 to $I_{Th_L}(x,y)$
 6: for all $(x,y) \in I_a$ do: if $I_a(x,y) \geq \bar{I}_a$ then assign 1 to $I_{Th_a}(x,y)$ else assign 0 to $I_{Th_a}(x,y)$
 7: for all $(x,y) \in I_b$ do: if $I_b(x,y) \geq \bar{I}_b$ then assign 1 to $I_{Th_b}(x,y)$ else assign 0 to $I_{Th_b}(x,y)$
 8: Compute $I_{bin}(x,y)$ by applying Equation (9) to $I_{Th_L}$, $I_{Th_a}$, $I_{Th_b}$
 9: Apply the median filter, size 5 × 5
10: Run the bounding box algorithm to estimate the coordinates of the Region of Interest
11: Crop $I_{roi}(x,y)$ from the estimated coordinates on $I(x,y)$ and $I_{Lab}(x,y)$
Output: Region of Interest (ROI) image $I_{roi}(x,y)$

(b) Handcraft features
Input: $I_{roi}$
12: Compute Area, Perimeter, Circularity, Diameter, and Eccentricity from Equations (13)–(17)
13: Compute Asymmetry from Equations (10)–(12)
14: Compute the color features from Equations (18)–(21) on $I_{roi}(x,y)$
15: Compute the texture features from Equations (22)–(43) on $I_{roi}(x,y)$
16: Concatenate the extracted features into $H$
Output: handcraft features $H$

(c) Deep learning features
Input: $I_{roi}$
17: Load the weights $W_i$ of the selected CNN architecture
18: Apply the weights $W_i$ to $I_{roi}$
19: Obtain the deep learning features $D$
Output: deep learning features $D$

(d) Wrapping features
Input: $D$, $H$
20: Concatenate $H$ and $D$
Output: full set of extracted features $F$
Ross [51] proposed an MI estimator for mixed continuous and discrete data, aimed at quantifying the relationship between datasets.
Based on the nearest-neighbor rule, the idea is to find the $k$ nearest neighbors of a point $i$ among all the data points $N_{x_i}$ using the Chebyshev distance metric,
$D_{chebyshev} = \max_i\left(|x_i - y_i|\right),$
and the MI measure is then computed as follows:
$I(X,Y) = \langle I_i \rangle = \psi(N) - \langle\psi(N_x)\rangle + \psi(k) - \langle\psi(m)\rangle,$
where $\psi(\cdot)$ is the digamma function, $N_x$ is the average count of the data points, $k$ is the number of closest neighbors of the point $i$, and $m$ is the average number of neighbors counted over the full dataset.
If the dataset contains continuous data points, these are discretized using a binning method (grouping the data into bins), leading to a binned approximation of MI as follows:
$I(X,Y) = \left\langle \log \frac{p(x_i, b_i)}{p(x_i)\, p(b_i)} \right\rangle_i.$
After applying the MI method, a new vector is obtained that contains the MI value of each individual feature. Next, the mean MI value over all features is calculated, and we propose to use this value as a threshold: features whose MI falls below the mean are discarded, and the features whose MI exceeds it are kept,
$L_{features} = \left\{ f \in F \;:\; MI(f) > \overline{MI} \right\}.$
The resulting subset contains the features with the highest mutual information values, which we consider to be the fused data of both sets of features extracted from a PSL. Below, as an example, we present the behavior of the features and their MI values.
Table 2 and Table 3 show, for illustration, several MI values of the extracted features for the dataset of images. Some of the deep learning features appear to demonstrate negligible MI values for the binary classification problem with respect to the complete set of features extracted from the database, as one can see in Table 2.
Table 3 shows several features with significant MI values that are merged with additional significant handcraft features, forming the final set of features for the proposed system. The proposed fusion method based on MI measurements therefore fuses both types of features in accordance with their influence on the binary classification problem.
Thus, in contrast to state-of-the-art techniques that use concatenation, PCA-based, or χ²-based methods, among others, to select significant features, the proposed approach employs information measures, justifying the informative weight of each feature used in the classification stage.
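A sketch of this fusion rule is given below; scikit-learn's mutual_info_classif, which implements a Ross-style k-nearest-neighbor MI estimator, is assumed here as a stand-in for the estimator described above.

    import numpy as np
    from sklearn.feature_selection import mutual_info_classif

    def fuse_by_mutual_information(F, y, n_neighbors=3):
        """Keep only the fused features whose MI with the class label exceeds the mean MI."""
        mi = mutual_info_classif(F, y, n_neighbors=n_neighbors)
        keep = mi > mi.mean()          # mean MI used as the discarding threshold
        return F[:, keep], keep, mi

    # Usage: F_fused, selected_mask, mi_values = fuse_by_mutual_information(F, labels)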

4. Results and Discussion

The following classifiers were employed in this work: Logistic Regression, Support Vector Machines with linear and RBF kernels [52], and the Relevance Vector Machine [53,54]. The rationale for using different classifiers lies in the fundamental idea of transfer learning: after generic features are extracted, a shallow classifier must be applied to test the proposed method.
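The shallow classifiers can be instantiated as in the sketch below (the RVM is omitted because it is not part of scikit-learn); X_train, y_train, X_test, and y_test stand for the balanced, fused features described in Section 4.1.

    from sklearn.linear_model import LogisticRegression
    from sklearn.svm import SVC

    classifiers = {
        "LR": LogisticRegression(max_iter=1000),
        "SVM-linear": SVC(kernel="linear"),
        "SVM-rbf": SVC(kernel="rbf"),
    }

    for name, clf in classifiers.items():
        clf.fit(X_train, y_train)                # balanced, fused training features
        print(name, clf.score(X_test, y_test))   # accuracy on the held-out 25%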

4.1. Experimental Results

4.1.1. Experimental Setup

The described method was performed on a PC with an Intel® Xeon E5 1230-V5 CPU, 32GB RAM, NVIDIA GeForce® 1080Ti with 11 GB RAM, running a Linux 64-bit operating system, Python 3.5, and the libraries: Keras 2.3 [55], Sklearn [56], Mahotas [57], thundersvm [58], and Imblearn [59].

4.1.2. Evaluation Metrics

In this study, we used the commonly applied performance metrics: accuracy, sensitivity, specificity, precision, F-score, and the Matthews correlation coefficient:
$Accuracy = \frac{tp + tn}{tp + tn + fp + fn}.$
The accuracy measures the proportion of correct classifications over the total number of evaluated elements.
$Sensitivity = \frac{tp}{tp + fn}.$
The sensitivity, also known as recall, measures the proportion of positive elements that are correctly classified.
$Specificity = \frac{tn}{tn + fp}.$
The specificity value measures the number of negative elements that are correctly classified.
$Precision = \frac{tp}{tp + fp}.$
The precision measures the proportion of correctly classified elements among all elements predicted as positive.
$F\text{-}Score = \frac{2tp}{2tp + fp + fn}.$
The F-score value measures the harmonic mean between precision and recall.
These criteria are described in terms of tp, tn, fp, and fn, which denote true positive, true negative, false positive, and false negative, respectively. Additionally, to characterize the classifier performance, we used the Matthews correlation coefficient [60]:
$MCC = \frac{(tp \times tn) - (fp \times fn)}{\sqrt{(tp + fp)(tp + fn)(tn + fp)(tn + fn)}},$
where the MCC measures the performance of the classification model as a correlation coefficient between the predicted and observed elements of the binary classification. It returns a value in $[-1, 1]$, where 1 represents a perfect prediction, 0 is no better than a random prediction, and $-1$ indicates total disagreement between prediction and observation.
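For reference, the sketch below computes these criteria from the binary confusion matrix with scikit-learn.

    from sklearn.metrics import confusion_matrix, f1_score, matthews_corrcoef

    def evaluation_metrics(y_true, y_pred):
        """Accuracy, sensitivity, specificity, precision, F-score, and MCC."""
        tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
        return {
            "accuracy":    (tp + tn) / (tp + tn + fp + fn),
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
            "precision":   tp / (tp + fp),
            "f_score":     f1_score(y_true, y_pred),
            "mcc":         matthews_corrcoef(y_true, y_pred),
        }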

4.1.3. Dataset

This study uses task 3 of the public ISIC 2018 "Skin Lesion Analysis Towards Melanoma Detection" grand challenge dataset [21], also known as HAM10000 [61], which contains 10,015 images. Their distribution is shown in Figure 6, where AKIEC corresponds to Actinic Keratosis, BCC to Basal Cell Carcinoma, DF to Dermatofibroma, MEL to Melanoma, NV to Nevus, BKL to Pigmented Benign Keratosis, and VASC to Vascular lesions. This distribution was obtained from the ground truth file, and each image is in RGB space with a size of 450 × 600 pixels.
As one can see in Figure 6, the ISIC dataset contains images belonging to several types of skin lesions that do not correspond to the melanoma-type lesion. Therefore, we modified the dataset for binary classification by excluding all classes except Melanoma and Nevus.
We split the ISIC dataset into 75% for the training set and 25% for the test set. The extracted features were processed by Z-score normalization:
$Z = \frac{x - \mu}{\sigma}.$
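A sketch of the split and normalization is given below; stratifying the split by class and fitting the scaler on the training set only are assumptions.

    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler

    X_train, X_test, y_train, y_test = train_test_split(
        F, labels, test_size=0.25, stratify=labels, random_state=42)

    scaler = StandardScaler().fit(X_train)   # Z = (x - mu) / sigma
    X_train = scaler.transform(X_train)
    X_test = scaler.transform(X_test)        # the test set reuses the training statistics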

4.1.4. Balance Data

After the adjustment of the ISIC dataset mentioned above, the data remain unbalanced: one class contains far more samples than the other. When most machine learning techniques are trained on unbalanced data, the performance on the minority class degrades, which can cause misclassification of the data.
SMOTE [62] is a data augmentation method that oversamples the minority class to compensate for the majority class. It is based on K-NN clustering and the Euclidean distance: two points of the minority class are selected, and a new synthetic point is computed between them. The process is repeated until the minority class reaches the same amount of data as the majority class.
The SMOTE technique has been employed in several studies [63,64] in which the features extracted from an unbalanced dataset are oversampled to equalize the number of instances between classes. In this work, we apply the method to the features selected by the MI criterion in order to compensate the melanoma class against the nevus class; as a result, a balanced dataset with the fused features is obtained.
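The balancing step can be sketched with the Imblearn library listed in the experimental setup; applying it to the training split only and using the default five nearest neighbors are assumptions.

    from imblearn.over_sampling import SMOTE

    smote = SMOTE(k_neighbors=5, random_state=42)
    X_train_bal, y_train_bal = smote.fit_resample(X_train, y_train)  # melanoma oversampled to match nevus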
The study [65] introduces metrics to address this problem. The geometric mean attempts to maximize the accuracy of the two classes simultaneously, providing a performance metric that balances both objectives:
$G_{mean} = \sqrt{Sensitivity \cdot Specificity}.$
The dominance quantifies the prevalence relation between the majority and minority classes and is used to analyze the behavior of a binary classifier:
$Dominance = Sensitivity - Specificity.$
The Index of Balanced Accuracy (IBA) is a classification performance metric designed to be more sensitive in imbalanced domains. It is defined as follows:
$IBA = (1 + Dominance) \cdot G_{mean}^{2}.$
The objective of this procedure is to moderately favor the classification models with a higher prediction rate of the minority class, without underestimating the relevance of the majority class.
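The sketch below computes these three criteria from the sensitivity and specificity, following the weighting written above (implementations of the IBA often include an additional tuning factor for the dominance term).

    import numpy as np

    def balanced_metrics(sensitivity, specificity):
        """G-mean, dominance, and the Index of Balanced Accuracy."""
        g_mean = np.sqrt(sensitivity * specificity)
        dominance = sensitivity - specificity
        iba = (1 + dominance) * g_mean ** 2
        return g_mean, dominance, iba

    # Example: balanced_metrics(0.86, 0.93) -> (~0.894, -0.07, ~0.744)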

4.1.5. Experimental Results

In the following tables, the experimental results of the binary classification on balanced data are presented.
The experimental results in Table 4 show that the designed system appears to demonstrate sufficiently good performance when different CNN architectures are fused with the handcraft features in accordance with the MI metric, which seeks relevant information among the features instead of relying on concatenation or discriminative analysis.
Table 4 provides the experimental results obtained using the selected CNN architectures, where MobileNet v2 appears to demonstrate the best performance in comparison with the aforementioned architectures. The proposed method shows notable values of evaluation metrics such as accuracy, Area Under the Curve (AUC), and IBA. The selected features contain the fused features that, according to the MI metric, are the most relevant for the classification of a lesion.
The experimental results for different criteria presented in Table 5 show that the designed system outperforms several state-of-the-art methods. In our view, the experimentally justified performance is due to the fusion technique employed, in which the mutual information metric seeks relevant information among the features rather than relying on concatenation or discriminative analysis. Moreover, the IBA metric achieves a value of 0.80, which confirms the stability and robustness of the system when balanced data are used.
The proposed method achieves an accuracy of 92.40%, a sensitivity of 86.41%, an AUC of 89.64%, and an IBA of 0.80. In [24], the authors proposed the usage of the complete PSL image, which contains healthy skin and artefacts, so there is a probability of misclassification caused by patterns belonging to these objects. By contrast, the proposed system extracts the features only from the region of interest of an image throughout the entire classification process, which guarantees that feature extraction is performed exactly on the lesion.
Moreover, our novel CAD employs the ISIC 2018 database [21], which contains more than 10,000 dermoscopy images authenticated by experts. Additionally, because of the unbalanced data present in this database, we applied the data augmentation procedure given in Section 4.1.4, which supports the robustness of the obtained classification results. In contrast, the DermoDeep CAD system [24] obtained its experimental results on a synthetic database constructed from four different databases (private and public), in which an equal number of melanoma and benign skin lesions were subjectively selected from each of the four databases. That system reported slightly better performance figures than those obtained in this study; in our opinion, however, such an approach does not guarantee that the same high performance can be repeated on data that have not been preselected.
The proposed system used a data augmentation technique and presents the performance analysis for all images contained in the database, not only for those preselected according to a subjective criterion without statistical justification.
Finally, our proposed system was developed with medical-based and deep learning features, employing data from both sets of features and merging them with the MI criterion. As a result, the system enhances the recognition of melanoma and nevus lesions compared with a fully deep learning approach, which is extremely computationally expensive to train, requires substantial amounts of labeled data, and does not recognize the dermoscopic features established in the ABCD algorithm.

5. Conclusions and Future Work

In this study, a novel competitive CAD system was designed to differentiate melanoma from nevus lesions. Unlike commonly proposed CAD systems, the novel method employs handcraft features based on the medical ABCD rule together with deep learning features, applying the transfer learning method as a feature extractor. Additionally, in the proposed system, the two sets of features are fused using an MI metric that, in contrast to state-of-the-art systems, selects the most significant features according to their influence on the binary classification decision.
The performance of the proposed system was evaluated; it achieved an accuracy of 92.4%, an IBA of 0.80, and an MCC of 0.7953 on a balanced dataset, and it is competitive with other state-of-the-art systems.
The proposed CAD system can help inexperienced physicians to visually identify the relevant medical features, and it could also provide a second opinion to a dermatologist. Our future work will consist of designing a method for multiclass classification using both sets of features, thus permitting the diagnosis of the several diseases found in the ISIC challenge dataset.

Author Contributions

Methodology: J.-A.A.-D. and V.P.; software: J.-A.A.-D. and H.C.-F.; formal analysis: J.-A.A.-D., V.P. and S.S.; investigation: J.-A.A.-D. and V.P.; resources: J.-A.A.-D. and V.P.; data curation: J.-A.A.-D. and V.P.; writing—original draft preparation: J.-A.A.-D. and V.P.; writing—review and editing: J.-A.A.-D., V.P., S.S. and H.C.-F. All authors have read and agreed to the published version of the manuscript.

Acknowledgments

The authors would like to thank the Instituto Politecnico Nacional (Mexico) and the Consejo Nacional de Ciencia y Tecnologia (Mexico) for their support in this work.

Conflicts of Interest

The authors declare that there is no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
ELM    Epiluminescence Microscopy
MD     Medical Doctor
PSL    Pigmented Skin Lesion
ROI    Region of Interest
GLCM   Gray Level Co-occurrence Matrix
DL     Deep Learning
CAD    Computer-Aided Detection
SVM    Support Vector Machine
RVM    Relevance Vector Machine
LR     Logistic Regression
CNN    Convolutional Neural Network
TF     Transfer Learning
MI     Mutual Information
K-NN   K-Nearest Neighbors
MCC    Matthews Correlation Coefficient

References

  1. Skin Cancers. Available online: http://www.who.int/uv/faq/skincancer/en/index1.html (accessed on 15 January 2020).
  2. Skin Cancer. Available online: https://www.wcrf.org/dietandcancer/skin-cancer (accessed on 15 January 2020).
  3. Baldi, A.; Quartulli, M.; Murace, R.; Dragonetti, E.; Manganaro, M.; Guerra, O.; Bizzi, S. Automated Dermoscopy Image Analysis of Pigmented Skin Lesions. Cancers 2010, 2, 262–273. [Google Scholar] [CrossRef]
  4. Almaraz-Damian, J.A.; Ponomaryov, V.; Rendon-Gonzalez, E. Melanoma CADe based on ABCD Rule and Haralick Texture Features. In Proceedings of the 2016 9th International Kharkiv Symposium on Physics and Engineering of Microwaves, Millimeter and Submillimeter Waves (MSMW), Kharkiv, Ukraine, 20–24 June 2016; pp. 1–4. [Google Scholar]
  5. Li, Y.; Shen, L. Skin Lesion Analysis towards Melanoma Detection Using Deep Learning Network. Sensors 2018, 18, 556. [Google Scholar] [CrossRef] [Green Version]
  6. Lopez, A.R.; Giro-i-Nieto, X.; Burdick, J.; Marques, O. Skin lesion classification from dermoscopic images using deep learning techniques. In Proceedings of the 2017 13th IASTED International Conference on Biomedical Engineering (BioMed), Innsbruck, Austria, 20–21 February 2017; pp. 49–54. [Google Scholar]
  7. Hosny, K.M.; Kassem, M.A.; Foaud, M.M. Classification of skin lesions using transfer learning and augmentation with Alex-net. PLoS ONE 2019, 14, e0217293. [Google Scholar] [CrossRef] [Green Version]
  8. Castillejos, H.; Ponomaryov, V.; Nino-De-Rivera, L.; Golikov, V. Wavelet Transform Fuzzy Algorithms for Dermoscopic Image Segmentation. Comput. Math. Method. Med. 2012, 2012, 578721. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  9. Nachbar, F.; Stolz, W.; Merkle, T.; Cognetta, A.B.; Vogt, T.; Landthaler, M.; Bilek, P.; Braun-Falco, O.; Plewig, G. The ABCD Rule of Dermatoscopy. J. Am. Acad. Dermatol. 1994, 30, 551–559. [Google Scholar] [CrossRef] [Green Version]
  10. Zalaudek, I.; Argenziano, G.; Soyer, H.P.; Corona, R.; Sera, F.; Blum, A.; Braun, R.P.; Cabo, H.; Ferrara, G.; Kopf, A.W.; et al. Three-point checklist of dermoscopy: An open internet study. Br. J. Dermatol. 2005, 154, 431–437. [Google Scholar] [CrossRef] [PubMed]
  11. Henning, J.S.; Dusza, S.W.; Wang, S.Q.; Marghoob, A.A.; Rabinovitz, H.S.; Polsky, D.; Kopf, A.W. The CASH (color, archi-tecture, symmetry, and homogeneity) algorithm for dermoscopy. J. Am. Acad. Dermatol. 2007, 56, 45–52. [Google Scholar] [CrossRef] [PubMed]
  12. Stolz, W.; Riemann, A.; Cognetta, A.B.; Pillet, L.; Abmayr, W.; Hölzel, D.; Bilek, P.; Nachbar, F.; Landthaler, M.; Braun-Falco, O. ABCD rule of dermatoscopy: A new practical method for early recognition of malignant melanoma. Eur. J. Dermatol. 1994, 4, 521–527. [Google Scholar]
  13. Argenziano, G.; Fabbrocini, G.; Carli, P.; De Giorgi, V.; Sammarco, E.; Delfino, M. Epiluminescence Microscopy for the Diagnosis of Doubtful Melanocytic Skin Lesions: Comparison of the ABCD Rule of Dermatoscopy and a New 7-Point Checklist Based on Pattern Analysis. Arch. Dermatol. 1998, 134, 1563–1570. [Google Scholar] [CrossRef] [Green Version]
  14. Melanoma Education Foundation. Finding Melanoma Early: Warning Signs & Photos. Available online: https://www.skincheck.org/Page4.php (accessed on 2 February 2020).
  15. MoleMap NZ Official Site. The EFG of Nodular Melanomas. Available online: https://www.molemap.co.nz/knowledge-centre/efg-nodular-melanomas (accessed on 2 February 2020).
  16. Jensen, J.D.; Elewski, B.E. The ABCDEF Rule: Combining the “ABCDE Rule” and the “Ugly Duckling Sign” in an Effort to Improve Patient Self-Screening Examinations. J. Clin. Aesthet. Dermatol. 2015, 8, 15. [Google Scholar]
  17. Kalkhoran, S.; Milne, O.; Zalaudek, I.; Puig, S.; Malvehy, J.; Kelly, J.W.; Marghoob, A.A. Historical, Clinical, and Dermoscopic Characteristics of Thin Nodular Melanoma. Arch. Dermatol. 2010, 146, 311–318. [Google Scholar] [CrossRef] [Green Version]
  18. Adjed, F.; Gardezi, S.J.S.; Ababsa, F.; Faye, I.; Dass, S.C. Fusion of structural and textural features for melanoma recognition. IET Comput. Vis. 2018, 12, 185–195. [Google Scholar] [CrossRef]
  19. Mendonça, T.; Ferreira, P.M.; Marques, J.; Marcal, A.R.S.; Rozeira, J. PH2—A dermoscopic image database for research and benchmarking. In Proceedings of the 35th International Conference of the IEEE Engineering in Medicine and Biology Society, Osaka, Japan, 3–7 July 2013. [Google Scholar]
  20. Hagerty, J.R.; Stanley, R.J.; Almubarak, H.A.; Lama, N.; Kasmi, R.; Guo, P.; Stoecker, W.V. Deep Learning and Handcrafted Method Fusion: Higher Diagnostic Accuracy for Melanoma Dermoscopy Images. IEEE J. Biomed. Health Inform. 2019, 23, 1385–1391. [Google Scholar] [CrossRef]
  21. Codella, N.; Rotemberg, V.; Tschandl, P.; Celebi, M.E.; Dusza, S.; Gutman, D.; Helba, B.; Kalloo, A.; Liopyris, K.; Marchetti, M.; et al. Skin Lesion Analysis Toward Melanoma Detection 2018: A Challenge Hosted by the International Skin Imaging Collaboration (ISIC). arXiv 2018, arXiv:1902.03368. [Google Scholar]
  22. Li, X.; Wu, J.; Jiang, H.; Chen, E.Z.; Dong, X.; Rong, R. Skin Lesion Classification Via Combining Deep Learning Features and Clinical Criteria Representations. bioRxiv 2018. bioRxiv:382010. [Google Scholar]
  23. Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Y.; Liu, T.-Y. LightGBM: A Highly Efficient Gradient Boosting Decision Tree. In Proceedings of the Advances in Neural Information Processing Systems 30 (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017; pp. 3149–3157. [Google Scholar]
  24. Abbas, Q.; Celebi, M.E. DermoDeep-A classification of melanoma-nevus skin lesions using multi-feature fusion of visual features and deep neural network. Multimed. Tool. Appl. 2019, 78, 23559–23580. [Google Scholar] [CrossRef]
  25. Krizhevsky, A. Learning Multiple Layers of Features from Tiny Images; Technical Report TR-2009; University of Toronto: Toronto, ON, Canada, 2009. [Google Scholar]
  26. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
  27. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv 2017, arXiv:1704.04861. [Google Scholar]
  28. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  29. Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the Inception Architecture for Computer Vision. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016; pp. 2818–2826. [Google Scholar]
  30. Chollet, F. Xception: Deep Learning with Depthwise Separable Convolutions. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2016; pp. 1800–1807. [Google Scholar]
  31. Huang, G.; Liu, Z.; Weinberger, K.Q. Densely Connected Convolutional Networks. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2016; pp. 2261–2269. [Google Scholar]
  32. Orea-Flores, I.Y.; Gallegos-Funes, F.J.; Arellano-Reynoso, A. Local Complexity Estimation Based Filtering Method in Wavelet Domain for Magnetic Resonance Imaging Denoising. Entropy 2019, 21, 401. [Google Scholar] [CrossRef] [Green Version]
  33. Goritskiy, Y.; Kazakov, V.; Shevchenko, O.; Mendoza, F. Model of Random Field with Piece-Constant Values and Sampling-Restoration Algorithm of Its Realizations. Entropy 2019, 21, 792. [Google Scholar] [CrossRef] [Green Version]
  34. Yang, M.; Kpalma, K.; Joseph, R. A Survey of Shape Feature Extraction Techniques. In Pattern Recognition Techniques, Technology and Applications; Yin, P.-Y., Ed.; InTech: London, UK, 2008; Available online: http://www.intechopen.com/books/pattern_recognition_techniques_technology_and_applications/a_survey_of_shape_feature_extraction_techniques (accessed on 10 January 2020). [CrossRef] [Green Version]
  35. Alceu, B.; Fornaciali, M.; Valle, E.; Avila, S. (De) Constructing Bias on Skin Lesion Datasets. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Long Beach, CA, USA, 16–20 June 2019; pp. 2766–2774. [Google Scholar]
  36. Sirakov, N.M.; Mete, M.; Chakrader, N.S. Automatic boundary detection and symmetry calculation in dermoscopy images of skin lesions. IEEE Intern. Conf. Image Process. 2011, 1, 1637–1640. [Google Scholar]
  37. Haralick, R.; Shanmugam, K.; Dinstein, I. Textural Features for Image Classification. IEEE Trans. Syst. Man Cybern. 1973, 3, 610–621. [Google Scholar] [CrossRef] [Green Version]
  38. LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324. [Google Scholar] [CrossRef] [Green Version]
  39. Voulodimos, A.; Doulamis, N.; Doulamis, A.; Protopapadakis, E. Deep Learning for Computer Vision: A Brief Review. Comput. Intell. Neurosci. 2018, 2018, 7068349. [Google Scholar] [CrossRef] [PubMed]
  40. Chollet, F. Deep Learning with Python, 1st ed.; Manning Publications Co.: Greenwich, CT, USA, 2017. [Google Scholar]
  41. Yosinski, J.; Clune, J.; Bengio, Y.; Lipson, H. How transferable are features in deep neural networks? In Proceedings of the 27th International Conference on Neural Information Processing Systems—Volume 2; MIT Press: Cambridge, MA, USA, 2014; pp. 3320–3328. [Google Scholar]
  42. Rawat, W.; Wang, Z. Deep convolutional neural networks for image classification: A comprehensive review. Neural Comput. 2017, 29, 2352–2449. [Google Scholar] [CrossRef] [PubMed]
  43. Pan, S.J.; Yang, Q. A Survey on Transfer Learning. IEEE Trans. Knowl. Data Eng. 2010, 22, 1345–1359. [Google Scholar] [CrossRef]
  44. Weiss, K.; Khoshgoftaar, T.M.; Wang, D. A survey of transfer learning. J. Big Data 2016, 3. [Google Scholar] [CrossRef] [Green Version]
  45. Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; et al. ImageNet Large Scale Visual Recognition Challenge. Int. J. Comput. Vis. 2015, 115, 211–252. [Google Scholar] [CrossRef] [Green Version]
  46. Bommert, A.; Sun, X.; Bischl, B.; Rahnenführer, J.; Lang, M. Benchmark for Filter Methods for Feature Selection in High-Dimensional Classification Data. Comput. Stat. Data Anal. 2020, 143, 106839. [Google Scholar] [CrossRef]
  47. Karczmarek, P.; Pedrycz, W.; Kiersztyn, A.; Rutka, P. A study in facial features saliency in face recognition: An analytic hierarchy process approach. Soft Comput. 2017, 21, 7503–7517. [Google Scholar] [CrossRef] [Green Version]
  48. Kozachenko, L.F.; Leonenko, N.N. Sample Estimate of the Entropy of a Random Vector. Probl. Peredachi Inf. 1987, 23, 9–16. [Google Scholar]
  49. Kraskov, A.; Stögbauer, H.; Grassberger, P. Estimating mutual information. Phys. Rev. E 2004, 69, 066138. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  50. Houghton, C. Calculating mutual information for spike trains and other data with distances but no coordinates. R. Soc. Open Sci. 2015, 2. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  51. Ross, B.C. Mutual information between discrete and continuous data sets. PLoS ONE 2014, 9. [Google Scholar] [CrossRef]
  52. Vapnik, V. Statistical Learning Theory; John Wiley: New York, NY, USA, 1998. [Google Scholar]
  53. Tipping, M.E. Sparse bayesian learning and the relevance vector machine. J. Mach. Learn. Res. 2001, 1, 211–244. [Google Scholar]
  54. Bishop, C. Probabilistic graphical models and their role in machine learning. In Proceedings of the NATO ASI–LTP 2002 Tutorial, Leuven, Belgium, 8–19 July 2002. [Google Scholar]
  55. Chollet, F. Keras. 2015. Available online: https://keras.io (accessed on 15 January 2020).
  56. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830. [Google Scholar]
  57. Coelho, L.P. Mahotas: Open source software for scriptable computer vision. J. Open Res. Softw. 2013, 1, e3. [Google Scholar] [CrossRef]
  58. Wen, Z.; Shi, J.; Li, Q.; He, B.; Chen, J. ThunderSVM: A fast SVM library on GPUs and CPUs. J. Mach. Learn. Res. 2018, 19, 797–801. [Google Scholar]
  59. Lemaître, G.; Nogueira, F.; Aridas, C.K. Imbalanced-learn: A python toolbox to tackle the curse of imbalanced datasets in machine learning. J. Mach. Learn. Res. 2017, 18, 559–563. [Google Scholar]
  60. Powers, D.M. Evaluation: From precision, recall and F-measure to ROC, informedness, markedness and correlation. J. Mach. Learn. Technol. 2011. [Google Scholar] [CrossRef]
  61. Tschandl, P.; Rosendahl, C.; Kittler, H. The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions. Sci Data 2018, 5, 180161. [Google Scholar] [CrossRef] [PubMed]
  62. Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic minority over-sampling technique. J. Artif. Intell. Res. 2002, 16, 321–357. [Google Scholar] [CrossRef]
  63. Celebi, M.E.; Kingravi, H.A.; Uddin, B.; Iyatomi, H.; Aslandogan, Y.A.; Stoecker, W.V.; Moss, R.H. A methodological approach to the classification of dermoscopy images. Comput. Med. Imag. Graph. 2007, 31, 362–373. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  64. Capdehourat, G.; Corez, A.; Bazzano, A.; Alonso, R.; Musé, P. Toward a combined tool to assist dermatologists in melanoma detection from dermoscopic images of pigmented skin lesions. Pattern Recognit. Lett. 2011, 32, 2187–2196. [Google Scholar] [CrossRef]
  65. García, V.; Mollineda, R.A.; Sánchez, J.S. Index of Balanced Accuracy: A Performance Measure for Skewed Class Distributions; Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Springer: Berlin/Heidelberg, Germany, 2009; Volume 5524, pp. 441–448. [Google Scholar] [CrossRef] [Green Version]
Figure 1. Block diagram of the novel Computer-Aided Detection (CAD) system.
Figure 2. (a) Original image I(x,y); (b) image (a) processed with a Gaussian filter.
Figure 3. (a) Original image I(x,y), (b) I(x,y) on channel L, (c) I(x,y) on channel a*, and (d) I(x,y) on channel b*.
Figure 4. Results of the thresholding stage: (a) Original image I(x,y), (b) binary image I_ThL obtained from the threshold of the L channel, (c) binary image I_Tha obtained from the threshold of the a* channel, (d) binary image I_Thb obtained from the threshold of the b* channel.
Figure 5. Results of the preprocessing stage for the original image I(x,y): (a) obtained Region of Interest (ROI), (b) segmented image SI_bin, (c) obtained asymmetry at 0°, and (d) obtained asymmetry at 90°.
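Figures 2–5 illustrate the preprocessing chain: Gaussian smoothing of the dermoscopic image, conversion to the CIELAB color space, thresholding of the L, a*, and b* channels, and extraction of the binary lesion mask used for the ROI and asymmetry measurements. A minimal sketch of such a chain is given below; it assumes OpenCV and NumPy, and the kernel size, the use of Otsu thresholding, and the majority-vote combination of the channel masks are illustrative choices rather than the authors' exact settings.

```python
import cv2
import numpy as np

def segment_lesion(image_bgr, ksize=5):
    """Illustrative preprocessing: blur -> CIELAB -> per-channel threshold -> lesion mask."""
    # Gaussian filtering attenuates noise and fine structures (cf. Figure 2).
    smoothed = cv2.GaussianBlur(image_bgr, (ksize, ksize), 0)

    # Decompose the CIELAB representation into L, a*, and b* channels (cf. Figure 3).
    L, a, b = cv2.split(cv2.cvtColor(smoothed, cv2.COLOR_BGR2LAB))

    # Threshold each channel independently (Otsu is used here) to obtain three
    # binary maps analogous to I_ThL, I_Tha, and I_Thb (cf. Figure 4).
    masks = []
    for channel in (L, a, b):
        _, th = cv2.threshold(channel, 0, 255,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
        masks.append(th // 255)

    # Combine the three maps by majority vote (an assumed rule) to obtain the
    # segmented lesion SI_bin, from which the ROI is cropped (cf. Figure 5).
    votes = np.sum(np.stack(masks, axis=0), axis=0)
    return (votes >= 2).astype(np.uint8) * 255
```

The asymmetry descriptors at 0° and 90° (Figure 5c,d) can then be estimated, for example, by folding the binary mask about its horizontal and vertical axes and measuring the non-overlapping area.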
Figure 6. Distribution of classes in the ISIC2018/HAM10000 dataset.
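Figure 6 shows that the ISIC2018/HAM10000 classes are strongly imbalanced, which motivates the imbalance-aware metrics reported in Tables 4 and 5 and resampling techniques such as SMOTE [62] from the imbalanced-learn toolbox [59]. The sketch below applies SMOTE to the training split only; the synthetic data and split parameters are illustrative placeholders, not the study's actual configuration.

```python
from collections import Counter

from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the fused (CNN + handcraft) feature matrix; the 85/15
# class weighting only mimics the melanoma/nevus imbalance seen in Figure 6.
X, y = make_classification(n_samples=2000, n_features=50,
                           weights=[0.85, 0.15], random_state=42)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Oversample the training fold only, so the test fold keeps its natural distribution.
X_bal, y_bal = SMOTE(random_state=42).fit_resample(X_train, y_train)
print(Counter(y_train), "->", Counter(y_bal))
```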
Table 1. The number of features extracted by architecture.

CNN Architecture    # Features
VGG19               4096
VGG16               4096
ResNET-50           2048
Inception v3        2048
Mobilenet v1        1024
Mobilenet v2        1280
DenseNET-201        1920
Xception            2048
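The feature counts in Table 1 correspond to the length of the vector produced by each ImageNet-pre-trained convolutional base when it is used as a fixed feature extractor with global average pooling [43,45]. A minimal Keras [55] sketch for Inception v3, which yields the 2048 features listed above, is shown below; the 299 × 299 input size and the pooling option follow the standard Keras application defaults and are not necessarily the authors' exact configuration.

```python
import numpy as np
from tensorflow.keras.applications import InceptionV3
from tensorflow.keras.applications.inception_v3 import preprocess_input
from tensorflow.keras.preprocessing import image

# ImageNet-pre-trained convolutional base; global average pooling turns the last
# feature maps into a single 2048-dimensional vector per image (cf. Table 1).
extractor = InceptionV3(weights="imagenet", include_top=False, pooling="avg")

def deep_features(image_path):
    """Return the 2048-D Inception v3 descriptor of one dermoscopic image."""
    img = image.load_img(image_path, target_size=(299, 299))  # Inception v3 input size
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    return extractor.predict(x)[0]
```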
Table 2. Examples of features with the lowest Mutual Information (MI) values (Inception v3 + handcraft features).

Feature    Mutual Information Value
1448       3.813956037657107 × 10⁻⁷
648        1.4314941993776031 × 10⁻⁵
1020       3.4236070477255964 × 10⁻⁵
804        3.515106075036023 × 10⁻⁵
333        4.213506368255793 × 10⁻⁵
562        4.4971751217204314 × 10⁻⁵
91         5.133156269931938 × 10⁻⁵
852        6.514623080855486 × 10⁻⁵
1689       7.53828133426282 × 10⁻⁵
1788       7.605629690621285 × 10⁻⁵
Table 3. Examples of features with the highest MI values (Inception v3 + handcraft features).

Feature    Mutual Information Value
Mean_b     0.09829037284938669
Min_b      0.09536305749234275
Max_b      0.06834317593527395
578        0.06131147578510121
Min_G      0.05817685924318594
116        0.055293703553799256
389        0.05446628875169646
464        0.05424503079140042
Var_L      0.05420063226575533
288        0.053949117014718606
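Tables 2 and 3 rank individual columns of the fused feature vector by their estimated mutual information with the class label; near-zero scores (Table 2) indicate features that carry almost no class information, while the largest scores (Table 3) are dominated by handcrafted color statistics. A nearest-neighbor MI estimator in the spirit of the Kraskov [49] and Ross [51] methods is available in scikit-learn [56] as mutual_info_classif; the sketch below ranks the features and keeps the top n_keep columns, where n_keep is an illustrative parameter rather than the value used in the paper.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def rank_by_mutual_information(X, y, n_keep=500, seed=42):
    """Rank the columns of X by MI with the labels y and keep the best n_keep."""
    mi = mutual_info_classif(X, y, random_state=seed)  # one MI estimate per feature
    order = np.argsort(mi)[::-1]                       # highest MI first (cf. Table 3)
    selected = order[:n_keep]
    return selected, mi[selected]
```

Applied to the Inception v3 activations concatenated with the handcraft descriptors, such a selector produces rankings analogous to those shown in Tables 2 and 3.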
Table 4. Performance results of the proposed method using selected deep learning architectures fused with handcraft features.

Architecture    Acc. Train  Acc. Test  Sensibility  Specificity  Precision  F-Score  AUC    G-Mean  IBA   MCC
VGG16           88.60       84.90      79.23        0.85         88.74      83.71    84.79  0.85    0.72  0.7012
VGG19           90.23       87.14      82.46        0.87         90.44      86.26    87.05  0.87    0.76  0.7451
Mobilenet v1    91.48       89.32      84.04        0.89         93.49      88.51    89.21  0.89    0.79  0.7898
Mobilenet v2    92.40       89.71      86.41        0.90         92.08      89.16    89.64  0.90    0.80  0.7953
ResNET-50       90.67       87.86      81.24        0.88         93.09      86.76    87.72  0.87    0.77  0.7624
DenseNET-201    91.10       88.54      83.25        0.88         92.61      87.68    88.44  0.88    0.78  0.5985
Inception V3    91.33       88.10      84.87        0.88         90.59      87.42    88.02  0.88    0.77  0.7632
Xception        90.47       87.53      83.19        0.87         90.58      86.73    87.44  0.87    0.76  0.7525
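Most columns of Table 4 are standard binary-classification metrics; the imbalance-oriented ones, the geometric mean (G-Mean) and the Index of Balanced Accuracy (IBA) [65], are provided by imbalanced-learn [59], and the Matthews Correlation Coefficient (MCC) by scikit-learn [56]. The sketch below computes them from a pair of label vectors; the IBA weighting (alpha = 0.1 on the squared G-Mean) follows the imbalanced-learn defaults and is an assumption about the exact setting used here.

```python
from imblearn.metrics import geometric_mean_score, make_index_balanced_accuracy
from sklearn.metrics import (confusion_matrix, f1_score, matthews_corrcoef,
                             precision_score, recall_score)

def imbalance_aware_report(y_true, y_pred):
    """Binary metrics of Table 4 for a melanoma (1) vs. nevus (0) prediction."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    # IBA-weighted geometric mean, using the library's default alpha = 0.1.
    iba_of_gmean = make_index_balanced_accuracy(alpha=0.1, squared=True)(
        geometric_mean_score)
    return {
        "sensibility": recall_score(y_true, y_pred),   # true positive rate
        "specificity": tn / (tn + fp),                 # true negative rate
        "precision": precision_score(y_true, y_pred),
        "f_score": f1_score(y_true, y_pred),
        "g_mean": geometric_mean_score(y_true, y_pred),
        "iba": iba_of_gmean(y_true, y_pred),
        "mcc": matthews_corrcoef(y_true, y_pred),
    }
```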
Table 5. Comparison between our novel CAD and state-of-the-art CADs.

Metric                   [18]     [24]     [22]        Proposed Method (Mobilenet v2)
Accuracy                 86.07    95       85.55       92.40
Sensibility              78.93    93       86          86.41
Specificity              93.25    -        -           90
Precision                -        93       85          92.08
F-Score                  -        -        86          89.16
G-Mean                   -        -        -           0.90
IBA                      -        -        -           0.80
MCC                      -        -        -           0.7953
Imbalanced Data          Yes      No       No          No
Fused Data               Yes      Yes      Yes         Yes
Type of Classification   Binary   Binary   Multiclass  Binary
