Analysis of high-dimensional biomedical data using an evolutionary multi-objective emperor penguin optimizer
Introduction
The rapidly growing DNA microarray technology has enabled researchers to measure the expression levels of thousands of genes simultaneously in a single experiment [1]. The biomarker genes extracted from microarray data help in the clinical diagnosis, prognosis, and treatment of cancer. However, the high dimensionality of microarray data increases the computational overhead and hence poses a significant challenge in biomedical data analysis. To overcome this issue, the irrelevant and redundant genes need to be discarded using feature selection techniques [2,3]. Selecting only the relevant genes is expected to reduce the size of the gene expression data and to improve the classification accuracy (CA).
Several gene selection (GS) techniques have been suggested in the literature to identify the useful genes present in microarray data. These GS techniques can be broadly classified into three categories, namely, filter methods, wrapper methods, and hybrid methods [4,5]. Filter-based methods select the relevant genes from the original gene set based on statistical characteristics. Despite their simplicity and computational efficiency, filter techniques cannot exploit the relationships among the genes, which reduces the overall accuracy. Wrapper-based techniques, on the other hand, employ the knowledge of classifiers such as kernel ridge regression (KRR) [6], support vector machine (SVM) [7], K-nearest neighbor (KNN) [8,9], Naive Bayes (NB) [10], radial basis function neural networks (RBFN) [11], and decision tree (DT) [12] to find the biomarkers. The wrapper models use bio-inspired algorithms to identify the optimal solutions by exploring the search space with a set of candidate solutions (a population). Evolutionary algorithms such as the genetic algorithm (GA) [13,14,15], differential evolution (DE) [16,17], artificial bee colony algorithm (ABC) [18], genetic bee colony optimization (GBC) [19], ant colony optimization (ACO) [20], salp swarm algorithm (SSA) [21], firefly algorithm (FA) [22], bidirectional elitist optimization [23], and particle swarm optimization (PSO) [24,25,26,27,28] have been successfully used to solve numerous feature selection problems. These methods are capable of learning the associations among the genes and therefore lead to better CA. The hybrid methods combine the merits of both by first employing a filter method to reduce the number of selected genes (NSG) and then applying a wrapper method to search for the optimal gene subset.
To select the biomarker genes quickly and efficiently, multi-objective methods have been designed. In the last two decades, several multi-objective optimization methods, namely, MOPSO [29], CMOPSO [30], NSGA-II [31], CGAMO [32], multi-objective FA (MOFA) [33], multi-objective teaching-learning-based optimization (MOTLBO) [34], multi-objective gravitational search algorithm (MOGSA) [35], and multi-objective differential evolution (MODE) [36] have been proposed. These methods have proved effective in solving multi-objective problems. Although each of the above-mentioned algorithms is competent at solving a specific task, none of them can solve all optimization problems with dissimilar characteristics [37]. Hence, there is always room for a novel method that can solve a problem that the existing methods cannot address.
The two important phases of any metaheuristic algorithm are diversification and intensification [38,39]. Diversification ensures that the algorithm explores the various promising areas of the search space, whereas intensification searches for the optimal solutions around the promising areas identified by the diversification phase [40]. A proper balance between these two phases is important for any optimization problem, which motivates us to employ the emperor penguin optimizer (EPO) algorithm. The second motivation is the ‘no free lunch theorem’, which states that no single metaheuristic is capable of solving all optimization problems [37].
In this paper, a novel multi-objective version of the EPO algorithm, namely MOCEPO, is proposed. EPO [41] is a recently developed metaheuristic originally designed for single-objective optimization problems. In this work, we extend the single-objective EPO to a multi-objective binary EPO by using two multi-objective operators, namely non-dominated sorting and crowding distance. The two objectives of our problem are to minimize the NSG and to maximize the CA. The CA is computed by the KRR classifier. To reduce the redundant genes, the Fisher score and minimum redundancy maximum relevance (mRMR) filters are employed independently.
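As a rough illustration of how non-dominated sorting can rank candidate gene subsets on the two objectives stated above, consider the following minimal Python sketch. It is not the authors' implementation: the `(nsg, ca)` tuple representation and the function names `dominates` and `non_dominated_sort` are assumptions made for this example, and a simple quadratic-time peeling procedure is used rather than any specific published variant.

```python
from typing import List, Tuple

# Hypothetical representation: (number of selected genes, classification accuracy).
# The first objective is minimized, the second is maximized.
Solution = Tuple[int, float]


def dominates(a: Solution, b: Solution) -> bool:
    """Return True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    strictly_better = a[0] < b[0] or a[1] > b[1]
    return no_worse and strictly_better


def non_dominated_sort(pop: List[Solution]) -> List[List[int]]:
    """Peel off successive fronts; front 0 holds the indices of the non-dominated solutions."""
    remaining = set(range(len(pop)))
    fronts: List[List[int]] = []
    while remaining:
        front = sorted(
            i for i in remaining
            if not any(dominates(pop[j], pop[i]) for j in remaining if j != i)
        )
        fronts.append(front)
        remaining -= set(front)
    return fronts


if __name__ == "__main__":
    population = [(10, 0.90), (20, 0.95), (15, 0.85), (30, 0.95)]
    print(non_dominated_sort(population))
```

In this toy population, the subsets with 10 and 20 genes are mutually non-dominated, while the other two are each dominated by one of them and fall into the second front.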
The five major contributions of this work are as follows:
- For the first time, a multi-objective version of the EPO algorithm is proposed.
- Chaos theory is introduced in the MOEPO for faster convergence.
- A multi-objective operator, non-dominated sorting, is incorporated to rank the Pareto-optimal solutions.
- Selection of the fittest solution is carried out by the crowding distance operator.
- The proposed method is applied to simultaneous GS and cancer classification.
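The crowding distance operator mentioned in the contributions can be sketched as follows. This is a minimal illustration in the commonly used NSGA-II style, assuming each solution in a front is an `(nsg, ca)` pair; the function name `crowding_distance` and the data layout are hypothetical, and the paper's exact selection rule may differ.

```python
import math
from typing import List, Tuple


def crowding_distance(front: List[Tuple[float, float]]) -> List[float]:
    """Return one crowding distance per solution in a single front.

    Boundary solutions on each objective get an infinite distance so that
    they are always preferred; interior solutions accumulate the normalized
    gap between their two neighbours on every objective.
    """
    n = len(front)
    dist = [0.0] * n
    if n == 0:
        return dist
    for m in range(2):  # objective 0: number of genes, objective 1: accuracy
        order = sorted(range(n), key=lambda i: front[i][m])
        lo, hi = front[order[0]][m], front[order[-1]][m]
        dist[order[0]] = dist[order[-1]] = math.inf
        if hi == lo:
            continue  # all values equal: this objective adds no spread
        for k in range(1, n - 1):
            gap = front[order[k + 1]][m] - front[order[k - 1]][m]
            dist[order[k]] += gap / (hi - lo)
    return dist


if __name__ == "__main__":
    print(crowding_distance([(10, 0.80), (20, 0.90), (40, 0.95)]))
```

A larger distance indicates a solution in a less crowded region of the front, so preferring larger values tends to keep the retained solutions well spread.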
The proposed approach is evaluated on seven standard datasets. Its performance is assessed in terms of CA, NSG, F-measure, specificity, Matthews correlation coefficient (MCC), and sensitivity. The results show that our method not only achieves higher CA but also reduces the NSG effectively.
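For reference, the classification metrics listed above can be computed from a binary confusion matrix as in the following sketch. It uses the standard textbook definitions and covers the binary case only; how the paper aggregates these metrics over the multi-class datasets is not stated here, and the function name `binary_metrics` and the returned dictionary keys are assumptions of this example.

```python
import math
from typing import Dict


def _ratio(num: float, den: float) -> float:
    """Divide safely, returning 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute CA, sensitivity, specificity, F-measure, and MCC from confusion counts."""
    sensitivity = _ratio(tp, tp + fn)   # true positive rate (recall)
    specificity = _ratio(tn, tn + fp)   # true negative rate
    precision = _ratio(tp, tp + fp)
    accuracy = _ratio(tp + tn, tp + fp + tn + fn)
    f_measure = _ratio(2 * precision * sensitivity, precision + sensitivity)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = _ratio(tp * tn - fp * fn, mcc_den)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_measure": f_measure,
        "mcc": mcc,
    }


if __name__ == "__main__":
    for name, value in binary_metrics(tp=40, fp=10, tn=45, fn=5).items():
        print(f"{name}: {value:.4f}")
```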
The remainder of the paper is organized as follows. Section 2 explains the methods used along with the proposed work. Experimental setup and performance metrics are presented in Section 3. The results of the work are presented and discussed in Section 4. Finally, we conclude the work in Section 5.
Section snippets
Pre-selection of genes
Filter-based gene ranking algorithms are commonly used to filter out highly redundant and irrelevant genes. In this paper, we employ the Fisher score [42] and mRMR [43] filters separately for gene pre-selection, as both perform reliably in separating out the relevant genes [44,45]. Compared with methods such as the t-test, Z-score, and information gain, Fisher score and mRMR produce superior results [45,46]. Nonetheless, every technique has its advantages that influence
Datasets
The proposed method is applied to seven standard microarray cancer datasets [50,51], listed in Table 1. Of the seven datasets, three are binary-class and four are multi-class. Prior to feature selection by the Fisher score and mRMR filters, min-max normalization to the range [-1,1] is applied to the whole dataset.
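The min-max normalization step can be illustrated with the short sketch below. It assumes that scaling is done independently for each gene (column) and that a constant gene is mapped to the midpoint of the range; neither detail is stated in the text above, and the function names `min_max_scale` and `normalize_dataset` are hypothetical.

```python
from typing import List


def min_max_scale(column: List[float], lo: float = -1.0, hi: float = 1.0) -> List[float]:
    """Linearly rescale one gene's expression values to the interval [lo, hi]."""
    c_min, c_max = min(column), max(column)
    if c_max == c_min:
        # Assumption: a constant gene carries no information, so map it to the midpoint.
        return [(lo + hi) / 2 for _ in column]
    span = c_max - c_min
    return [lo + (hi - lo) * (x - c_min) / span for x in column]


def normalize_dataset(rows: List[List[float]]) -> List[List[float]]:
    """Scale every column (gene) of a samples-by-genes matrix to [-1, 1]."""
    columns = [list(col) for col in zip(*rows)]
    scaled = [min_max_scale(col) for col in columns]
    return [list(row) for row in zip(*scaled)]


if __name__ == "__main__":
    data = [[2.0, 5.0], [4.0, 5.0], [6.0, 5.0]]
    print(normalize_dataset(data))
```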
Experimental setup
The experiments are carried out in MATLAB 2017b on a machine with 8 GB of main memory and a Core i5 processor (2.70 GHz). The simulation results are evaluated
Experimental results of feature selection using Fisher score and mRMR
In this experiment, two feature selection methods, namely Fisher score and mRMR, are used independently as initial filters to select the top N statistically relevant biomarkers. N ranges from 1 to 500 (see Fig. 4).
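To make the top-N pre-selection concrete, the following sketch ranks genes by a commonly used form of the Fisher score, namely the between-class scatter of a gene divided by its within-class scatter, and keeps the N highest-ranked genes. The exact formulation in [42] may differ, and the function names `fisher_scores` and `top_n_genes` are assumptions made for this example.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def fisher_scores(X: Sequence[Sequence[float]], y: Sequence[Hashable]) -> List[float]:
    """Return one Fisher score per gene for a samples-by-genes matrix X and labels y."""
    by_class: Dict[Hashable, List[Sequence[float]]] = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)

    scores = []
    for j in range(len(X[0])):
        overall_mean = sum(row[j] for row in X) / len(X)
        between = 0.0  # class-size-weighted squared distance of class means from the overall mean
        within = 0.0   # class-size-weighted within-class variance
        for rows in by_class.values():
            values = [r[j] for r in rows]
            mean_k = sum(values) / len(values)
            var_k = sum((v - mean_k) ** 2 for v in values) / len(values)
            between += len(values) * (mean_k - overall_mean) ** 2
            within += len(values) * var_k
        if within > 0:
            scores.append(between / within)
        else:
            # Zero within-class spread: perfectly separating if class means differ.
            scores.append(math.inf if between > 0 else 0.0)
    return scores


def top_n_genes(X: Sequence[Sequence[float]], y: Sequence[Hashable], n: int) -> List[int]:
    """Return the column indices of the n genes with the highest Fisher score."""
    scores = fisher_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:n]


if __name__ == "__main__":
    X = [[1.0, 5.0], [2.0, 1.0], [9.0, 4.0], [10.0, 2.0]]
    y = [0, 0, 1, 1]
    print(fisher_scores(X, y), top_n_genes(X, y, 1))
```

In the toy data, the first gene separates the two classes clearly and receives a high score, whereas the second gene has identical class means and scores zero.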
The selected features are then passed to the KRR model with the default values of C and γ, which are taken as 1 and 100, respectively, in this work. Fig. 4 shows how the CA changes as the NSG increases on the various datasets for the Fisher score and mRMR filters. It is observed
Conclusion
Evolutionary algorithms play an important role in finding the relevant genes in high-dimensional microarray data and hence help systems biologists in cancer diagnosis. Identifying a smaller number of biomarkers that yield a higher CA substantially improves the quality of the expert systems used in hospitals.
In the present work, a multi-objective model based on the chaotic EPO algorithm has been proposed for microarray cancer classification. There are two major merits of the
References (66)
- et al.
Microarray medical data classification using kernel ridge regression and modified cat swarm optimization based gene selection system
Swarm Evol. Comput.
(2016)
- et al.
An ensemble of decision trees with random vector functional link networks for multi-class classification
Appl. Soft Comput.
(2018)
- et al.
An improved class of real-coded genetic algorithms for numerical optimization
Neurocomputing
(2018)
- et al.
A hybrid gene selection algorithm for microarray cancer classification using genetic algorithm and learning automata
Inf. Med. Unlocked
(2017)
- et al.
Recent advances in differential evolution: an updated survey
Swarm Evol. Comput.
(2016)
- et al.
Two hybrid wrapper-filter feature selection algorithms applied to high-dimensional microarray experiments
Appl. Soft Comput.
(2016)
- et al.
Genetic bee colony (GBC) algorithm: a new gene selection method for microarray cancer classification
Comput. Biol. Chem.
(2015)
- et al.
Gene selection for microarray data classification using a novel ant colony optimization
Neurocomputing
(2015)
- et al.
Analysis of high-dimensional genomic data employing a novel bio-inspired algorithm
Appl. Soft Comput.
(2019)
- et al.
A comprehensive review of firefly algorithms
Swarm Evol. Comput.
(2013)
Gene selection from microarray gene expression data for classification of cancer subgroups employing PSO and adaptive K-nearest neighborhood technique
Expert Syst. Appl.
Chaotic multi-objective particle swarm optimization algorithm incorporating clone immunity
Mathematics
Financial time series prediction using hybrids of chaos theory, multi-layer perceptron and multi-objective evolutionary algorithms
Swarm Evol. Comput.
A multi-objective improved teaching-learning based optimization algorithm (MO-ITLBO)
Inf. Sci.
A comprehensive survey on gravitational search algorithm
Swarm Evol. Comput.
A grid-based adaptive multi-objective differential evolution algorithm
Inf. Sci.
Hybrid metaheuristics with evolutionary algorithms specializing in intensification and diversification: overview and progress report
Comput. Oper. Res.
Emperor penguin optimizer: a bio-inspired algorithm for engineering problems
Knowl. Base Syst.
Feature selection based on FDA and F-score for multi-class classification
Expert Syst. Appl.
Gene selection for microarray cancer classification using a new evolutionary method employing artificial intelligence concepts
Genomics
A multi-objective heuristic algorithm for gene expression microarray data classification
Expert Syst. Appl.
Chaotic bat algorithm
J. Comput. Sci.
Markov blanket-embedded genetic algorithm for gene selection
Pattern Recogn.
Use of proteomic patterns in serum to identify ovarian cancer
Lancet
Informative gene selection for microarray classification via adaptive elastic net with conditional mutual information
Appl. Math. Model.
A hierarchical ensemble of ecoc for cancer classification based on multi-class microarray data
Inf. Sci.
A hierarchical-coevolutionary-mapreduce-based knowledge reduction algorithm with robust ensemble pareto equilibrium
Inf. Sci.
Multiagent-consensus-mapreduce-based attribute reduction using co-evolutionary quantum PSO for big data applications
Neurocomputing
Early fire detection using convolutional neural networks during surveillance for effective disaster management
Neurocomputing
Image based fruit category classification by 13-layer deep convolutional neural network and data augmentation
Multimed. Tool. Appl.
Classification consistency analysis for bootstrapping gene selection
Neural Comput. Appl.
Feature selection for microarray data using least squares SVM and particle swarm optimization
Multiple relevant feature ensemble selection based on multilayer co-evolutionary consensus mapreduce
IEEE Trans. Cybern.