Analysis of high-dimensional biomedical data using an evolutionary multi-objective emperor penguin optimizer

doi:10.1016/j.swevo.2019.04.010

Swarm and Evolutionary Computation

Volume 48, August 2019, Pages 262-273

https://doi.org/10.1016/j.swevo.2019.04.010

Abstract

Over the last two decades, the generation and exploration of high-dimensional biomedical data have expanded rapidly. Identifying biomarkers from genomics data poses a significant challenge in microarray data analysis, so effective algorithms are needed for the methodical analysis of genomics datasets. In this work, a multi-objective version of the emperor penguin optimization (EPO) algorithm with chaos, namely, multi-objective chaotic EPO (MOCEPO), is proposed. The suggested approach extends the original continuous single-objective EPO to a binary multi-objective model. The objectives are to minimize the number of selected genes (NSG) and to maximize the classification accuracy (CA). Fisher score and minimum redundancy maximum relevance (mRMR) are independently used as initial filters. The proposed MOCEPO is then employed for simultaneous optimal feature selection and cancer classification. The proposed algorithm is evaluated on seven well-known high-dimensional binary-class and multi-class datasets. To assess its effectiveness, the proposed method is compared with the non-dominated sorting genetic algorithm (NSGA-II), multi-objective particle swarm optimization (MOPSO), a chaotic version of GA for multi-objective optimization (CGAMO), and chaotic MOPSO methods. The experimental results show that the proposed framework achieves better CA with a lower NSG than the existing schemes. The presented approach demonstrates its efficacy with regard to NSG, accuracy, sensitivity, specificity, and F-measure.

Introduction

The rapidly growing DNA microarray technology has enabled researchers to measure the expression levels of thousands of genes simultaneously in a single experiment [1]. The biomarker genes extracted from microarray data help in the clinical diagnosis, prognosis, and treatment of cancer. However, the high dimensionality of microarray data increases the computational overhead and hence poses a significant challenge in biomedical data analysis. To overcome this issue, the irrelevant and redundant genes need to be discarded using feature selection techniques [2,3]. Selecting only the relevant genes is expected to reduce the size of the gene expression data and to enhance the CA.

Several gene selection (GS) techniques have been suggested in the literature to identify the useful genes present in microarray data. These GS techniques can be broadly classified into three categories, namely, filter methods, wrapper methods, and hybrid methods [4,5]. Filter-based methods select the relevant genes from the original gene set based on some statistical characteristics. Despite their simplicity and computational efficiency, filter techniques are incapable of exploiting the relationships among the genes, which reduces the overall accuracy. On the other hand, wrapper-based techniques employ the knowledge of classifiers, namely, kernel ridge regression (KRR) [6], support vector machine (SVM) [7], K-nearest neighbor (KNN) [8,9], Naive Bayes (NB) [10], radial basis function neural networks (RBFN) [11], and decision tree (DT) [12], to find the biomarkers. The wrapper models use bio-inspired algorithms to identify the optimal solutions by exploring the search space with a set of solutions (a population). Evolutionary algorithms such as GA [13,14,15], differential evolution (DE) [16,17], artificial bee colony algorithm (ABC) [18], genetic bee colony optimization (GBC) [19], ant colony optimization (ACO) [20], salp swarm algorithm (SSA) [21], firefly algorithm (FA) [22], bidirectional elitist optimization [23], and PSO [24,25,26,27,28] have been successfully utilized for solving numerous feature selection problems. These methods are capable of learning the associations among the genes and therefore lead to better CA. The hybrid methods combine the merits of both by first employing a filter method to reduce the NSG and then applying a wrapper method to explore the optimal gene subset.

To select biomarker genes faster and more efficiently, multi-objective methods have been designed. In the last two decades, several multi-objective optimization methods, namely, MOPSO [29], CMOPSO [30], NSGA-II [31], CGAMO [32], multi-objective FA (MOFA) [33], multi-objective teaching-learning-based optimization (MOTLBO) [34], multi-objective gravitational search algorithm (MOGSA) [35], and multi-objective differential evolution (MODE) [36], have been proposed. These methods have proved effective in solving multi-objective problems. Though all of the above algorithms are competent at solving specific tasks, they cannot solve all optimization problems with dissimilar characteristics [37]. Hence, there always remains room for a novel method that can solve a problem the present methods cannot address.

The two important phases of any metaheuristic algorithm are diversification and intensification [38,39]. Diversification ensures that the algorithm searches the various promising areas of the search space, whereas intensification investigates the optimal solutions around the promising areas identified by the diversification phase [40]. A proper balance between these two phases is important for any optimization problem, which motivates us to employ the EPO algorithm. The second motivation is the ‘no free lunch theorem’, which states that no single metaheuristic is capable of solving all optimization problems [37].

In this paper, a novel multi-objective version of the EPO algorithm, namely, MOCEPO, is proposed. EPO [41] is a recently developed metaheuristic, originally designed for single-objective optimization problems. In this work, the single-objective EPO is extended to a multi-objective binary EPO using two multi-objective operators: non-dominated sorting and crowding distance. The two objectives are to minimize the NSG and to maximize the CA. The CA is computed by the KRR classifier. To reduce the redundant genes, Fisher score and mRMR filters are employed independently.
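As an illustration of the two objectives, the following minimal Python sketch evaluates one binary gene mask and returns the pair (NSG, CA). It is not the authors' implementation: the function names, the one-hot KRR formulation, the RBF kernel form exp(-d²/width), and the default values of `C` and `width` are all assumptions made for this example.

```python
import numpy as np

def rbf_kernel(A, B, width):
    # Squared Euclidean distances between every row of A and every row of B.
    d2 = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    # Assumed kernel form: exp(-d^2 / width); other parametrizations exist.
    return np.exp(-np.maximum(d2, 0.0) / width)

def krr_fit_predict(X_train, y_train, X_test, C=1.0, width=100.0):
    # Kernel ridge regression on one-hot class targets (an assumed setup).
    classes = np.unique(y_train)
    Y = (y_train[:, None] == classes[None, :]).astype(float)
    K = rbf_kernel(X_train, X_train, width)
    alpha = np.linalg.solve(K + np.eye(len(X_train)) / C, Y)
    scores = rbf_kernel(X_test, X_train, width) @ alpha
    return classes[np.argmax(scores, axis=1)]

def evaluate_mask(mask, X_train, y_train, X_test, y_test):
    # mask[j] == 1 means gene j is selected.
    mask = np.asarray(mask, dtype=bool)
    nsg = int(mask.sum())
    if nsg == 0:
        return 0, 0.0  # an empty subset is given the worst accuracy
    pred = krr_fit_predict(X_train[:, mask], y_train, X_test[:, mask])
    ca = float(np.mean(pred == y_test))
    return nsg, ca
```

A multi-objective search would call `evaluate_mask` once per candidate solution and compare the resulting pairs, preferring smaller NSG and larger CA.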

The five major contributions of this work are:

- For the first time, a multi-objective version of the EPO algorithm is proposed.
- Chaos theory is introduced in the MOEPO for faster convergence.
- A multi-objective operator, non-dominated sorting, is incorporated to rank the Pareto-optimal solutions.
- Selection of the fittest solution is carried out by the crowding distance operator.
- The proposed method is applied for simultaneous GS and cancer classification.
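The two operators named in the contributions can be sketched as follows for solutions described by the pair (NSG, CA). This is a simple textbook-style version in the spirit of NSGA-II, written for clarity rather than speed; the function names and the list-of-tuples representation are assumptions of this example, not the authors' code.

```python
def dominates(a, b):
    # a and b are (nsg, ca) pairs: lower nsg and higher ca are better.
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    strictly_better = a[0] < b[0] or a[1] > b[1]
    return no_worse and strictly_better

def non_dominated_sort(points):
    # Peel off successive fronts; front 0 holds the non-dominated solutions.
    remaining = list(range(len(points)))
    fronts = []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(points[j], points[i])
                            for j in remaining if j != i)]
        fronts.append(front)
        remaining = [i for i in remaining if i not in front]
    return fronts

def crowding_distance(points, front):
    # Larger distance means a less crowded region of the front.
    dist = {i: 0.0 for i in front}
    for k in range(2):  # objective 0 = nsg, objective 1 = ca
        ordered = sorted(front, key=lambda i: points[i][k])
        lo, hi = points[ordered[0]][k], points[ordered[-1]][k]
        # Boundary solutions are always kept.
        dist[ordered[0]] = dist[ordered[-1]] = float("inf")
        if hi == lo:
            continue
        for prev, cur, nxt in zip(ordered, ordered[1:], ordered[2:]):
            dist[cur] += (points[nxt][k] - points[prev][k]) / (hi - lo)
    return dist
```

For example, with `points = [(5, 0.90), (10, 0.95), (20, 0.97), (12, 0.93), (25, 0.96)]`, the solutions at indices 3 and 4 are dominated by those at indices 1 and 2, respectively, so they fall into the second front.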

The proposed approach is evaluated on seven standard datasets. The performance of the proposed framework is measured in terms of CA, NSG, F-measure, specificity, Matthews correlation coefficient (MCC), and sensitivity. The results show that our method not only achieves higher CA but also reduces the NSG effectively.
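For reference, the classification metrics listed above can be computed from a binary confusion matrix using their standard definitions, as in the sketch below. The function name and the choice to return 0.0 when a denominator is zero are assumptions of this example; the sketch covers only the binary case, and how the metrics are aggregated for the multi-class datasets is not specified here.

```python
import math

def binary_metrics(tp, fp, tn, fn):
    # tp, fp, tn, fn: counts of true/false positives and negatives.
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall on positives
    specificity = tn / (tn + fp) if tn + fp else 0.0  # recall on negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    pr_sum = precision + sensitivity
    f_measure = 2 * precision * sensitivity / pr_sum if pr_sum else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_measure": f_measure,
        "mcc": mcc,
    }
```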

The remainder of the paper is organized as follows. Section 2 explains the methods used along with the proposed work. Experimental setup and performance metrics are presented in Section 3. The results of the work are presented and discussed in Section 4. Finally, we conclude the work in Section 5.

Section snippets

Pre-selection of genes

Filter-based gene ranking algorithms are commonly used to filter out highly redundant and irrelevant genes. In this paper, Fisher score [42] and mRMR [43] filters are employed separately for gene pre-selection; both perform reliably in segregating the relevant genes [44,45]. Compared with methods such as the t-test, Z-score, and information gain, Fisher score and mRMR produce superior results [45,46]. Nonetheless, every technique has its advantages that influence

Datasets

The proposed method is applied to seven standard microarray cancer datasets [50,51], listed in Table 1. Of the seven datasets, three are binary-class and four are multi-class. Prior to feature selection by the Fisher score and mRMR filters, min-max normalization to the range [-1, 1] is applied to the whole dataset.
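Min-max normalization to [-1, 1] can be sketched as below. The sketch assumes that scaling is done per gene (column-wise) over a samples-by-genes matrix, which the text does not state explicitly; the function name and the handling of constant columns are likewise assumptions.

```python
import numpy as np

def minmax_scale(X, low=-1.0, high=1.0):
    # X: samples x genes matrix; each column is rescaled independently.
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    span = X.max(axis=0) - col_min
    span[span == 0] = 1.0  # constant columns are mapped to `low`
    return low + (X - col_min) * (high - low) / span
```

With this definition, the smallest value in each column maps to -1 and the largest to 1.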

Experimental setup

The experiments are carried out in MATLAB 2017b on a machine with 8 GB of main memory and a Core i5 processor (2.70 GHz). The simulation results are evaluated

Experimental results of feature selection using Fisher score and mRMR

In this experiment, two feature selection methods, namely, Fisher score and mRMR, are used independently as initial filters to select the top N statistically relevant biomarkers. N ranges from 1 to 500 (see Fig. 4).
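Selecting the top N genes by Fisher score can be sketched as follows. The score used here is one common definition (between-class scatter of the class means divided by the size-weighted within-class variance); whether it matches the exact formula of [42] is an assumption, as are the function names. mRMR is not covered by this sketch.

```python
import numpy as np

def fisher_scores(X, y):
    # X: samples x genes matrix, y: class label of each sample.
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        between += len(Xc) * (Xc.mean(axis=0) - overall_mean) ** 2
        within += len(Xc) * Xc.var(axis=0)
    # Guard against division by zero for genes with no within-class spread.
    return between / np.maximum(within, 1e-12)

def top_n_genes(X, y, n):
    # Indices of the n highest-scoring genes, best first.
    scores = fisher_scores(X, y)
    return np.argsort(-scores, kind="stable")[:n]
```

Sweeping `n` from 1 to 500 and recording the CA for each value would reproduce the kind of curve described for Fig. 4, given a classifier of choice.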

These selected features are then sent to the KRR model with default C and γ. In this work, the default values of C and γ are taken as 1 and 100, respectively. Fig. 4 shows how the CA changes as the NSG increases on various datasets for the Fisher score and mRMR filters. It is observed

Conclusion

Evolutionary algorithms play an important role in finding the relevant genes from high-dimensional microarray data and hence help systems biologists in cancer diagnosis. Identifying a smaller number of biomarkers with higher CA substantially improves the quality of the expert systems used in hospitals.

In the present work, a multi-objective model based on the principle of chaotic EPO algorithm has been proposed for microarray cancer classification. There are two major merits of the

References (66)

P. Mohapatra et al. Microarray medical data classification using kernel ridge regression and modified cat swarm optimization based gene selection system. Swarm Evol. Comput. (2016)

R. Katuwal et al. An ensemble of decision trees with random vector functional link networks for multi-class classification. Appl. Soft Comput. (2018)

M.Z. Ali et al. An improved class of real-coded genetic algorithms for numerical optimization. Neurocomputing (2018)

H. Motieghader et al. A hybrid gene selection algorithm for microarray cancer classification using genetic algorithm and learning automata. Inf. Med. Unlocked (2017)

S. Das et al. Recent advances in differential evolution: an updated survey. Swarm Evol. Comput. (2016)

J. Apolloni et al. Two hybrid wrapper-filter feature selection algorithms applied to high-dimensional microarray experiments. Appl. Soft Comput. (2016)

H.M. Alshamlan et al. Genetic bee colony (GBC) algorithm: a new gene selection method for microarray cancer classification. Comput. Biol. Chem. (2015)

S. Tabakhi et al. Gene selection for microarray data classification using a novel ant colony optimization. Neurocomputing (2015)

S.K. Baliarsingh et al. Analysis of high-dimensional genomic data employing a novel bio-inspired algorithm. Appl. Soft Comput. (2019)

I. Fister et al. A comprehensive review of firefly algorithms. Swarm Evol. Comput. (2013)

S. Kar et al. Gene selection from microarray gene expression data for classification of cancer subgroups employing PSO and adaptive K-nearest neighborhood technique. Expert Syst. Appl. (2015)

Y. Sun et al. Chaotic multi-objective particle swarm optimization algorithm incorporating clone immunity. Mathematics (2019)

V. Ravi et al. Financial time series prediction using hybrids of chaos theory, multi-layer perceptron and multi-objective evolutionary algorithms. Swarm Evol. Comput. (2017)

V.K. Patel et al. A multi-objective improved teaching-learning based optimization algorithm (MO-ITLBO). Inf. Sci. (2016)

E. Rashedi et al. A comprehensive survey on gravitational search algorithm. Swarm Evol. Comput. (2018)

J. Cheng et al. A grid-based adaptive multi-objective differential evolution algorithm. Inf. Sci. (2016)

M. Lozano et al. Hybrid metaheuristics with evolutionary algorithms specializing in intensification and diversification: overview and progress report. Comput. Oper. Res. (2010)

G. Dhiman et al. Emperor penguin optimizer: a bio-inspired algorithm for engineering problems. Knowl. Base Syst. (2018)

Q. Song et al. Feature selection based on FDA and F-score for multi-class classification. Expert Syst. Appl. (2017)

M. Dashtban et al. Gene selection for microarray cancer classification using a new evolutionary method employing artificial intelligence concepts. Genomics (2017)

J. Lv et al. A multi-objective heuristic algorithm for gene expression microarray data classification. Expert Syst. Appl. (2016)

A.H. Gandomi et al. Chaotic bat algorithm. J. Comput. Sci. (2014)

Z. Zhu et al. Markov blanket-embedded genetic algorithm for gene selection. Pattern Recogn. (2007)

E.F. Petricoin et al. Use of proteomic patterns in serum to identify ovarian cancer. Lancet (2002)

Y. Wang et al. Informative gene selection for microarray classification via adaptive elastic net with conditional mutual information. Appl. Math. Model. (2019)

K.-H. Liu et al. A hierarchical ensemble of ECOC for cancer classification based on multi-class microarray data. Inf. Sci. (2016)

W. Ding et al. A hierarchical-coevolutionary-MapReduce-based knowledge reduction algorithm with robust ensemble Pareto equilibrium. Inf. Sci. (2016)

W. Ding et al. Multiagent-consensus-MapReduce-based attribute reduction using co-evolutionary quantum PSO for big data applications. Neurocomputing (2018)

K. Muhammad et al. Early fire detection using convolutional neural networks during surveillance for effective disaster management. Neurocomputing (2018)

Y.-D. Zhang et al. Image based fruit category classification by 13-layer deep convolutional neural network and data augmentation. Multimed. Tool. Appl. (2017)

S. Pang et al. Classification consistency analysis for bootstrapping gene selection. Neural Comput. Appl. (2007)

E.K. Tang et al. Feature selection for microarray data using least squares SVM and particle swarm optimization.

W. Ding et al. Multiple relevant feature ensemble selection based on multilayer co-evolutionary consensus MapReduce. IEEE Trans. Cybern. (2018)

