Cloud computing-based parallel genetic algorithm for gene selection in cancer classification

Kečo, Dino; Subasi, Abdulhamit; Kevric, Jasmin

doi:10.1007/s00521-016-2780-z

Original Article
Published: 19 December 2016

Neural Computing and Applications, volume 30, pages 1601–1610 (2018)

Abstract

Cancer classification is one of the main steps in the patient healing process. This drives modern clinical researchers to use advanced bioinformatics methods for cancer classification. Cancer classification is usually performed using gene expression data obtained from microarray experiments together with advanced machine learning methods. A microarray experiment generates a huge amount of data, and processing it with machine learning methods is a major challenge. In this study, a two-step classification paradigm that combines genetic algorithm feature selection with machine learning classifiers is used. The genetic algorithm is built in the spirit of the MapReduce programming model, which makes it highly scalable on a Hadoop cluster. To improve its performance, the algorithm is extended into a parallel algorithm that processes microarray data in a distributed manner using the Hadoop MapReduce framework. The algorithm was tested on eleven GEMS data sets (9 tumors, 11 tumors, 14 tumors, brain tumor 1, lung cancer, brain tumor 2, leukemia 1, DLBCL, leukemia 2, SRBCT, and prostate tumor), and its accuracy reached 100% with fewer than 25 selected features. The proposed cloud computing-based MapReduce parallel genetic algorithm performed well on gene expression data. In addition, the scalability of the suggested algorithm is effectively unlimited because of the underlying Hadoop MapReduce platform. The presented results indicate that the proposed method can be effectively implemented for real-world microarray data in the cloud environment. The Hadoop MapReduce framework also yields a substantial decrease in computation time.
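
The two-step idea in the abstract (genetic algorithm feature selection followed by a classifier, organised as map and reduce phases) can be illustrated with a minimal single-process sketch. This is not the authors' implementation and does not use Hadoop. The synthetic data, the nearest-centroid classifier used as the fitness function, the size penalty, and every function name and parameter value below are assumptions made for illustration only. The `map_phase` function stands in for the work that mappers could do independently per individual, and `reduce_phase` stands in for selection, crossover, and mutation.

```python
import random

def make_data(rng, n_samples=60, n_genes=30, informative=(0, 1, 2)):
    """Synthetic stand-in for expression data (assumption): only the
    `informative` genes carry the class signal."""
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        for g in informative:
            row[g] += 3.0 * label
        X.append(row)
        y.append(label)
    return X, y

def fitness(mask, X, y):
    """Training accuracy of a nearest-centroid classifier on the selected
    genes, minus a small penalty for selecting many genes (assumption)."""
    genes = [g for g, bit in enumerate(mask) if bit]
    if not genes:
        return 0.0
    centroids = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(r[g] for r in rows) / len(rows) for g in genes]
    correct = 0
    for x, t in zip(X, y):
        dist = {c: sum((x[g] - m) ** 2 for g, m in zip(genes, centroids[c]))
                for c in centroids}
        correct += min(dist, key=dist.get) == t
    return correct / len(X) - 0.01 * len(genes) / len(mask)

def map_phase(population, X, y):
    """Score each individual independently; this is the parallelisable part."""
    return [(fitness(ind, X, y), ind) for ind in population]

def reduce_phase(scored, rng, mutation_rate=0.02):
    """Build the next generation: elitism, tournament selection,
    one-point crossover, and bit-flip mutation."""
    scored = sorted(scored, key=lambda p: p[0], reverse=True)
    next_gen = [scored[0][1][:]]  # keep the best individual unchanged

    def tournament():
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] >= b[0] else b[1]

    while len(next_gen) < len(scored):
        p1, p2 = tournament(), tournament()
        cut = rng.randrange(1, len(p1))
        child = p1[:cut] + p2[cut:]
        child = [bit ^ (rng.random() < mutation_rate) for bit in child]
        next_gen.append(child)
    return next_gen

def run_ga(X, y, rng, pop_size=20, generations=15):
    """Alternate map and reduce phases; return the best mask found."""
    n_genes = len(X[0])
    population = [[rng.randint(0, 1) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    best_history = []
    for _ in range(generations):
        scored = map_phase(population, X, y)
        best_history.append(max(s for s, _ in scored))
        population = reduce_phase(scored, rng)
    best_score, best_mask = max(map_phase(population, X, y), key=lambda p: p[0])
    return best_mask, best_score, best_history

if __name__ == "__main__":
    rng = random.Random(0)
    X, y = make_data(rng)
    mask, score, _ = run_ga(X, y, rng)
    print("selected genes:", [g for g, bit in enumerate(mask) if bit])
    print("fitness:", round(score, 3))
```

Because each call to `fitness` depends only on one individual and the shared data set, the map phase is the natural place to distribute work across cluster nodes. How the paper partitions the population and which classifiers it uses are not stated in the abstract, so the choices above should be read as placeholders.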

Author information

Authors and Affiliations

Faculty of Engineering and Information Technologies, International Burch University, Francuske Revolucije bb, 71000, Ilidza, Sarajevo, Bosnia and Herzegovina
Dino Kečo & Jasmin Kevric
College of Engineering, Effat University, Jeddah, 21478, Saudi Arabia
Abdulhamit Subasi

Corresponding author

Correspondence to Abdulhamit Subasi.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

About this article

Cite this article

Kečo, D., Subasi, A. & Kevric, J. Cloud computing-based parallel genetic algorithm for gene selection in cancer classification. Neural Comput & Applic 30, 1601–1610 (2018). https://doi.org/10.1007/s00521-016-2780-z

Received: 01 June 2016
Accepted: 08 December 2016
Published: 19 December 2016
Issue Date: September 2018
DOI: https://doi.org/10.1007/s00521-016-2780-z
