Skip to main content

Advertisement

Log in

Cloud computing-based parallel genetic algorithm for gene selection in cancer classification

  • Original Article
  • Published:
Neural Computing and Applications Aims and scope Submit manuscript

Abstract

Cancer classification is one of the main steps during patient healing process. This fact enforces modern clinical researchers to use advanced bioinformatics methods for cancer classification. Cancer classification is usually performed using gene expression data gained in microarray experiment and advanced machine learning methods. Microarray experiment generates huge amount of data, and its processing via machine learning methods represents a big challenge. In this study, two-step classification paradigm which merges genetic algorithm feature selection and machine learning classifiers is utilized. Genetic algorithm is built in MapReduce programming spirit which makes this algorithm highly scalable for Hadoop cluster. In order to improve the performance of the proposed algorithm, it is extended into a parallel algorithm which process on microarray data in distributed manner using the Hadoop MapReduce framework. In this paper, the algorithm was tested on eleven GEMS data sets (9 tumors, 11 tumors, 14 tumors, brain tumor 1, lung cancer, brain tumor 2, leukemia 1, DLBCL, leukemia 2, SRBCT, and prostate tumor) and its accuracy reached 100% for less than 25 selected features. The proposed cloud computing-based MapReduce parallel genetic algorithm performed well on gene expression data. In addition, the scalability of the suggested algorithm is unlimited because of underlying Hadoop MapReduce platform. The presented results indicate that the proposed method can be effectively implemented for real-world microarray data in the cloud environment. In addition, the Hadoop MapReduce framework demonstrates substantial decrease in the computation time.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Fig. 1
Fig. 2

Similar content being viewed by others

References

  1. Welcome to Apache™ Hadoop®!. http://hadoop.apache.org. Accessed 13 Dec 2015

  2. Mohammed EA, Far BH, Naugler C (2014) Applications of the MapReduce programming framework to clinical big data analysis: current landscape and future trends. BioData Min 7:22

    Article  Google Scholar 

  3. Coulouris GF, Dollimore J, Kindberg T (2005) Distributed systems: concepts and design. Pearson Education, Upper Saddle River

    MATH  Google Scholar 

  4. Apache Spark™—Lightning-Fast Cluster Computing. http://spark.apache.org/. Accessed 24 Apr 2016

  5. Quackenbush J, John Q (2001) Computational genetics: computational analysis of microarray data. Nat Rev Genet 2(6):418–427

    Article  Google Scholar 

  6. Wong T-T, Tzu-Tsung W, Ching-Han H (2008) Two-stage classification methods for microarray data. Expert Syst Appl 34(1):375–383

    Article  Google Scholar 

  7. Lee C-P, Chien-Pang L, Wen-Shin L, Yuh-Min C, Bo-Jein K (2011) Gene selection and sample classification on microarray data based on adaptive genetic algorithm/k-nearest neighbor method. Expert Syst Appl 38(5):4661–4667

    Article  Google Scholar 

  8. Roffo G, Giorgio R, Simone M, Marco C (2015) Infinite Feature Selection, in 2015 IEEE International Conference on Computer Vision (ICCV)

  9. Phuong TM, Lin Z, Altman RB (2005) Choosing SNPs using feature selection. In: Proceedings of IEEE Computational Systems Bioinformatics Conference, pp 301–309

  10. Hong J-H, Jin-Hyuk H, Sung-Bae C (2006) Efficient huge-scale feature selection with speciated genetic algorithm. Pattern Recognit Lett 27(2):143–150

    Article  Google Scholar 

  11. Mohamad MS, Safaai D, Illias RMD (2005) A hybrid of genetic algorithm and support vector machine for features selection and classification of gene expression microarray. Int J Comput Intell Appl 05(01):91–107

    Article  Google Scholar 

  12. Hung C-L, Chen W-P, Hua G-J, Zheng H, Tsai S-JJ, Lin Y-L (2015) Cloud computing-based TagSNP selection algorithm for human genome data. Int J Mol Sci 16(1):1096–1110

    Article  Google Scholar 

  13. Taylor RC (2010) An overview of the Hadoop/MapReduce/HBase framework and its current applications in bioinformatics. BMC Bioinform 11(Suppl 12):S1

    Article  Google Scholar 

  14. Schatz MC (2009) CloudBurst: highly sensitive read mapping with MapReduce. Bioinformatics 25(11):1363–1369

    Article  Google Scholar 

  15. Hung C-L, Lin Y-L (2013) Implementation of a parallel protein structure alignment service on cloud. Int J Genom Proteom 2013:439681

    Google Scholar 

  16. Hung C-L, Hua G-J (2013) Cloud computing for protein-ligand binding site comparison. Biomed Res Int 2013:170356

    Google Scholar 

  17. Gunarathne T (2015) Hadoop MapReduce v2 Cookbook, 2nd edn. Packt Publishing Ltd, Birmingham

    Google Scholar 

  18. Keco D, Subasi A (2012) Parallelization of genetic algorithms using Hadoop Map/Reduce. SouthEast Eur J Soft Comput 1(2):56–59

    Google Scholar 

  19. Statnikov A, Tsamardinos I, Dosbayev Y, Aliferis CF (2005) GEMS: a system for automated cancer diagnosis and biomarker discovery from microarray gene expression data. Int J Med Inform 74(7–8):491–503

    Article  Google Scholar 

  20. Negnevitsky M (2005) Artificial intelligence: a guide to intelligent systems. Pearson Education, Upper Saddle River

    Google Scholar 

  21. Dean J, Ghemawat S (2008) MapReduce: simplified data processing on large clusters. Commun ACM 51(1):107–113

    Article  Google Scholar 

  22. Lee W-P, Hsiao Y-T, Hwang W-C (2014) Designing a parallel evolutionary algorithm for inferring gene networks on the cloud computing environment. BMC Syst Biol 8:5

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Abdulhamit Subasi.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Kečo, D., Subasi, A. & Kevric, J. Cloud computing-based parallel genetic algorithm for gene selection in cancer classification. Neural Comput & Applic 30, 1601–1610 (2018). https://doi.org/10.1007/s00521-016-2780-z

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00521-016-2780-z

Keywords

Navigation