Abstract
The main goal of this study is to identify a robust set of genes capable of discriminating among the different sub-types of lung cancer: Small Cell Lung Carcinoma (SCLC), Adenocarcinoma (ACC), Squamous Cell Carcinoma (SCC) and Large Cell Lung Carcinoma (LCLC). To this end, a differential gene expression analysis was performed on gene expression microarray data publicly available from the NCBI/GEO platform. From this analysis, a total of 60 Differentially Expressed Genes (DEGs) were selected and then used to develop predictive models that combine supervised machine learning with feature selection algorithms. This yielded a reduced, specific gene signature that allows the lung cancer sub-type of new samples to be identified. The predictive models are assessed in terms of accuracy, F1-score, sensitivity and specificity. Finally, a set of public web platforms providing biological information on genes was used to determine the relation between the final subset of genes and the addressed sub-types of lung cancer.
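As an illustration of the four assessment measures named in the abstract, the following minimal sketch computes accuracy together with per-class sensitivity, specificity and F1-score in a one-vs-rest fashion for a four-class problem. It is not the authors' code: the function name `multiclass_metrics` and the toy labels are hypothetical, and the one-vs-rest treatment of each sub-type is an assumption, since the abstract does not state how the multiclass metrics were aggregated.

```python
from collections import Counter


def multiclass_metrics(y_true, y_pred, labels):
    """Accuracy plus one-vs-rest sensitivity, specificity and F1 per class.

    Assumes every value in y_true and y_pred appears in `labels`.
    """
    pairs = Counter(zip(y_true, y_pred))  # (true, predicted) -> count
    total = len(y_true)
    correct = sum(pairs[(c, c)] for c in labels)
    per_class = {}
    for c in labels:
        tp = pairs[(c, c)]
        fn = sum(pairs[(c, o)] for o in labels if o != c)
        fp = sum(pairs[(o, c)] for o in labels if o != c)
        tn = total - tp - fn - fp
        sens = tp / (tp + fn) if tp + fn else 0.0  # recall for class c
        spec = tn / (tn + fp) if tn + fp else 0.0
        prec = tp / (tp + fp) if tp + fp else 0.0
        f1 = 2 * prec * sens / (prec + sens) if prec + sens else 0.0
        per_class[c] = {"sensitivity": sens, "specificity": spec, "f1": f1}
    return {"accuracy": correct / total, "per_class": per_class}


# Toy labels for illustration only; these are not results from the study.
labels = ["SCLC", "ACC", "SCC", "LCLC"]
y_true = ["SCLC", "SCLC", "ACC", "ACC", "SCC", "SCC", "LCLC", "LCLC"]
y_pred = ["SCLC", "SCLC", "ACC", "SCC", "SCC", "SCC", "LCLC", "ACC"]

result = multiclass_metrics(y_true, y_pred, labels)
print(result["accuracy"])
for name in labels:
    print(name, result["per_class"][name])
```

Reporting sensitivity and specificity per sub-type, rather than accuracy alone, shows which classes a model confuses, which matters when the classes are unevenly represented.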
Acknowledgements
This research was made possible by the support of project TIN2015-71873-R (Spanish Ministry of Economy and Competitiveness – MINECO – and the European Regional Development Fund – ERDF).
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
González, S., Castillo, D., Galvez, J.M., Rojas, I., Herrera, L.J. (2019). Feature Selection and Assessment of Lung Cancer Sub-types by Applying Predictive Models. In: Rojas, I., Joya, G., Catala, A. (eds) Advances in Computational Intelligence. IWANN 2019. Lecture Notes in Computer Science, vol 11507. Springer, Cham. https://doi.org/10.1007/978-3-030-20518-8_73
DOI: https://doi.org/10.1007/978-3-030-20518-8_73
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-20517-1
Online ISBN: 978-3-030-20518-8
eBook Packages: Computer Science, Computer Science (R0)