Abstract
Careful investigation of cancer is essential for accurate prediction of the disease and, in turn, for choosing the correct treatment. Microarray-based gene expression profiling is widely used for this purpose, which has made the discovery of gene clusters responsible for a particular behavior a leading research interest. Big data analytics offers an efficient way to extract facts about the underlying biological processes from microarray data. Many earlier attempts used a variety of clustering approaches, but their results deviated considerably from reality. In this work, we attempt to discover potential and accurate gene indicators from gene expression data using a well-known method called quantum clustering. Its characteristic feature is that the number of clusters is not fixed in advance but is determined by the nature of the data. Because the method treats a cluster as a dense region of the data space whose center lies at a density maximum, it motivated us to look for clusters that may be involved in a specific biological process. An advantage of the approach is that extremely dense regions are detected inherently and merged into arbitrarily shaped clusters, regardless of the dimension of the space. To compare the results, we also applied a non-parametric method, mean shift clustering, to the same gene expression data. For validation, we used DAVID to check the significance of the resulting clusters. The results show that the genes discovered in this way are highly indicative in the study of rare diseases.
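The density-based idea described in the abstract can be illustrated with a minimal sketch of mean shift clustering, the comparison method it names. This is not the authors' implementation: the toy expression values, the `bandwidth` and `merge_radius` settings, and the helper names `mean_shift` and `group_modes` are all assumptions made for illustration. Each gene profile is moved uphill on a Gaussian kernel density estimate until it settles at a density maximum, and profiles that settle at the same maximum form one cluster, so the number of clusters is never supplied.

```python
import math


def mean_shift(points, bandwidth, iters=100, tol=1e-6):
    """Move a copy of every point uphill on a Gaussian kernel density estimate."""
    modes = []
    for p in points:
        x = list(p)
        for _ in range(iters):
            # Gaussian weight of every data point relative to the current position.
            weights = [
                math.exp(-sum((a - b) ** 2 for a, b in zip(x, q)) / (2 * bandwidth ** 2))
                for q in points
            ]
            total = sum(weights)
            # The weighted mean of the data is the next position (the "mean shift").
            new_x = [
                sum(w * q[d] for w, q in zip(weights, points)) / total
                for d in range(len(x))
            ]
            shift = math.dist(x, new_x)
            x = new_x
            if shift < tol:
                break
        modes.append(x)
    return modes


def group_modes(modes, merge_radius):
    """Give the same label to points whose converged positions nearly coincide."""
    centers, labels = [], []
    for m in modes:
        for k, c in enumerate(centers):
            if math.dist(m, c) < merge_radius:
                labels.append(k)
                break
        else:
            # No existing center is close enough, so this mode starts a new cluster.
            centers.append(m)
            labels.append(len(centers) - 1)
    return centers, labels


# Hypothetical toy data: six gene profiles measured under two conditions.
profiles = [(1.0, 1.1), (0.9, 1.0), (1.1, 0.9), (5.0, 5.1), (5.1, 4.9), (4.9, 5.0)]

modes = mean_shift(profiles, bandwidth=0.5)
centers, labels = group_modes(modes, merge_radius=0.5)
print(len(centers), labels)
```

The bandwidth is the only tuning parameter in this sketch and it controls how many density maxima survive. Quantum clustering, as introduced by Horn and Gottlieb, starts from the same kind of Gaussian kernel estimate but locates cluster centers at the minima of a potential derived from it rather than at the maxima of the estimate itself.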
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Das, T., Pratim Kalita, P., Saha, G. (2021). Clustering-Based Techniques for Big Data Analysis of Gene Expression. In: Maji, A.K., Saha, G., Das, S., Basu, S., Tavares, J.M.R.S. (eds) Proceedings of the International Conference on Computing and Communication Systems. Lecture Notes in Networks and Systems, vol 170. Springer, Singapore. https://doi.org/10.1007/978-981-33-4084-8_16
DOI: https://doi.org/10.1007/978-981-33-4084-8_16
Publisher Name: Springer, Singapore
Print ISBN: 978-981-33-4083-1
Online ISBN: 978-981-33-4084-8
eBook Packages: Intelligent Technologies and Robotics (R0)