Clustering-Based Techniques for Big Data Analysis of Gene Expression

  • Conference paper
Proceedings of the International Conference on Computing and Communication Systems

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 170)

Abstract

Accurate investigation of cancer is essential for reliable prediction and, in turn, for choosing the right treatment. Microarray-based gene expression profiling is widely used for this purpose, which has made the discovery of gene clusters responsible for a particular behavior a leading research interest. Big data analytics offers an efficient way to extract facts about the biological processes underlying microarray data. Many earlier attempts applied various clustering approaches to this problem, but their results deviated considerably from reality. In this work, we attempt to discover potential and accurate gene indicators from gene expression data using a well-known method called quantum clustering. Its characteristic feature is that the number of clusters is not fixed in advance but is determined by the nature of the data. Because the method treats a cluster as a dense region of the data space whose center lies at a density maximum, it motivated us to detect clusters that may be engaged in a particular biological process. The approach has the advantage that very dense regions are detected inherently and combined into arbitrarily shaped clusters, regardless of the dimension of the space. To compare the results, we also applied a non-parametric method, mean shift clustering, to the same gene expression data. For validation, we used DAVID to check the significance of the resulting clusters. The results show that the genes discovered in this way are highly indicative in the study of rare diseases.
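
The density-based idea described in the abstract, where cluster centers sit at density maxima and the number of clusters follows from the data, can be illustrated with a minimal Gaussian-kernel mean shift sketch. This is not the authors' implementation: the function name `mean_shift`, the `bandwidth` value, the merge threshold of half a bandwidth, and the synthetic two-group "expression" matrix are all assumptions made for illustration.

```python
import numpy as np


def mean_shift(points, bandwidth, n_iter=200, tol=1e-6):
    """Gaussian-kernel mean shift (illustrative sketch).

    Every point is moved uphill on the kernel density estimate until it
    stops moving; points that end at the same density maximum share a label.
    """
    points = np.asarray(points, dtype=float)
    shifted = points.copy()
    for _ in range(n_iter):
        # Squared distance from each shifted point to every original point.
        diff = shifted[:, None, :] - points[None, :, :]
        sq_dist = np.sum(diff ** 2, axis=2)
        weights = np.exp(-sq_dist / (2.0 * bandwidth ** 2))
        # Kernel-weighted mean of the data around each shifted point.
        new = (weights @ points) / weights.sum(axis=1, keepdims=True)
        step = np.max(np.linalg.norm(new - shifted, axis=1))
        shifted = new
        if step < tol:
            break

    # Merge converged points lying within bandwidth / 2 of a known mode
    # (the threshold is an assumption, not taken from the paper).
    modes = []
    labels = np.empty(len(points), dtype=int)
    for i, p in enumerate(shifted):
        for k, m in enumerate(modes):
            if np.linalg.norm(p - m) < bandwidth / 2:
                labels[i] = k
                break
        else:
            modes.append(p)
            labels[i] = len(modes) - 1
    return np.array(modes), labels


# Hypothetical data: 80 "genes" measured under 5 conditions, two groups.
rng = np.random.default_rng(0)
group_a = rng.normal(0.0, 0.3, size=(40, 5))
group_b = rng.normal(3.0, 0.3, size=(40, 5))
data = np.vstack([group_a, group_b])

modes, labels = mean_shift(data, bandwidth=1.0)
print(len(modes), "clusters found")
```

The number of clusters is never passed in; it emerges from how many distinct density maxima the points converge to. The bandwidth controls how smooth the density estimate is, so a real analysis would need to tune it to the expression data at hand.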

Author information

Corresponding author

Correspondence to Tanuja Das.

Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Das, T., Pratim Kalita, P., Saha, G. (2021). Clustering-Based Techniques for Big Data Analysis of Gene Expression. In: Maji, A.K., Saha, G., Das, S., Basu, S., Tavares, J.M.R.S. (eds) Proceedings of the International Conference on Computing and Communication Systems. Lecture Notes in Networks and Systems, vol 170. Springer, Singapore. https://doi.org/10.1007/978-981-33-4084-8_16
