An effective non-parametric method for globally clustering genes from expression profiles

  • Original Article
  • Medical & Biological Engineering & Computing

Abstract

Clustering is widely used in bioinformatics to find gene correlation patterns. Although many algorithms have been proposed, they usually have difficulty meeting the requirements of both automation and high quality. In this paper, we propose a novel algorithm for clustering genes from their expression profiles. The proposed algorithm has two distinguishing features: it considers global, rather than local, gene correlation information during clustering; and it incorporates clustering quality measurement into the clustering process to achieve non-parametric, automatic and globally optimal gene clustering. Evaluation on simulated and real gene data sets demonstrates the effectiveness of the algorithm.



Acknowledgments

This research is partially supported by the Starting Grant of the Faculty of Science and Technology, Deakin University, Australia. We thank Dr Yang Xiang for his contribution to some evaluations. The authors also thank the anonymous reviewers for their valuable comments.

Author information

Corresponding author

Correspondence to Jingyu Hou.

About this article

Cite this article

Hou, J., Shi, W., Li, G. et al. An effective non-parametric method for globally clustering genes from expression profiles. Med Bio Eng Comput 45, 1175–1185 (2007). https://doi.org/10.1007/s11517-007-0271-1

  • DOI: https://doi.org/10.1007/s11517-007-0271-1
