Abstract
Clustering is widely used in bioinformatics to find gene correlation patterns. Although many algorithms have been proposed, they usually struggle to be both automatic and high in clustering quality. In this paper, we propose a novel algorithm for clustering genes from their expression profiles. The algorithm has two distinguishing features: it uses global, rather than local, gene correlation information during clustering; and it incorporates clustering quality measurement into the clustering process, which yields non-parametric, automatic, and globally optimal gene clustering. Evaluation on simulated and real gene data sets demonstrates the effectiveness of the algorithm.
Acknowledgments
This research is partially supported by the Starting Grant of Faculty of Science and Technology, Deakin University, Australia. We thank Dr Yang Xiang for his contribution to some evaluations. The authors also thank the anonymous reviewers for their valuable comments.
Cite this article
Hou, J., Shi, W., Li, G. et al. An effective non-parametric method for globally clustering genes from expression profiles. Med Bio Eng Comput 45, 1175–1185 (2007). https://doi.org/10.1007/s11517-007-0271-1
DOI: https://doi.org/10.1007/s11517-007-0271-1