ABSTRACT
In this paper we introduce a novel algorithm, TRICLUSTER, for mining coherent clusters in three-dimensional (3D) gene expression datasets. TRICLUSTER can mine arbitrarily positioned and overlapping clusters, and, depending on its parameter values, it can mine different types of clusters, including those with constant or similar values along each dimension, as well as scaling and shifting expression patterns. TRICLUSTER relies on a graph-based approach to mine all valid clusters. For each time slice, i.e., a gene×sample matrix, it constructs the range multigraph, a compact representation of all similar value ranges between any two sample columns. It then searches for constrained maximal cliques in this multigraph to yield the set of biclusters for that time slice. Next, TRICLUSTER constructs another graph using the biclusters from each time slice as vertices; mining cliques from this graph yields the final set of triclusters. Optionally, TRICLUSTER merges or deletes clusters that have large overlaps. We present a useful set of metrics to evaluate clustering quality, and we show that TRICLUSTER can find significant triclusters in real microarray datasets.
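The range multigraph step can be illustrated with a minimal sketch. The abstract does not specify how "similar value ranges" are computed, so the following assumes (hypothetically) that two sample columns are compared through per-gene expression ratios, that all expression values are positive, and that a range is a maximal set of genes whose ratios differ by at most a relative tolerance `eps`. The names `ratio_ranges`, `range_multigraph`, `eps`, and `min_genes` are illustrative and do not come from the paper; the clique search that would follow is not shown.

```python
from itertools import combinations


def ratio_ranges(col_a, col_b, eps=0.01, min_genes=2):
    """Group genes whose ratios col_a[g] / col_b[g] lie in one narrow range.

    Assumption (not from the paper): values are positive, and a range is
    valid when max_ratio <= min_ratio * (1 + eps).
    """
    # Sort genes by ratio so that every valid range is a contiguous window.
    ratios = sorted((a / b, g) for g, (a, b) in enumerate(zip(col_a, col_b)))
    ranges, end, last_end = [], 0, 0
    for start in range(len(ratios)):
        end = max(end, start)
        # Extend the window as far right as the tolerance allows.
        while end < len(ratios) and ratios[end][0] <= ratios[start][0] * (1 + eps):
            end += 1
        # Keep the window only if it is not contained in an earlier one
        # and it holds enough genes.
        if end > last_end and end - start >= min_genes:
            ranges.append(frozenset(g for _, g in ratios[start:end]))
        last_end = end
    return ranges


def range_multigraph(matrix, eps=0.01, min_genes=2):
    """Map each sample pair (a, b) to its list of gene sets (one per edge)."""
    n_samples = len(matrix[0])
    columns = [[row[s] for row in matrix] for s in range(n_samples)]
    return {
        (a, b): ratio_ranges(columns[a], columns[b], eps, min_genes)
        for a, b in combinations(range(n_samples), 2)
    }


# One time slice: rows are genes, columns are samples (made-up values).
time_slice = [
    [1.0, 2.0, 4.0],
    [2.0, 4.0, 8.0],
    [3.0, 6.0, 12.0],
    [5.0, 1.0, 7.0],
]
graph = range_multigraph(time_slice)
for pair, gene_sets in graph.items():
    print(pair, [sorted(genes) for genes in gene_sets])
```

In this toy slice, genes 0 to 2 scale together across all three samples, so each sample pair carries one edge labelled with that gene set, while gene 3 falls outside every range. Under these assumptions, a bicluster search would then look for sets of samples whose pairwise edges share a sufficiently large common gene set.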
TRICLUSTER: an effective algorithm for mining coherent clusters in 3D microarray data