Article

TRICLUSTER: an effective algorithm for mining coherent clusters in 3D microarray data

Authors:
Lizhuang Zhao, Rensselaer Polytechnic Institute, Troy, New York
Mohammed J. Zaki, Rensselaer Polytechnic Institute, Troy, New York

SIGMOD '05: Proceedings of the 2005 ACM SIGMOD international conference on Management of data, June 2005, Pages 694–705. https://doi.org/10.1145/1066157.1066236

Published: 14 June 2005

ABSTRACT

In this paper we introduce a novel algorithm, called TRICLUSTER, for mining coherent clusters in three-dimensional (3D) gene expression datasets. TRICLUSTER can mine arbitrarily positioned and overlapping clusters, and, depending on the parameter values, it can mine different types of clusters, including those with constant or similar values along each dimension, as well as those with scaling and shifting expression patterns. TRICLUSTER relies on a graph-based approach to mine all valid clusters. For each time slice, i.e., a gene×sample matrix, it constructs the range multigraph, a compact representation of all similar value ranges between any two sample columns. It then searches for constrained maximal cliques in this multigraph to yield the set of biclusters for this time slice. Next, TRICLUSTER constructs another graph using the biclusters (as vertices) from each time slice; mining cliques from this graph yields the final set of triclusters. Optionally, TRICLUSTER merges or deletes clusters that have large overlaps. We present a useful set of metrics to evaluate the clustering quality, and we show that TRICLUSTER can find significant triclusters in real microarray datasets.
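To make the range multigraph step more concrete, the following Python sketch builds one plausible version of it for a single time slice. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not define "similar value ranges", so the sketch assumes that two genes are similar across a pair of samples when their expression ratios differ by at most a factor of `1 + eps`, and it assumes strictly positive expression values. The names `ratio_ranges`, `range_multigraph`, `eps`, and `min_genes` are hypothetical. The clique-mining steps that turn these edges into biclusters and triclusters are omitted.

```python
from itertools import combinations


def ratio_ranges(col_a, col_b, eps, min_genes):
    """Find maximal gene sets whose a/b ratios agree within a factor (1 + eps).

    Assumption (not from the abstract): similarity is measured on ratios,
    and only genes with positive values in both samples are considered.
    """
    ratios = sorted(
        (a / b, gene)
        for gene, (a, b) in enumerate(zip(col_a, col_b))
        if a > 0 and b > 0
    )
    ranges = []
    start = 0
    for end in range(len(ratios)):
        # Shrink the window from the left until it satisfies the tolerance.
        while ratios[end][0] > ratios[start][0] * (1 + eps):
            start += 1
        # The window is maximal if it cannot be extended to the right.
        is_last = end == len(ratios) - 1
        if is_last or ratios[end + 1][0] > ratios[start][0] * (1 + eps):
            genes = frozenset(gene for _, gene in ratios[start:end + 1])
            if len(genes) >= min_genes:
                ranges.append(genes)
    return ranges


def range_multigraph(matrix, eps, min_genes):
    """Map each sample pair (a, b) to its list of gene-set edges.

    `matrix` is one time slice: rows are genes, columns are samples.
    A pair may carry several edges, which is why this is a multigraph.
    """
    n_samples = len(matrix[0])
    graph = {}
    for a, b in combinations(range(n_samples), 2):
        col_a = [row[a] for row in matrix]
        col_b = [row[b] for row in matrix]
        edges = ratio_ranges(col_a, col_b, eps, min_genes)
        if edges:
            graph[(a, b)] = edges
    return graph


# Genes 0-2 scale together across all three samples; gene 3 does not.
time_slice = [
    [2.0, 1.0, 4.0],
    [4.0, 2.0, 8.0],
    [6.0, 3.0, 12.0],
    [1.0, 5.0, 3.0],
]
graph = range_multigraph(time_slice, eps=0.01, min_genes=2)
```

In this toy slice every sample pair ends up with a single edge labeled with genes 0, 1, and 2, so the three samples form a clique over that gene set, which is the kind of structure the bicluster search looks for.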

Published in
SIGMOD '05: Proceedings of the 2005 ACM SIGMOD international conference on Management of data
June 2005
990 pages
ISBN:1595930604
DOI:10.1145/1066157
Conference Chair:
Fatma Ozcan
IBM Almaden Research Center
Copyright © 2005 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.
Publisher
Association for Computing Machinery
New York, NY, United States
Acceptance Rates
Overall Acceptance Rate: 785 of 4,003 submissions, 20%