DOI: 10.1145/1066157.1066236
Article

TRICLUSTER: an effective algorithm for mining coherent clusters in 3D microarray data

Published: 14 June 2005

ABSTRACT

In this paper we introduce a novel algorithm called TRICLUSTER for mining coherent clusters in three-dimensional (3D) gene expression datasets. TRICLUSTER can mine arbitrarily positioned and overlapping clusters, and depending on the parameter values it can mine different types of clusters, including those with constant or similar values along each dimension, as well as scaling and shifting expression patterns. TRICLUSTER relies on a graph-based approach to mine all valid clusters. For each time slice, i.e., a gene×sample matrix, it constructs the range multigraph, a compact representation of all similar value ranges between any two sample columns. It then searches for constrained maximal cliques in this multigraph to yield the set of biclusters for this time slice. TRICLUSTER then constructs another graph using the biclusters (as vertices) from each time slice; mining cliques from this graph yields the final set of triclusters. Optionally, TRICLUSTER merges or deletes clusters that have large overlaps. We present a useful set of metrics to evaluate the clustering quality, and we show that TRICLUSTER can find significant triclusters in real microarray datasets.
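
The abstract describes the range multigraph only as a summary of "similar value ranges between any two sample columns". As a rough illustration of that one step, the sketch below groups genes whose expression ratio between two sample columns stays within a relative tolerance. It is not the authors' implementation: the function name `similar_ratio_ranges`, the tolerance `eps`, the `min_genes` threshold, the use of ratios as the similarity measure, and the restriction to positive expression values are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

Range = Tuple[float, float, List[str]]


def similar_ratio_ranges(
    col_a: Dict[str, float],
    col_b: Dict[str, float],
    eps: float,
    min_genes: int = 2,
) -> List[Range]:
    """Return maximal groups of genes with similar a/b expression ratios.

    Assumption for this sketch: two ratios are "similar" when the larger
    one is at most (1 + eps) times the smaller one. Only genes with
    positive values in both columns are considered.
    """
    # Sort genes by their ratio so that similar ratios become contiguous.
    ratios = sorted(
        (col_a[g] / col_b[g], g)
        for g in col_a.keys() & col_b.keys()
        if col_a[g] > 0 and col_b[g] > 0
    )

    ranges: List[Range] = []
    end = 0        # exclusive end of the current window
    last_end = 0   # exclusive end of the previous window
    for start in range(len(ratios)):
        # Extend the window while ratios stay within the tolerance.
        while end < len(ratios) and ratios[end][0] <= ratios[start][0] * (1 + eps):
            end += 1
        # A window is maximal only if it reaches past the previous one.
        if end > last_end and end - start >= min_genes:
            genes = [g for _, g in ratios[start:end]]
            ranges.append((ratios[start][0], ratios[end - 1][0], genes))
        last_end = end
    return ranges


if __name__ == "__main__":
    sample_a = {"g1": 2.0, "g2": 4.0, "g3": 6.0, "g4": 5.0}
    sample_b = {"g1": 1.0, "g2": 2.0, "g3": 3.0, "g4": 1.0}
    for low, high, genes in similar_ratio_ranges(sample_a, sample_b, eps=0.01):
        print(low, high, genes)
```

In this toy input, genes g1 to g3 share the ratio 2.0 between the two samples, so they form one range, while g4 (ratio 5.0) stands alone and is dropped by `min_genes`. In a multigraph of this kind, each such range would plausibly become one edge between the two sample vertices, labelled with its gene set; how TRICLUSTER actually builds and searches that graph is specified in the paper, not here.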

        • Published in

          SIGMOD '05: Proceedings of the 2005 ACM SIGMOD international conference on Management of data
          June 2005
          990 pages
          ISBN: 1595930604
          DOI: 10.1145/1066157
          • Conference Chair: Fatma Ozcan

          Copyright © 2005 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 14 June 2005

          Qualifiers

          • Article

          Acceptance Rates

          Overall Acceptance Rate: 785 of 4,003 submissions, 20%
