ABSTRACT
Dyadic data matrices, such as co-occurrence matrices, rating matrices, and proximity matrices, arise frequently in many important applications. A fundamental problem in dyadic data analysis is to find the hidden block structure of the data matrix. In this paper, we present a new co-clustering framework, block value decomposition (BVD), for dyadic data, which factorizes the dyadic data matrix into three components: the row-coefficient matrix R, the block value matrix B, and the column-coefficient matrix C. Under this framework, we focus on a special yet very popular case, non-negative dyadic data, and propose a novel co-clustering algorithm that iteratively computes the three decomposition matrices using multiplicative update rules. Extensive experimental evaluations demonstrate the effectiveness and potential of the framework and of the specific algorithm for co-clustering, and in particular for discovering the hidden block structure in dyadic data.
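To make the framework concrete, the following is a minimal NumPy sketch of the non-negative case, using the standard multiplicative updates for a tri-factorization X ≈ RBC that monotonically decrease the squared Frobenius reconstruction error. The function name `nbvd`, the random initialization, and the fixed iteration count are illustrative choices; the paper's exact update rules and stopping criterion may differ in detail.

```python
import numpy as np

def nbvd(X, k, l, n_iter=200, eps=1e-9, seed=0):
    """Sketch of non-negative block value decomposition: X ~= R @ B @ C.

    X : (m, n) non-negative data matrix
    k : number of row clusters
    l : number of column clusters
    Returns non-negative factors R (m, k), B (k, l), C (l, n).
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    R = rng.random((m, k))
    B = rng.random((k, l))
    C = rng.random((l, n))
    for _ in range(n_iter):
        # Multiplicative updates: each factor is rescaled elementwise by the
        # ratio of the gradient's positive and negative parts, so entries stay
        # non-negative and ||X - RBC||_F^2 does not increase.
        BC = B @ C
        R *= (X @ BC.T) / (R @ BC @ BC.T + eps)
        B *= (R.T @ X @ C.T) / (R.T @ R @ B @ (C @ C.T) + eps)
        RB = R @ B
        C *= (RB.T @ X) / (RB.T @ RB @ C + eps)
    return R, B, C
```

Under one common reading of such coefficient matrices, row i of the data is assigned to the row cluster `np.argmax(R[i])` and column j to the column cluster `np.argmax(C[:, j])`, while B summarizes the value of each (row cluster, column cluster) block.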