ABSTRACT
Relational learning is concerned with predicting unknown values of a relation, given a database of entities and observed relations among entities. An example of relational learning is movie rating prediction, where entities could include users, movies, genres, and actors. Relations encode users' ratings of movies, movies' genres, and actors' roles in movies. A common prediction technique given one pairwise relation, for example a #users x #movies ratings matrix, is low-rank matrix factorization. In domains with multiple relations, represented as multiple matrices, we may improve predictive accuracy by exploiting information from one relation while predicting another. To this end, we propose a collective matrix factorization model: we simultaneously factor several matrices, sharing parameters among factors when an entity participates in multiple relations. Each relation can have a different value type and error distribution; we therefore allow nonlinear relationships between the parameters and outputs, using Bregman divergences to measure error. We extend standard alternating projection algorithms to our model, and derive an efficient Newton update for the projection. Furthermore, we propose stochastic optimization methods to deal with large, sparse matrices. Our model generalizes several existing matrix factorization methods, and therefore yields new large-scale optimization algorithms for these problems. Our model can handle any pairwise relational schema and a wide variety of error models. We demonstrate its efficiency, as well as the benefit of sharing parameters among relations.
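To illustrate the core idea of parameter sharing, the following is a minimal sketch of collective factorization for the squared-loss (identity-link) special case, not the paper's full Bregman-divergence machinery or Newton projection. Two relations share the movie factors V: a users x movies matrix X ≈ UVᵀ and a movies x genres matrix Y ≈ VWᵀ. The variable names, ridge term, and synthetic data are illustrative assumptions; each alternating update is a closed-form least-squares solve, and the update for the shared factor V pools both losses.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_movies, n_genres, k = 30, 40, 5, 4

# Synthetic noiseless data generated from a shared low-rank structure.
U_true = rng.normal(size=(n_users, k))
V_true = rng.normal(size=(n_movies, k))
W_true = rng.normal(size=(n_genres, k))
X = U_true @ V_true.T          # users x movies "ratings" relation
Y = V_true @ W_true.T          # movies x genres relation

lam = 0.1                      # small ridge term keeps each solve well-posed
U = rng.normal(size=(n_users, k))
V = rng.normal(size=(n_movies, k))
W = rng.normal(size=(n_genres, k))

def solve(A, B, lam):
    """Ridge least-squares solve for F minimizing ||A - F B^T||^2 + lam ||F||^2."""
    return A @ B @ np.linalg.inv(B.T @ B + lam * np.eye(B.shape[1]))

for _ in range(50):
    U = solve(X, V, lam)       # U only touches the ratings relation
    W = solve(Y.T, V, lam)     # W only touches the genre relation
    # V participates in both relations, so its normal equations sum
    # contributions from X and Y -- this is the parameter sharing.
    V = (X.T @ U + Y @ W) @ np.linalg.inv(
        U.T @ U + W.T @ W + lam * np.eye(k))

err = np.linalg.norm(X - U @ V.T) / np.linalg.norm(X)
```

The V update comes from setting the gradient of ||X - UVᵀ||² + ||Y - VWᵀ||² + λ||V||² to zero, giving V(UᵀU + WᵀW + λI) = XᵀU + YW. In the paper's general setting, each reconstruction error is a Bregman divergence through a possibly nonlinear link, and the closed-form solve is replaced by a per-row Newton update.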