DOI: 10.1145/1401890.1401969
research-article

Relational learning via collective matrix factorization

Published: 24 August 2008

ABSTRACT

Relational learning is concerned with predicting unknown values of a relation, given a database of entities and observed relations among entities. An example of relational learning is movie rating prediction, where entities could include users, movies, genres, and actors. Relations encode users' ratings of movies, movies' genres, and actors' roles in movies. A common prediction technique given one pairwise relation, for example a #users x #movies ratings matrix, is low-rank matrix factorization. In domains with multiple relations, represented as multiple matrices, we may improve predictive accuracy by exploiting information from one relation while predicting another. To this end, we propose a collective matrix factorization model: we simultaneously factor several matrices, sharing parameters among factors when an entity participates in multiple relations. Each relation can have a different value type and error distribution; so, we allow nonlinear relationships between the parameters and outputs, using Bregman divergences to measure error. We extend standard alternating projection algorithms to our model, and derive an efficient Newton update for the projection. Furthermore, we propose stochastic optimization methods to deal with large, sparse matrices. Our model generalizes several existing matrix factorization methods, and therefore yields new large-scale optimization algorithms for these problems. Our model can handle any pairwise relational schema and a wide variety of error models. We demonstrate its efficiency, as well as the benefit of sharing parameters among relations.
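The parameter sharing described in the abstract can be illustrated with a minimal sketch, under stated assumptions. Here a real-valued #users × #movies ratings matrix X (with an observation mask W) and a binary #movies × #genres matrix Y are factored together, and the movie factor V is shared between them. X uses squared loss with an identity link; Y uses logistic loss with a sigmoid link, as one example of a relation with a different value type. The function name `collective_mf`, the weight `alpha`, the regularizer `reg`, and the toy data are all hypothetical, and the optimizer is plain simultaneous gradient descent, not the alternating Newton projections or stochastic methods the paper proposes.

```python
import numpy as np


def sigmoid(t):
    """Logistic link, applied elementwise."""
    return 1.0 / (1.0 + np.exp(-t))


def collective_mf(X, W, Y, rank=2, alpha=0.5, reg=0.1, lr=0.01, iters=500, seed=0):
    """Jointly factor X ~ U V^T (ratings) and Y ~ sigmoid(V Z^T) (genres).

    X: users x movies ratings, W: 0/1 mask of observed entries of X,
    Y: movies x genres binary matrix. V is shared by both relations.
    alpha (hypothetical) trades off the two reconstruction losses.
    """
    rng = np.random.default_rng(seed)
    n_users, n_movies = X.shape
    n_genres = Y.shape[1]
    U = 0.1 * rng.standard_normal((n_users, rank))
    V = 0.1 * rng.standard_normal((n_movies, rank))
    Z = 0.1 * rng.standard_normal((n_genres, rank))
    history = []
    for _ in range(iters):
        Rx = W * (U @ V.T - X)      # squared-loss residual on observed ratings
        P = sigmoid(V @ Z.T)        # predicted genre probabilities
        Ry = P - Y                  # logistic-loss gradient w.r.t. V Z^T
        loss_x = 0.5 * np.sum(Rx ** 2)
        loss_y = -np.sum(Y * np.log(P + 1e-12) + (1 - Y) * np.log(1 - P + 1e-12))
        penalty = 0.5 * reg * (np.sum(U ** 2) + np.sum(V ** 2) + np.sum(Z ** 2))
        history.append(alpha * loss_x + (1 - alpha) * loss_y + penalty)
        # Gradients; V receives a contribution from both relations.
        gU = alpha * Rx @ V + reg * U
        gV = alpha * Rx.T @ U + (1 - alpha) * Ry @ Z + reg * V
        gZ = (1 - alpha) * Ry.T @ V + reg * Z
        U -= lr * gU
        V -= lr * gV
        Z -= lr * gZ
    return U, V, Z, history


# Toy data (made up for illustration): 4 users, 5 movies, 3 genres.
X = np.array([[5, 4, 0, 1, 0],
              [4, 0, 0, 1, 2],
              [0, 1, 5, 4, 0],
              [1, 0, 4, 0, 5]], dtype=float)
W = (X > 0).astype(float)           # zeros in X are treated as unobserved
Y = np.array([[1, 0, 0],
              [1, 0, 1],
              [0, 1, 0],
              [0, 1, 0],
              [0, 1, 1]], dtype=float)

U, V, Z, history = collective_mf(X, W, Y)
predicted_ratings = U @ V.T         # fills in unobserved user-movie pairs
genre_probs = sigmoid(V @ Z.T)      # reconstructed movie-genre relation
```

Setting `alpha=1.0` in this sketch ignores the genre relation and reduces to ordinary regularized factorization of the ratings matrix alone, which gives a simple baseline for checking whether sharing V helps on held-out ratings.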

Published in

KDD '08: Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
August 2008
1116 pages
ISBN: 9781605581934
DOI: 10.1145/1401890
General Chair: Ying Li
Program Chairs: Bing Liu, Sunita Sarawagi

Copyright © 2008 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher: Association for Computing Machinery, New York, NY, United States

Acceptance Rates

KDD '08 Paper Acceptance Rate: 118 of 593 submissions, 20%
Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%
