ABSTRACT
Low-rank matrix approximation methods provide one of the simplest and most effective approaches to collaborative filtering. Such models are usually fitted to data by finding a MAP estimate of the model parameters, a procedure that can be performed efficiently even on very large datasets. However, unless the regularization parameters are tuned carefully, this approach is prone to overfitting because it finds a single point estimate of the parameters. In this paper we present a fully Bayesian treatment of the Probabilistic Matrix Factorization (PMF) model in which model capacity is controlled automatically by integrating over all model parameters and hyperparameters. We show that Bayesian PMF models can be efficiently trained using Markov chain Monte Carlo methods by applying them to the Netflix dataset, which consists of over 100 million movie ratings. The resulting models achieve significantly higher prediction accuracy than PMF models trained using MAP estimation.
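The abstract describes training Bayesian PMF by integrating over parameters with Markov chain Monte Carlo. Below is a minimal, illustrative Gibbs-sampling sketch of that idea: each user and item factor is drawn from its Gaussian full conditional, and predictions are averaged over post-burn-in samples. Note the simplifying assumptions: the hyperparameters are held fixed (`alpha` for rating precision, `lam` for the prior precision), whereas the paper places Gaussian-Wishart priors over them and samples them too; the function name `gibbs_pmf` and all parameter names are illustrative, not from the paper.

```python
import numpy as np

def gibbs_pmf(R, mask, D=5, alpha=2.0, lam=2.0, n_samples=50, burn_in=10, seed=0):
    """Simplified Gibbs sampler for Bayesian PMF with fixed hyperparameters.

    R: (N, M) rating matrix; mask: boolean array of observed entries.
    Returns predictions averaged over the post-burn-in posterior samples.
    """
    rng = np.random.default_rng(seed)
    N, M = R.shape
    U = rng.normal(scale=0.1, size=(N, D))  # user factors
    V = rng.normal(scale=0.1, size=(M, D))  # item factors
    pred = np.zeros_like(R, dtype=float)
    kept = 0
    for t in range(n_samples):
        # Sample each user factor from its Gaussian full conditional,
        # which depends only on the items that user actually rated.
        for i in range(N):
            idx = mask[i]
            Vo = V[idx]
            prec = lam * np.eye(D) + alpha * Vo.T @ Vo
            cov = np.linalg.inv(prec)
            mean = cov @ (alpha * Vo.T @ R[i, idx])
            U[i] = rng.multivariate_normal(mean, cov)
        # Sample each item factor symmetrically, conditioned on U.
        for j in range(M):
            idx = mask[:, j]
            Uo = U[idx]
            prec = lam * np.eye(D) + alpha * Uo.T @ Uo
            cov = np.linalg.inv(prec)
            mean = cov @ (alpha * Uo.T @ R[idx, j])
            V[j] = rng.multivariate_normal(mean, cov)
        # Average predictions over samples after burn-in: this Monte Carlo
        # average approximates the posterior predictive mean, which is what
        # distinguishes the Bayesian treatment from a single MAP estimate.
        if t >= burn_in:
            pred += U @ V.T
            kept += 1
    return pred / kept
```

Averaging `U @ V.T` over many samples, rather than using one point estimate, is the mechanism by which this approach controls overfitting without careful tuning of regularization parameters.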