ABSTRACT
Several approaches to collaborative filtering have been studied, but studies have seldom been reported for large (several million users and items) and dynamic (the underlying item set is continually changing) settings. In this paper we describe our approach to collaborative filtering for generating personalized recommendations for users of Google News. We generate recommendations using three approaches: collaborative filtering using MinHash clustering, Probabilistic Latent Semantic Indexing (PLSI), and covisitation counts. We combine recommendations from the different algorithms using a linear model. Our approach is content agnostic and consequently domain independent, making it easily adaptable to other applications and languages with minimal effort. This paper describes our algorithms and system setup in detail, and reports results of running the recommendations engine on Google News.
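As a rough illustration of the first of the three approaches, the sketch below shows how MinHash can group users by the overlap of their click histories. It is a minimal sketch under stated assumptions, not the system described in the paper: the SHA-256-based hash family, the function names, and the parameters `num_hashes` and `p` are all hypothetical choices made for this example.

```python
import hashlib


def _hash(seed: int, item: str) -> int:
    # Hypothetical hash family: one SHA-256-derived 64-bit hash per seed.
    digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(items, num_hashes=64):
    """Return one min-hash value per seed for a user's clicked items."""
    items = set(items)
    if not items:
        raise ValueError("click history must not be empty")
    return [min(_hash(seed, item) for item in items)
            for seed in range(num_hashes)]


def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching positions, an estimate of Jaccard similarity."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def cluster_key(signature, p=2):
    """Concatenate the first p min-hashes into a cluster identifier.

    Users whose keys are equal land in the same cluster; a larger p
    (an assumed tuning knob here) makes clusters smaller and tighter.
    """
    return tuple(signature[:p])
```

Two users with identical click sets always receive the same signature and therefore the same cluster key, while users with disjoint click sets share essentially no signature positions. The probability that a single min-hash matches equals the Jaccard similarity of the two click sets, which is why the fraction of matching positions serves as a similarity estimate.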
Index Terms
- Google news personalization: scalable online collaborative filtering