DOI: 10.1145/1242572.1242610
Article

Google news personalization: scalable online collaborative filtering

Published: 08 May 2007

ABSTRACT

Several approaches to collaborative filtering have been studied, but seldom have studies been reported for large (several million users and items) and dynamic (the underlying item set is continually changing) settings. In this paper we describe our approach to collaborative filtering for generating personalized recommendations for users of Google News. We generate recommendations using three approaches: collaborative filtering using MinHash clustering, Probabilistic Latent Semantic Indexing (PLSI), and covisitation counts. We combine recommendations from different algorithms using a linear model. Our approach is content agnostic and consequently domain independent, making it easily adaptable for other applications and languages with minimal effort. This paper will describe our algorithms and system setup in detail, and report results of running the recommendations engine on Google News.
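As a rough illustration of the ideas named in the abstract, the sketch below shows one way MinHash clustering of users, covisitation counting, and a linear combination of scores might look in Python. It is not the paper's implementation: the function names, the use of seeded BLAKE2b hashes, the definition of covisitation as "clicked in the same session", and the example weights are all assumptions made for illustration. PLSI is not implemented here; its score is simply treated as an input to the linear model.

```python
import hashlib
from collections import defaultdict


def minhash_signature(items, num_hashes=2):
    """Return one min-hash value per seeded hash function (hypothetical helper).

    Users whose click sets have high Jaccard similarity are likely to
    share a signature; identical click sets always do.
    """
    signature = []
    for seed in range(num_hashes):
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(f"{seed}:{item}".encode(), digest_size=8).digest(),
                "big",
            )
            for item in items
        ))
    return tuple(signature)


def cluster_users(click_histories, num_hashes=2):
    """Group users by MinHash signature of their clicked items."""
    clusters = defaultdict(list)
    for user, items in click_histories.items():
        if items:  # users with no clicks cannot be hashed
            clusters[minhash_signature(items, num_hashes)].append(user)
    return dict(clusters)


def covisitation_counts(sessions):
    """Count how often two items appear in the same session (assumed definition)."""
    counts = defaultdict(int)
    for session in sessions:
        unique = sorted(set(session))
        for i, first in enumerate(unique):
            for second in unique[i + 1:]:
                counts[(first, second)] += 1
    return dict(counts)


def combined_score(scores, weights):
    """Linear model: weighted sum of per-algorithm scores for one item."""
    return sum(weights[name] * scores.get(name, 0.0) for name in weights)


if __name__ == "__main__":
    histories = {
        "alice": {"story1", "story2", "story3"},
        "bob": {"story1", "story2", "story3"},
        "carol": {"story9"},
    }
    print(cluster_users(histories))
    print(covisitation_counts([["a", "b"], ["b", "a", "c"]]))
    # Illustrative weights only; the abstract does not give any values.
    weights = {"minhash": 0.6, "plsi": 0.3, "covisit": 0.1}
    print(combined_score({"minhash": 0.5, "covisit": 1.0}, weights))
```

Concatenating several min-hash values into one cluster key, as done here, makes clusters stricter as `num_hashes` grows; a real system would tune that trade-off and would need to recompute signatures as the item set changes.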


      Published in

      WWW '07: Proceedings of the 16th international conference on World Wide Web
      May 2007
      1382 pages
      ISBN: 9781595936547
      DOI: 10.1145/1242572

      Copyright © 2007 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Acceptance Rates

      Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%
