DOI: 10.1145/1557914.1557933
research-article

The scalable hyperlink store

Published: 29 June 2009

ABSTRACT

This paper describes the Scalable Hyperlink Store, a distributed in-memory "database" for storing large portions of the web graph. SHS is an enabler for research on structural properties of the web graph as well as new link-based ranking algorithms. Previous work on specialized hyperlink databases focused on finding efficient compression algorithms for web graphs. By contrast, this work focuses on the systems issues of building such a database. Specifically, it describes how to build a hyperlink database that is fast, scalable, fault-tolerant, and incrementally updateable.

    Published in

      HT '09: Proceedings of the 20th ACM conference on Hypertext and hypermedia
      June 2009
      410 pages
      ISBN: 9781605584867
      DOI: 10.1145/1557914

      Copyright © 2009 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      Overall Acceptance Rate: 378 of 1,158 submissions, 33%
