Skip to main content

The Web as a Graph: Measurements, Models, and Methods

  • Conference paper
  • First Online:
Book cover Computing and Combinatorics (COCOON 1999)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 1627))

Included in the following conference series:

Abstract

The pages and hyperlinks of the World-Wide Web may be viewed as nodes and edges in a directed graph. This graph is a fascinating object of study: it has several hundred million nodes today, over a billion links, and appears to grow exponentially with time. There are many reasons — mathematical, sociological, and commercial — for studying the evolution of this graph. In this paper we begin by describing two algorithms that operate on the Web graph, addressing problems from Web search and automatic community discovery. We then report a number of measurements and properties of this graph that manifested themselves as we ran these algorithms on the Web. Finally, we observe that traditional random graph models do not explain these observations, and we propose a new family of random graph models. These models point to a rich new sub-field of the study of random graphs, and raise questions about the analysis of graph algorithms on the Web.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. S. Abiteboul, D. Quass, J. McHugh, J. Widom, and J. Weiner. The Lorel Query language for semistructured data. Intl. J. on Digital Libraries, 1(1):68–88, 1997.

    Google Scholar 

  2. R. Agrawal and R. Srikanth. Fast algorithms for mining association rules. Proc. VLDB, 1994.

    Google Scholar 

  3. G. O. Arocena, A. O. Mendelzon, G. A. Mihaila. Applications of a Web query language. Proc. 6th WWW Conf., 1997.

    Google Scholar 

  4. K. Bharat and A. Broder. A technique for measuring the relative size and overlap of public Web search engines. Proc. 7th WWW Conf., 1998.

    Google Scholar 

  5. K. Bharat and M. R. Henzinger. Improved algorithms for topic distillation in a hyperlinked environment. Proc. ACM SIGIR, 1998.

    Google Scholar 

  6. S. Brin and L. Page. The anatomy of a large-scale hypertextual Web search engine. Proc. 7th WWW Conf., 1998.

    Google Scholar 

  7. B. Bollobás. Random Graphs, Academic Press, 1985.

    Google Scholar 

  8. J. Carriére and R. Kazman. WebQuery: Searching and visualizing the Web through connectivity. Proc. 6th WWW Conf., 1997.

    Google Scholar 

  9. S. Chakrabarti, B. Dom, D. Gibson, J. Kleinberg, P. Raghavan and S. Rajagopalan. Automatic resource compilation by analyzing hyperlink structure and associated text. Proc. 7th WWW Conf., 1998.

    Google Scholar 

  10. S. Chakrabarti, B. Dom, S. R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. Experiments in topic distillation. SIGIR workshop on hypertext IR, 1998.

    Google Scholar 

  11. S. Chakrabarti and B. Dom and P. Indyk. Enhanced hypertext classification using hyperlinks. Proc. ACM SIGMOD, 1998.

    Google Scholar 

  12. H. T. Davis. The Analysis of Economic Time Series. Principia press, 1941.

    Google Scholar 

  13. R. Downey, M. Fellows. Parametrized Computational Feasibility. In Feasible Mathematics II, P. Clote and J. Remmel, eds., Birkhauser, 1994.

    Google Scholar 

  14. L. Egghe, R. Rousseau, Introduction to Informetrics, Elsevier, 1990.

    Google Scholar 

  15. D. Florescu, A. Levy and A. Mendelzon. Database techniques for the World Wide Web: A survey. SIGMOD Record, 27(3): 59–74, 1998.

    Article  Google Scholar 

  16. E. Garfield. Citation analysis as a tool in journal evaluation. Science, 178:471–479, 1972.

    Article  Google Scholar 

  17. N. Gilbert. A simulation of the structure of academic science. Sociological Research Online, 2(2), 1997.

    Google Scholar 

  18. G. Golub, C. F. Van Loan. Matrix Computations, Johns Hopkins University Press, 1989.

    Google Scholar 

  19. M. R. Henzinger, P. Raghavan, and S. Rajagopalan. Computing on data streams. AMS-DIMACS series, special issue on computing on very large datasets, 1998.

    Google Scholar 

  20. M. M. Kessler. Bibliographic coupling between scientific papers. American Documentation, 14:10–25, 1963.

    Article  Google Scholar 

  21. J. Kleinberg. Authoritative sources in a hyperlinked environment, J. of the ACM, 1999, to appear. Also appears as IBM Research Report RJ 10076(91892) May 1997.

    Google Scholar 

  22. D. Konopnicki and O. Shmueli. Information gathering on the World Wide Web: the W3QL query language and the W3QS system. Trans. on Database Systems, 1998.

    Google Scholar 

  23. S. R. Kumar, P. Raghavan, S. Rajagopalan and A. Tomkins. Trawling emerging cyber-communities automatically. Proc. 8th WWW Conf., 1999.

    Google Scholar 

  24. L. V. S. Lakshmanan, F. Sadri, and I. N. Subramanian. A declarative approach to querying and restructuring the World Wide Web. Post-ICDE Workshop on RIDE, 1996.

    Google Scholar 

  25. R. Larson. Bibliometrics of the World Wide Web: An exploratory analysis of the intellectual structure of cyberspace. Ann. Meeting of the American Soc. Info. Sci., 1996.

    Google Scholar 

  26. A. J. Lotka. The frequency distribution of scientific productivity. J. of the Washington Acad. of Sci., 16:317, 1926.

    Google Scholar 

  27. A. Mendelzon, G. Mihaila, and T. Milo. Querying the World Wide Web, J. of Digital Libraries 1(1):68–88, 1997.

    Google Scholar 

  28. A. Mendelzon and P. Wood. Finding regular simple paths in graph databases. SIAM J. Comp., 24(6):1235–1258, 1995.

    Article  MATH  MathSciNet  Google Scholar 

  29. E. Spertus. ParaSite: Mining structural information on the Web. Proc. 6th WWW Conf., 1997.

    Google Scholar 

  30. D. Tsur, J. Ullman, S. Abiteboul, C. Clifton, R. Motwani, S. Nestorov, and A. Rosenthal. Query Flocks: A generalization of association rule mining. Proc. ACM SIGMOD, 1998.

    Google Scholar 

  31. G. K. Zipf. Human behavior and the principle of least effort. New York: Hafner, 1949.

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 1999 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kleinberg, J.M., Kumar, R., Raghavan, P., Rajagopalan, S., Tomkins, A.S. (1999). The Web as a Graph: Measurements, Models, and Methods. In: Asano, T., Imai, H., Lee, D.T., Nakano, Si., Tokuyama, T. (eds) Computing and Combinatorics. COCOON 1999. Lecture Notes in Computer Science, vol 1627. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48686-0_1

Download citation

  • DOI: https://doi.org/10.1007/3-540-48686-0_1

  • Published:

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-66200-6

  • Online ISBN: 978-3-540-48686-2

  • eBook Packages: Springer Book Archive

Publish with us

Policies and ethics