The Web as a Graph: Measurements, Models, and Methods

Kleinberg, Jon M.; Kumar, Ravi; Raghavan, Prabhakar; Rajagopalan, Sridhar; Tomkins, Andrew S.

doi:10.1007/3-540-48686-0_1

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1627)

Included in the following conference series:

International Computing and Combinatorics Conference

Abstract

The pages and hyperlinks of the World-Wide Web may be viewed as nodes and edges in a directed graph. This graph is a fascinating object of study: it has several hundred million nodes today, over a billion links, and appears to grow exponentially with time. There are many reasons — mathematical, sociological, and commercial — for studying the evolution of this graph. In this paper we begin by describing two algorithms that operate on the Web graph, addressing problems from Web search and automatic community discovery. We then report a number of measurements and properties of this graph that manifested themselves as we ran these algorithms on the Web. Finally, we observe that traditional random graph models do not explain these observations, and we propose a new family of random graph models. These models point to a rich new sub-field of the study of random graphs, and raise questions about the analysis of graph algorithms on the Web.
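As a minimal sketch of the graph view described in the abstract, the snippet below stores pages as nodes and hyperlinks as directed edges, then tabulates how many pages have each in-degree. The page names, the `toy_links` data, and the helper functions are hypothetical illustrations, not the authors' code or measurements.

```python
from collections import Counter, defaultdict


def build_web_graph(links):
    """Build adjacency sets from (source_page, target_page) hyperlink pairs."""
    out_links = defaultdict(set)  # page -> pages it links to
    in_links = defaultdict(set)   # page -> pages that link to it
    nodes = set()
    for src, dst in links:
        nodes.update((src, dst))
        out_links[src].add(dst)
        in_links[dst].add(src)
    return nodes, out_links, in_links


def in_degree_histogram(nodes, in_links):
    """Map each in-degree value to the number of pages that have it."""
    return Counter(len(in_links.get(page, ())) for page in nodes)


# Hypothetical toy data: each pair is one hyperlink, source -> target.
toy_links = [("a", "b"), ("a", "c"), ("b", "c"), ("d", "c"), ("c", "a")]

nodes, out_links, in_links = build_web_graph(toy_links)
histogram = in_degree_histogram(nodes, in_links)

for degree in sorted(histogram):
    print(f"in-degree {degree}: {histogram[degree]} page(s)")
```

On a real crawl the same histogram would be computed over hundreds of millions of nodes, so an in-memory dictionary of sets is only suitable for illustration.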

Author information

Authors and Affiliations

Department of Computer Science, Cornell University, Ithaca, NY, 14853
Jon M. Kleinberg
IBM Almaden Research Center K53/B1, 650 Harry Road, San Jose, CA, 95120
Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan & Andrew S. Tomkins

Editor information

Editors and Affiliations

Department of Information and System Engineering Faculty of Science and Engineering, Chuo University, 1-13-27, Kasuga, Bunkyo-ku, Tokyo, 112-8551, Japan
Takao Asano
Department of Information Science, University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-0033, Japan
Hideki Imai
Academia Sinica, Institute of Information Science, Nankang, Taipei, Taiwan
D. T. Lee
Department of Computer Science Faculty of Engineering, Gunma University, 1-5-1 Tenjin-cho, Kiryu, Gunma, 376-8515, Japan
Shin-ichi Nakano
IBM Tokyo Research Laboratory, 1623-14, Shimo-Tsuruma, Yamato Kanagawa, 242-0001, Japan
Takeshi Tokuyama

About this paper

Cite this paper

Kleinberg, J.M., Kumar, R., Raghavan, P., Rajagopalan, S., Tomkins, A.S. (1999). The Web as a Graph: Measurements, Models, and Methods. In: Asano, T., Imai, H., Lee, D.T., Nakano, Si., Tokuyama, T. (eds) Computing and Combinatorics. COCOON 1999. Lecture Notes in Computer Science, vol 1627. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48686-0_1

DOI: https://doi.org/10.1007/3-540-48686-0_1
Published: 25 June 1999
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-66200-6
Online ISBN: 978-3-540-48686-2
eBook Packages: Springer Book Archive
