Sampling from large graphs

Authors:
Jure Leskovec, Carnegie Mellon University, Pittsburgh, PA
Christos Faloutsos, Carnegie Mellon University, Pittsburgh, PA

KDD '06: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, August 2006, Pages 631–636
https://doi.org/10.1145/1150402.1150479

Published: 20 August 2006

ABSTRACT

Given a huge real graph, how can we derive a representative sample? There are many known algorithms to compute interesting measures (shortest paths, centrality, betweenness, etc.), but several of them become impractical for large graphs. Thus graph sampling is essential.

The natural questions to ask are (a) which sampling method to use, (b) how small the sample size can be, and (c) how to scale up the measurements of the sample (e.g., the diameter) to get estimates for the large graph. The deeper, underlying question is subtle: how do we measure success?

We answer the above questions, and test our answers by thorough experiments on several diverse datasets, spanning thousands of nodes and edges. We consider several sampling methods, propose novel methods to check the goodness of sampling, and develop a set of scaling laws that describe relations between the properties of the original and the sample.

In addition to the theoretical contributions, the practical conclusions from our work are: sampling strategies based on edge selection do not perform well; simple uniform random node selection performs surprisingly well. Overall, the best-performing methods are those based on random walks and "forest fire"; they accurately match both static and evolutionary graph patterns, with sample sizes down to about 15% of the original graph.
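
To make two of the sampling families named in the abstract concrete, here is a minimal Python sketch of uniform random node selection and of random-walk sampling. It is an illustration under stated assumptions, not the authors' implementation: the function names, the adjacency-dictionary graph representation, the restart probability of 0.15, and the step cap are all choices made for this example. "Forest fire" sampling is left out because the abstract gives no details of its procedure.

```python
import random


def induced_subgraph(adj, nodes):
    """Keep only the chosen nodes and the edges that run between them."""
    keep = set(nodes)
    return {u: {v for v in adj[u] if v in keep} for u in keep}


def random_node_sample(adj, k, rng):
    """Uniform random node selection: pick k nodes, keep the induced edges."""
    return induced_subgraph(adj, rng.sample(sorted(adj), k))


def random_walk_sample(adj, k, rng, restart=0.15, max_steps=100_000):
    """Walk from a random start node, jumping back to the start with
    probability `restart` (an assumed value), until k distinct nodes are seen.

    If the start node sits in a component with fewer than k nodes, the step
    cap ends the walk and the sample is smaller than k.
    """
    start = rng.choice(sorted(adj))
    current, visited = start, {start}
    for _ in range(max_steps):
        if len(visited) >= k:
            break
        if rng.random() < restart or not adj[current]:
            current = start
        else:
            current = rng.choice(sorted(adj[current]))
            visited.add(current)
    return induced_subgraph(adj, visited)


def ring_graph(n):
    """Toy undirected graph: n nodes arranged in a cycle."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        j = (i + 1) % n
        adj[i].add(j)
        adj[j].add(i)
    return adj


if __name__ == "__main__":
    rng = random.Random(0)
    graph = ring_graph(20)
    for name, sampler in [("random node", random_node_sample),
                          ("random walk", random_walk_sample)]:
        sample = sampler(graph, 5, rng)
        edges = sum(len(nbrs) for nbrs in sample.values()) // 2
        print(f"{name}: {len(sample)} nodes, {edges} edges")
```

On a toy graph like this, the random-walk sample is always one connected piece, while a uniform node sample may contain few or no edges; which behaviour is preferable depends on the graph properties the sample is meant to preserve.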

Index Terms

- Information systems
  - Information systems applications
    - Data mining

Published in
KDD '06: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
August 2006
986 pages
ISBN:1595933395
DOI:10.1145/1150402
Conference Chair: Tina Eliassi-Rad (LLNL)
General Chair: Lyle Ungar (University of Pennsylvania)
Program Chairs: Mark Craven (University of Wisconsin), Dimitrios Gunopulos (University of California, Riverside)
Copyright © 2006 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
graph mining
graph sampling
scaling laws
Qualifiers
- Article
Conference

Acceptance Rates
Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%