DOI: 10.1145/1150402.1150479
Article

Sampling from large graphs

Published: 20 August 2006

ABSTRACT

Given a huge real graph, how can we derive a representative sample? There are many known algorithms to compute interesting measures (shortest paths, centrality, betweenness, etc.), but several of them become impractical for large graphs. Thus graph sampling is essential.

The natural questions to ask are (a) which sampling method to use, (b) how small can the sample size be, and (c) how to scale up the measurements of the sample (e.g., the diameter), to get estimates for the large graph. The deeper, underlying question is subtle: how do we measure success?

We answer the above questions, and test our answers by thorough experiments on several diverse datasets, spanning thousands of nodes and edges. We consider several sampling methods, propose novel methods to check the goodness of sampling, and develop a set of scaling laws that describe relations between the properties of the original and the sample.

In addition to the theoretical contributions, the practical conclusions from our work are: Sampling strategies based on edge selection do not perform well; simple uniform random node selection performs surprisingly well. Overall, the best performing methods are the ones based on random walks and "forest fire"; they match very accurately both static and evolutionary graph patterns, with sample sizes down to about 15% of the original graph.
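As a rough illustration of two of the sampling families the abstract names (uniform random node selection and random-walk sampling), the following Python sketch may help. It is not the authors' implementation: the function names, the adjacency-dictionary representation, the restart probability of 0.15, and the max_steps guard are all assumptions made for this example, and the "forest fire" method is not shown.

```python
import random


def random_node_sample(adj, fraction, rng):
    """Keep a uniform random subset of nodes and the edges among them.

    adj is an assumed representation: {node: set of neighbor nodes}.
    """
    k = max(1, round(fraction * len(adj)))
    kept = set(rng.sample(sorted(adj), k))
    # Induced subgraph: keep only edges whose endpoints both survived.
    return {u: {v for v in adj[u] if v in kept} for u in kept}


def random_walk_sample(adj, fraction, rng, restart=0.15, max_steps=100_000):
    """Collect nodes visited by a random walk that sometimes jumps back to its start.

    restart and max_steps are illustrative defaults, not values taken from
    the abstract. If the walk is trapped in a small component, the result
    may contain fewer nodes than requested.
    """
    target = max(1, round(fraction * len(adj)))
    start = rng.choice(sorted(adj))
    current, visited = start, {start}
    for _ in range(max_steps):
        if len(visited) >= target:
            break
        neighbors = sorted(adj[current])
        if not neighbors or rng.random() < restart:
            current = start  # dead end or restart: return to the start node
        else:
            current = rng.choice(neighbors)
            visited.add(current)
    return {u: {v for v in adj[u] if v in visited} for u in visited}


if __name__ == "__main__":
    # Toy undirected ring of 20 nodes; a 15% sample keeps 3 nodes.
    ring = {i: {(i - 1) % 20, (i + 1) % 20} for i in range(20)}
    rng = random.Random(0)
    print(random_node_sample(ring, 0.15, rng))
    print(random_walk_sample(ring, 0.15, rng))
```

On a sparse graph such as the toy ring, the node-selection sample tends to keep few edges because neighbors are rarely chosen together, while the random-walk sample stays connected by construction. Judging which sample is more representative requires the kind of goodness measures the abstract refers to.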


      Published in

      KDD '06: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
      August 2006
      986 pages
      ISBN: 1595933395
      DOI: 10.1145/1150402

      Copyright © 2006 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      Overall acceptance rate: 1,133 of 8,635 submissions (13%)