skip to main content
10.1145/1772690.1772755acmotherconferencesArticle/Chapter ViewAbstractPublication PageswwwConference Proceedingsconference-collections
research-article

Empirical comparison of algorithms for network community detection

Published:26 April 2010Publication History

ABSTRACT

Detecting clusters or communities in large real-world graphs such as large social or information networks is a problem of considerable interest. In practice, one typically chooses an objective function that captures the intuition of a network cluster as set of nodes with better internal connectivity than external connectivity, and then one applies approximation algorithms or heuristics to extract sets of nodes that are related to the objective function and that "look like" good communities for the application of interest.

In this paper, we explore a range of network community detection methods in order to compare them and to understand their relative performance and the systematic biases in the clusters they identify. We evaluate several common objective functions that are used to formalize the notion of a network community, and we examine several different classes of approximation algorithms that aim to optimize such objective functions. In addition, rather than simply fixing an objective and asking for an approximation to the best cluster of any size, we consider a size-resolved version of the optimization problem. Considering community quality as a function of its size provides a much finer lens with which to examine community detection algorithms, since objective functions and approximation algorithms often have non-obvious size-dependent behavior.

References

  1. Supporting website. http://snap.stanford.edu/ncp/.Google ScholarGoogle Scholar
  2. R. Andersen, F. Chung, and K. Lang. Local graph partitioning using PageRank vectors. In FOCS '06: Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science, pages 475--486, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. R. Andersen and K. Lang. Communities from seed sets. In WWW '06: Proceedings of the 15th International Conference on World Wide Web, pages 223--232, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. S. Arora, S. Rao, and U. Vazirani. Expander flows, geometric embeddings and graph partitioning. In STOC '04: Proceedings of the 36th annual ACM Symposium on Theory of Computing, pages 222--231, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. S. Burer and R. Monteiro. A nonlinear programming algorithm for solving semidefinite programs via low-rank factorization. Mathematical Programming (series B), 95(2):329--357, 2003.Google ScholarGoogle Scholar
  6. F. Chung. Spectral graph theory, volume 92 of CBMS Regional Conference Series in Mathematics. American Mathematical Society, 1997.Google ScholarGoogle Scholar
  7. A. Clauset. Finding local community structure in networks. Physical Review E, 72:026132, 2005.Google ScholarGoogle ScholarCross RefCross Ref
  8. A. Clauset, M. Newman, and C. Moore. Finding community structure in very large networks. Physical Review E, 70:066111, 2004.Google ScholarGoogle ScholarCross RefCross Ref
  9. I. Dhillon, Y. Guan, and B. Kulis. Weighted graph cuts without eigenvectors: A multilevel approach. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(11):1944--1957, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. G. Flake, S. Lawrence, and C. Giles. Efficient identification of web communities. In KDD '00: Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 150--160, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. G. Flake, R. Tarjan, and K.Tsioutsiouliklis. Graph clustering and minimum cut trees. Internet Mathematics, 1(4):385--408, 2003.Google ScholarGoogle ScholarCross RefCross Ref
  12. S. Fortunato. Community detection in graphs. arXiv:0906.0612, June 2009.Google ScholarGoogle Scholar
  13. S. Fortunato and M. Barthélemy. Resolution limit in community detection. Proceedings of the National Academy of Sciences of the United States of America, 104(1):36--41, 2007.Google ScholarGoogle ScholarCross RefCross Ref
  14. M. Gaertler. Clustering. In U. Brandes and T. Erlebach, editors, Network Analysis: Methodological Foundations, pages 178--215. Springer, 2005.Google ScholarGoogle ScholarCross RefCross Ref
  15. G. Gallo, M. Grigoriadis, and R. Tarjan. A fast parametric maximum flow algorithm and applications. SIAM Journal on Computing, 18(1):30--55, 1989. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. M. Girvan and M. Newman. Community structure in social and biological networks. Proceedings of the National Academy of Sciences of the United States of America, 99(12):7821--7826, 2002.Google ScholarGoogle ScholarCross RefCross Ref
  17. R. Guimerà, M. Sales-Pardo, and L. Amaral. Modularity from fluctuations in random graphs and complex networks. Physical Review E, 70:025101, 2004.Google ScholarGoogle ScholarCross RefCross Ref
  18. R. Kannan, S. Vempala, and A. Vetta. On clusterings: Good, bad and spectral. Journal of the ACM, 51(3):497--515, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. B. Karrer, E. Levina, and M. Newman. Robustness of community structure in networks. Physical Review E, 77:046119, 2008.Google ScholarGoogle ScholarCross RefCross Ref
  20. G. Karypis and V. Kumar. A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM Journal on Scientific Computing, 20:359--392, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. A. Lancichinetti and S. Fortunato. Community detection algorithms: a comparative analysis. arXiv:0908.1062, August 2009.Google ScholarGoogle Scholar
  22. K. Lang and S. Rao. Finding near-optimal cuts: an empirical evaluation. In SODA '93: Proceedings of the 4th annual ACM-SIAM Symposium on Discrete algorithms, pages 212--221, 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. K. Lang and S. Rao. A flow-based method for improving the expansion or conductance of graph cuts. In IPCO '04: Proceedings of the 10th International IPCO Conference on Integer Programming and Combinatorial Optimization, pages 325--337, 2004.Google ScholarGoogle ScholarCross RefCross Ref
  24. T. Leighton and S. Rao. An approximate max-flow min-cut theorem for uniform multicommodity flow problems with applications to approximation algorithms. In FOCS '88: Proceedings of the 28th Annual Symposium on Foundations of Computer Science, pages 422--431, 1988. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. T. Leighton and S. Rao. Multicommodity max-flow min-cut theorems and their use in designing approximation algorithms. Journal of the ACM, 46(6):787--832, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. J. Leskovec, K. Lang, A. Dasgupta, and M. Mahoney. Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. arXiv:0810.1355, October 2008.Google ScholarGoogle Scholar
  27. J. Leskovec, K. Lang, A. Dasgupta, and M. Mahoney. Statistical properties of community structure in large social and information networks. In WWW '08: Proceedings of the 17th International Conference on World Wide Web, pages 695--704, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. M. Newman. Modularity and community structure in networks. Proceedings of the National Academy of Sciences of the United States of America, 103(23):8577--8582, 2006.Google ScholarGoogle ScholarCross RefCross Ref
  29. M. Newman and M. Girvan. Finding and evaluating community structure in networks. Physical Review E, 69:026113, 2004.Google ScholarGoogle ScholarCross RefCross Ref
  30. F. Radicchi, C. Castellano, F. Cecconi, V. Loreto, and D. Parisi. Defining and identifying communities in networks. Proceedings of the National Academy of Sciences of the United States of America, 101(9):2658--2663, 2004.Google ScholarGoogle ScholarCross RefCross Ref
  31. S. Schaeffer. Graph clustering. Computer Science Review, 1(1):27--64, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Transcations of Pattern Analysis and Machine Intelligence, 22(8):888--905, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. D. Spielman and S.-H. Teng. Spectral partitioning works: Planar graphs and finite element meshes. In FOCS '96: Proceedings of the 37th Annual IEEE Symposium on Foundations of Computer Science, pages 96--107, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. S. White and P. Smyth. A spectral clustering approach to finding communities in graphs. In SDM '05: Proceedings of the 5th SIAM International Conference on Data Mining, pages 76--84, 2005.Google ScholarGoogle ScholarCross RefCross Ref

Index Terms

  1. Empirical comparison of algorithms for network community detection

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Other conferences
      WWW '10: Proceedings of the 19th international conference on World wide web
      April 2010
      1407 pages
      ISBN:9781605587998
      DOI:10.1145/1772690

      Copyright © 2010 International World Wide Web Conference Committee (IW3C2)

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 26 April 2010

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • research-article

      Acceptance Rates

      Overall Acceptance Rate1,899of8,196submissions,23%

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    ePub

    View this article in ePub.

    View ePub