Empirical comparison of algorithms for network community detection

Authors:
Jure Leskovec, Stanford University, Stanford, CA, USA
Kevin J. Lang, Yahoo! Research, Santa Clara, CA, USA
Michael Mahoney, Stanford University, Stanford, CA, USA
WWW '10: Proceedings of the 19th international conference on World wide web, April 2010, Pages 631–640
https://doi.org/10.1145/1772690.1772755

Published: 26 April 2010

ABSTRACT

Detecting clusters or communities in large real-world graphs such as large social or information networks is a problem of considerable interest. In practice, one typically chooses an objective function that captures the intuition of a network cluster as a set of nodes with better internal connectivity than external connectivity, and then one applies approximation algorithms or heuristics to extract sets of nodes that are related to the objective function and that "look like" good communities for the application of interest.

In this paper, we explore a range of network community detection methods in order to compare them and to understand their relative performance and the systematic biases in the clusters they identify. We evaluate several common objective functions that are used to formalize the notion of a network community, and we examine several different classes of approximation algorithms that aim to optimize such objective functions. In addition, rather than simply fixing an objective and asking for an approximation to the best cluster of any size, we consider a size-resolved version of the optimization problem. Considering community quality as a function of its size provides a much finer lens with which to examine community detection algorithms, since objective functions and approximation algorithms often have non-obvious size-dependent behavior.
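To make the size-resolved idea concrete, here is a minimal sketch in Python. It assumes conductance (listed among the author tags) as the objective; the abstract itself does not fix a single objective, and the toy graph, the candidate sets, and the function names `conductance` and `size_resolved_profile` are hypothetical illustrations rather than the authors' code. For each cluster size, the sketch records the best (lowest) conductance among the candidate sets of that size.

```python
from collections import defaultdict


def conductance(adj, nodes):
    """Conductance of `nodes`: cut edges / min(volume inside, volume outside)."""
    inside = set(nodes)
    # Edges with exactly one endpoint inside the set.
    cut = sum(1 for u in inside for v in adj[u] if v not in inside)
    # Volume = sum of degrees.
    vol_in = sum(len(adj[u]) for u in inside)
    vol_out = sum(len(nbrs) for nbrs in adj.values()) - vol_in
    denom = min(vol_in, vol_out)
    return cut / denom if denom > 0 else float("inf")


def size_resolved_profile(adj, candidates):
    """Map each cluster size k to the lowest conductance among candidates of size k."""
    best = {}
    for nodes in candidates:
        k = len(set(nodes))
        score = conductance(adj, nodes)
        if k not in best or score < best[k]:
            best[k] = score
    return dict(sorted(best.items()))


# Toy graph (hypothetical): two triangles joined by the single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = defaultdict(set)
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

# Candidate clusters (hypothetical); a real study would generate these
# with the approximation algorithms being compared.
candidates = [{0}, {0, 1}, {0, 1, 2}, {2, 3}, {1, 2, 3}]
profile = size_resolved_profile(adj, candidates)
```

In this toy case the triangle `{0, 1, 2}` gives the lowest score at size 3, since only one edge leaves it. Plotting such a profile against size is one way to see size-dependent behavior that a single best-cluster score would hide.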

Index Terms

1. Information systems
  1. Information systems applications
    1. Data mining

Published in
WWW '10: Proceedings of the 19th international conference on World wide web
April 2010
1407 pages
ISBN: 9781605587998
DOI: 10.1145/1772690
General Chairs: Michael Rappa (North Carolina State University, USA), Paul Jones (University of North Carolina at Chapel Hill, USA)
Program Chairs: Juliana Freire (University of Utah, USA), Soumen Chakrabarti (Indian Institute of Technology, India)
Copyright © 2010 International World Wide Web Conference Committee (IW3C2)
Publisher: Association for Computing Machinery, New York, NY, United States
Author Tags
community structure
conductance
flow-based methods
graph partitioning
spectral methods
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%