ABSTRACT
Detecting clusters or communities in large real-world graphs such as large social or information networks is a problem of considerable interest. In practice, one typically chooses an objective function that captures the intuition of a network cluster as a set of nodes with better internal connectivity than external connectivity, and then one applies approximation algorithms or heuristics to extract sets of nodes that are related to the objective function and that "look like" good communities for the application of interest.
In this paper, we explore a range of network community detection methods in order to compare them and to understand their relative performance and the systematic biases in the clusters they identify. We evaluate several common objective functions that are used to formalize the notion of a network community, and we examine several different classes of approximation algorithms that aim to optimize such objective functions. In addition, rather than simply fixing an objective and asking for an approximation to the best cluster of any size, we consider a size-resolved version of the optimization problem. Considering community quality as a function of its size provides a much finer lens with which to examine community detection algorithms, since objective functions and approximation algorithms often have non-obvious size-dependent behavior.
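The size-resolved idea can be illustrated with a minimal sketch. The abstract does not name a specific objective here, so the code below assumes conductance (cut edges divided by the smaller side's total degree), a common formalization of "better internal than external connectivity". The helper names `build_adj`, `conductance`, and `size_resolved_best`, and the toy graph, are hypothetical and are not taken from the paper. Instead of reporting one best cluster overall, the sketch keeps the best-scoring candidate for each cluster size.

```python
from collections import defaultdict


def build_adj(edges):
    """Build an undirected adjacency map {node: set(neighbors)} from an edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def conductance(adj, nodes):
    """Assumed objective: edges leaving the set, divided by the smaller side's degree sum."""
    inside = set(nodes)
    cut = sum(1 for u in inside for v in adj[u] if v not in inside)
    vol_in = sum(len(adj[u]) for u in inside)
    vol_out = sum(len(neigh) for neigh in adj.values()) - vol_in
    denom = min(vol_in, vol_out)
    return cut / denom if denom > 0 else float("inf")


def size_resolved_best(adj, candidates):
    """For each cluster size k, keep the lowest-conductance candidate of that size."""
    best = {}
    for cand in candidates:
        members = frozenset(cand)
        score = conductance(adj, members)
        k = len(members)
        if k not in best or score < best[k][0]:
            best[k] = (score, members)
    return best


# Toy graph (illustrative only): two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
adj = build_adj(edges)

# Candidate clusters would normally come from an approximation algorithm or
# heuristic; here they are listed by hand.
candidates = [{0}, {0, 1}, {2, 3}, {0, 1, 2}]
profile = size_resolved_best(adj, candidates)

for k in sorted(profile):
    score, members = profile[k]
    print(k, sorted(members), round(score, 3))
```

Plotting the best score against the size k gives a per-size view of cluster quality, which is where size-dependent behavior of an objective or an algorithm becomes visible. Swapping in a different objective only requires replacing `conductance`.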