ABSTRACT
In many applications we have a social network of people and would like to identify the members of an interesting but unlabeled group or community. We start with a small number of exemplar group members (they may be followers of a political ideology or fans of a music genre) and need to use those examples to discover the additional members. This gives rise to the seed expansion problem in community detection: given example community members, how can the social graph be used to predict the identities of the remaining, hidden community members? In contrast with global community detection (graph partitioning or covering), seed expansion is best suited to identifying communities locally concentrated around nodes of interest. A growing body of work has used seed expansion as a scalable means of detecting overlapping communities. Yet despite this interest, approaches in the literature diverge, and there is still no systematic understanding of which approaches work best in different domains. Here we evaluate several variants and uncover subtle trade-offs between them. We explore which properties of the seed set can improve performance, focusing on heuristics that one can control in practice. This systematic evaluation reveals several opportunities for performance gains. We also consider an adaptive version in which requests are made for additional membership labels of particular nodes, as happens in field studies of social communities. This leads to interesting connections and contrasts with active learning and the trade-offs between exploration and exploitation. Finally, we explore topological properties of communities and seed sets that correlate with algorithm performance, and explain these empirical observations theoretically. We evaluate our methods across multiple domains, using publicly available datasets with labeled, ground-truth communities.
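To make the seed expansion setting concrete, the following is a minimal sketch of one common way to rank candidate members: a random walk with restart to the seed set (personalized PageRank), followed by taking the top-scoring non-seed nodes. The abstract does not say which variants the paper evaluates, so treating this as representative is an assumption. The function names, the restart probability `alpha=0.15`, the iteration count, and the toy graph are all hypothetical choices made for illustration.

```python
def personalized_pagerank(adj, seeds, alpha=0.15, iters=50):
    """Score nodes by a random walk that restarts at the seed set.

    adj   : dict mapping each node to a list of its neighbors
    seeds : iterable of known community members (must be non-empty)
    alpha : restart probability (illustrative value, not from the source)
    """
    seeds = set(seeds)
    # Restart distribution: uniform over the seeds, zero elsewhere.
    restart = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in adj}
    score = dict(restart)
    for _ in range(iters):
        nxt = {v: alpha * restart[v] for v in adj}
        for u, nbrs in adj.items():
            if not nbrs:
                # Isolated node: send its walk mass back to the seeds.
                for s in seeds:
                    nxt[s] += (1 - alpha) * score[u] / len(seeds)
                continue
            share = (1 - alpha) * score[u] / len(nbrs)
            for v in nbrs:
                nxt[v] += share
        score = nxt
    return score


def expand_seeds(adj, seeds, k, alpha=0.15, iters=50):
    """Return the k highest-scoring nodes that are not already seeds."""
    seeds = set(seeds)
    score = personalized_pagerank(adj, seeds, alpha, iters)
    candidates = [v for v in adj if v not in seeds]
    candidates.sort(key=lambda v: score[v], reverse=True)
    return candidates[:k]


def build_example_graph():
    """Toy graph: two 4-cliques {0,1,2,3} and {4,5,6,7} joined by edge 3-4."""
    adj = {v: [] for v in range(8)}
    for clique in ([0, 1, 2, 3], [4, 5, 6, 7]):
        for u in clique:
            for v in clique:
                if u != v:
                    adj[u].append(v)
    adj[3].append(4)
    adj[4].append(3)
    return adj
```

On the toy graph, calling `expand_seeds(build_example_graph(), seeds=[0, 1], k=2)` recovers the two hidden members of the first clique, because the walk's mass stays concentrated around the seeds. Real social graphs are far less clean, which is where the choice of seeds and of scoring variant starts to matter.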
Index Terms
- Community membership identification from small seed sets