research-article

Community membership identification from small seed sets

Authors:
Isabel M. Kloumann (Cornell University, Ithaca, NY, USA)
Jon M. Kleinberg (Cornell University, Ithaca, NY, USA)

KDD '14: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, August 2014, Pages 1366–1375
https://doi.org/10.1145/2623330.2623621

Published: 24 August 2014

ABSTRACT

In many applications we have a social network of people and would like to identify the members of an interesting but unlabeled group or community. We start with a small number of exemplar group members -- they may be followers of a political ideology or fans of a music genre -- and need to use those examples to discover the additional members. This problem gives rise to the seed expansion problem in community detection: given example community members, how can the social graph be used to predict the identities of remaining, hidden community members? In contrast with global community detection (graph partitioning or covering), seed expansion is best suited for identifying communities locally concentrated around nodes of interest. A growing body of work has used seed expansion as a scalable means of detecting overlapping communities. Yet despite growing interest in seed expansion, there are divergent approaches in the literature and there still isn't a systematic understanding of which approaches work best in different domains. Here we evaluate several variants and uncover subtle trade-offs between different approaches. We explore which properties of the seed set can improve performance, focusing on heuristics that one can control in practice. As a consequence of this systematic understanding we have found several opportunities for performance gains. We also consider an adaptive version in which requests are made for additional membership labels of particular nodes, such as one finds in field studies of social communities. This leads to interesting connections and contrasts with active learning and the trade-offs of exploration and exploitation. Finally, we explore topological properties of communities and seed sets that correlate with algorithm performance, and explain these empirical observations with theoretical ones. We evaluate our methods across multiple domains, using publicly available datasets with labeled, ground-truth communities.
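
The abstract does not say which expansion methods are evaluated, so the following is only a minimal sketch of the seed expansion setting, assuming one common choice: rank nodes by a personalized PageRank walk that restarts at the seeds, then return the top-scoring non-seed nodes as predicted members. The function names, the toy graph, and the values of `alpha`, `iters`, and `k` are all illustrative assumptions, not taken from the paper.

```python
from collections import defaultdict


def personalized_pagerank(adj, seeds, alpha=0.15, iters=50):
    """Power iteration for a random walk that restarts at the seeds.

    adj   -- dict mapping each node to a list of its neighbours (undirected)
    seeds -- known community members; the walk restarts at them uniformly
    alpha -- restart probability (an illustrative value, not from the paper)
    """
    restart = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in adj}
    score = dict(restart)
    for _ in range(iters):
        nxt = defaultdict(float)
        for u, nbrs in adj.items():
            if not nbrs:
                continue  # isolated nodes pass no score on (none in the toy graph)
            share = (1.0 - alpha) * score[u] / len(nbrs)
            for v in nbrs:
                nxt[v] += share
        score = {v: alpha * restart[v] + nxt[v] for v in adj}
    return score


def expand_seeds(adj, seeds, k):
    """Return the k highest-scoring non-seed nodes as predicted members."""
    seeds = set(seeds)
    score = personalized_pagerank(adj, seeds)
    candidates = [v for v in adj if v not in seeds]
    return sorted(candidates, key=lambda v: score[v], reverse=True)[:k]


# Toy graph: two triangles {a, b, c} and {d, e, f} joined by the edge c-d.
toy = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
    "d": ["c", "e", "f"], "e": ["d", "f"], "f": ["d", "e"],
}
predicted = expand_seeds(toy, seeds={"a"}, k=2)
```

With the single seed `a`, the two predicted members are the other nodes of its triangle, which matches the intuition that seed expansion finds a community locally concentrated around the seeds.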

Index Terms

Community membership identification from small seed sets
1. Information systems
  1. Information systems applications
    1. Data mining

Published in
KDD '14: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining
August 2014
2028 pages
ISBN:9781450329569
DOI:10.1145/2623330
General Chairs: Sofus Macskassy (Facebook), Claudia Perlich (Dstillery)
Program Chairs: Jure Leskovec (Stanford University), Wei Wang (UCLA), Rayid Ghani (University of Chicago)
Copyright © 2014 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
ground-truth communities
seed set expansion
Acceptance Rates
KDD '14 Paper Acceptance Rate: 151 of 1,036 submissions, 15%
Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%