DOI: 10.1145/2623330.2623621
research-article

Community membership identification from small seed sets

Published: 24 August 2014

ABSTRACT

In many applications we have a social network of people and would like to identify the members of an interesting but unlabeled group or community. We start with a small number of exemplar group members -- they may be followers of a political ideology or fans of a music genre -- and need to use those examples to discover the additional members. This problem gives rise to the seed expansion problem in community detection: given example community members, how can the social graph be used to predict the identities of remaining, hidden community members? In contrast with global community detection (graph partitioning or covering), seed expansion is best suited for identifying communities locally concentrated around nodes of interest. A growing body of work has used seed expansion as a scalable means of detecting overlapping communities. Yet despite growing interest in seed expansion, there are divergent approaches in the literature and there still isn't a systematic understanding of which approaches work best in different domains. Here we evaluate several variants and uncover subtle trade-offs between different approaches. We explore which properties of the seed set can improve performance, focusing on heuristics that one can control in practice. As a consequence of this systematic understanding we have found several opportunities for performance gains. We also consider an adaptive version in which requests are made for additional membership labels of particular nodes, such as one finds in field studies of social communities. This leads to interesting connections and contrasts with active learning and the trade-offs of exploration and exploitation. Finally, we explore topological properties of communities and seed sets that correlate with algorithm performance, and explain these empirical observations with theoretical ones. We evaluate our methods across multiple domains, using publicly available datasets with labeled, ground-truth communities.
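The abstract does not say which expansion methods are evaluated. Purely as an illustration of the seed expansion setting, the sketch below ranks non-seed nodes by a random walk with restart to the seed set (personalized PageRank), one commonly used approach. The function names, the adjacency-dict representation, the restart probability `alpha`, and the iteration count are all assumptions made for this example and are not taken from the paper.

```python
def personalized_pagerank(adj, seeds, alpha=0.15, iters=100):
    """Random walk with restart to the seed set, by power iteration.

    adj   -- dict mapping each node to a list of its neighbors
             (assumed undirected, every node with at least one neighbor)
    seeds -- collection of known community members
    alpha -- restart (teleport) probability; 0.15 is an assumed default
    """
    seeds = set(seeds)
    nodes = list(adj)
    # The walk restarts uniformly at random over the seed nodes.
    restart = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    score = dict(restart)
    for _ in range(iters):
        nxt = {v: alpha * restart[v] for v in nodes}
        for u in nodes:
            # Each node spreads its remaining mass evenly over its neighbors.
            share = (1.0 - alpha) * score[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        score = nxt
    return score


def expand_seeds(adj, seeds, k):
    """Return the k highest-scoring nodes that are not already seeds."""
    seeds = set(seeds)
    score = personalized_pagerank(adj, seeds)
    candidates = [v for v in adj if v not in seeds]
    candidates.sort(key=lambda v: score[v], reverse=True)
    return candidates[:k]


# Toy graph: two triangles joined by the single edge c-d.
graph = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c", "e", "f"],
    "e": ["d", "f"],
    "f": ["d", "e"],
}
predicted = expand_seeds(graph, ["a"], 2)
```

On this toy graph, a single seed in one triangle should pull in the other two members of that triangle ahead of any node in the second triangle. Real evaluations would compare such a ranking against labeled, ground-truth community membership.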


    • Published in

      KDD '14: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining
      August 2014
      2028 pages
      ISBN:9781450329569
      DOI:10.1145/2623330

      Copyright © 2014 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      KDD '14 Paper Acceptance Rate: 151 of 1,036 submissions, 15%
      Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%
