ABSTRACT
Expanding a seed set into a larger community is a common procedure in link-based analysis. We show how to adapt recent results from theoretical computer science to expand a seed set into a community with small conductance and a strong relationship to the seed, while examining only a small neighborhood of the entire graph. We extend existing results to give theoretical guarantees that apply to a variety of seed sets from specified communities. We also describe simple and flexible heuristics for applying these methods in practice, and present early experiments showing that these methods compare favorably with existing approaches.
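The two quantities the abstract relies on, conductance and a local expansion around the seed, can be illustrated with a short sketch. This is not the paper's algorithm: it is a minimal illustration that assumes an undirected, unweighted graph stored as a dict of adjacency lists, and it uses a truncated lazy random walk followed by a sweep over degree-normalized probabilities as one plausible local technique. The names `conductance`, `expand_seed`, `steps`, and `eps` are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def conductance(adj, nodes, vol_total):
    """Edges leaving `nodes` divided by the volume of the smaller side."""
    nodes = set(nodes)
    vol_s = sum(len(adj[u]) for u in nodes)
    cut = sum(1 for u in nodes for v in adj[u] if v not in nodes)
    denom = min(vol_s, vol_total - vol_s)
    return cut / denom if denom > 0 else 1.0


def expand_seed(adj, seed, steps=3, eps=1e-4):
    """Grow `seed` into a low-conductance set (illustrative sketch only).

    Assumes every seed node has at least one neighbor.
    """
    # Total volume is a global quantity; a truly local method would
    # take it as a known input instead of scanning the whole graph.
    vol_total = sum(len(nbrs) for nbrs in adj.values())

    # Start from a degree-weighted distribution on the seed.
    vol_seed = sum(len(adj[u]) for u in seed)
    p = {u: len(adj[u]) / vol_seed for u in seed}

    for _ in range(steps):
        nxt = defaultdict(float)
        for u, mass in p.items():
            nxt[u] += mass / 2                    # lazy step: stay put
            share = mass / (2 * len(adj[u]))      # spread the other half
            for v in adj[u]:
                nxt[v] += share
        # Truncation: drop entries that are tiny relative to degree, so
        # only a small neighborhood of the seed is ever touched.
        p = {u: m for u, m in nxt.items() if m >= eps * len(adj[u])}

    # Sweep: order by probability per unit degree, keep the best prefix.
    order = sorted(p, key=lambda u: p[u] / len(adj[u]), reverse=True)
    best, best_phi, prefix = set(seed), float("inf"), []
    for u in order:
        prefix.append(u)
        phi = conductance(adj, prefix, vol_total)
        if phi < best_phi:
            best, best_phi = set(prefix), phi
    return best, best_phi
```

The sweep recomputes conductance for every prefix, which keeps the sketch short at the cost of quadratic work; an incremental update of the cut and volume would be the usual refinement.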