skip to main content
10.1145/2783258.2783385acmconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
research-article

Scalable Large Near-Clique Detection in Large-Scale Networks via Sampling

Published:10 August 2015Publication History

ABSTRACT

Extracting dense subgraphs from large graphs is a key primitive in a variety of graph mining applications, ranging from mining social networks and the Web graph to bioinformatics [41]. In this paper we focus on a family of poly-time solvable formulations, known as the k-clique densest subgraph problem (k-Clique-DSP) [57]. When k=2, the problem becomes the well-known densest subgraph problem (DSP) [22, 31, 33, 39]. Our main contribution is a sampling scheme that gives densest subgraph sparsifier, yielding a randomized algorithm that produces high-quality approximations while providing significant speedups and improved space complexity. We also extend this family of formulations to bipartite graphs by introducing the (p,q)-biclique densest subgraph problem ((p,q)-Biclique-DSP), and devise an exact algorithm that can treat both clique and biclique densities in a unified way.

As an example of performance, our sparsifying algorithm extracts the 5-clique densest subgraph --which is a large-near clique on 62 vertices-- from a large collaboration network. Our algorithm achieves 100% accuracy over five runs, while achieving an average speedup factor of over 10,000. Specifically, we reduce the running time from ∼2 107 seconds to an average running time of 0.15 seconds. We also use our methods to study how the k-clique densest subgraphs change as a function of time in time-evolving networks for various small values of k. We observe significant deviations between the experimental findings on real-world networks and stochastic Kronecker graphs, a random graph model that mimics real-world networks in certain aspects.

We believe that our work is a significant advance in routines with rigorous theoretical guarantees for scalable extraction of large near-cliques from networks.

Skip Supplemental Material Section

Supplemental Material

p815.mp4

mp4

196.1 MB

References

  1. http://www.avglab.com/soft/hipr.tar.Google ScholarGoogle Scholar
  2. http://snap.stanford.edu/data/index.html.Google ScholarGoogle Scholar
  3. http://grouplens.org/datasets.Google ScholarGoogle Scholar
  4. http://research.nii.ac.jp/~uno/codes.htm.Google ScholarGoogle Scholar
  5. Large Near-Clique Detection. http://tinyurl.com/o6y33g9.Google ScholarGoogle Scholar
  6. J. Abello, M. G. C. Resende, and S. Sudarsky. Massive quasi-clique detection. In LATIN, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. R. Andersen and K. Chellapilla. Finding dense subgraphs with size bounds. In WAW, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. A. Andoni, A. Gupta, and R. Krauthgamer. Towards (1+ ε)-approximate flow sparsifiers. In SODA, pages 279--293. SIAM, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. A. Angel, N. Sarkas, N. Koudas, and D. Srivastava. Dense subgraph maintenance under streaming edge weight updates for real-time story identification. In VLDB, 5(6), pages 574--585, Feb. 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Y. Asahiro, K. Iwama, H. Tamaki, and T. Tokuyama. Greedily finding a dense subgraph. In Journal of Algorithms, 34(2), 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. G. D. Bader and C. W. Hogue. An automated method for finding molecular complexes in large protein interaction networks. In BMC bioinformatics, 2003.Google ScholarGoogle Scholar
  12. B. Bahmani, R. Kumar, and S. Vassilvitskii. Densest subgraph in streaming and mapreduce. In VLDB , 5(5):454--465, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. O. D. Balalau, F. Bonchi, T. Chan, F. Gullo, and M. Sozio. Finding subgraphs with maximum total density and limited overlap. In WSDM, pages 379--388. ACM, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. V. Batagelj and M. Zaversnik. An o(m) algorithm for cores decomposition of networks. In Arxiv, arXiv.cs/0310049, 2003.Google ScholarGoogle Scholar
  15. A. A. Benczúr and D. R. Karger. Approximating s-t minimum cuts in Õ(n2) time. In STOC, pages 47--55, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. A. Bhaskara, M. Charikar, E. Chlamtac, U. Feige, and A. Vijayaraghavan. Detecting high log-densities: an o(n1/4) approximation for densest k-subgraph. In STOC, pages 201--210, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. S. Bhattacharya, M. Henzinger, D. Nanongkai, and C. E. Tsourakakis. Space-and time-efficient algorithm for maintaining dense subgraphs on one-pass dynamic streams. In STOC, 2015 (to appear). Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. I. M. Bomze, M. Budinich, P. M. Pardalos, and M. Pelillo. The maximum clique problem. In Handbook of combinatorial optimization, pages 1--74, 1999.Google ScholarGoogle ScholarCross RefCross Ref
  19. F. Bonchi, F. Gullo, A. Kaltenbrunner, and Y. Volkovich. Core decomposition of uncertain graphs. In KDD, pages 1316--1325, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. C. Bron and J. Kerbosch. Algorithm 457: finding all cliques of an undirected graph. In Communications of the ACM, 16(9):575--577, 1973. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. G. Buehrer and K. Chellapilla. A scalable pattern mining approach to web graph compression with communities. In WSDM, pages 95--106, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. M. Charikar. Greedy approximation algorithms for finding dense components in a graph. In APPROX, pages 84--95, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. J. Chen, Y. Saad. Dense Subgraph Extraction with Application to Community Detection. In TKDE, vol. 24, pages 1216--1230, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. B. V. Cherkassky and A. V. Goldberg. On implementing the push-relabel method for the maximum flow problem. In Algorithmica, 19(4), pages 390--410, 1997.Google ScholarGoogle ScholarCross RefCross Ref
  25. N. Chiba and T. Nishizeki. Arboricity and subgraph listing algorithms. In SIAM Journal on Computing, 14(1), pages 210--223, 1985. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. D. Eppstein. Arboricity and bipartite subgraph listing algorithms. In Information Processing Letters, 51(4), pages 207--211, 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. D. Eppstein, M. Löffler, and D. Strash. Listing all maximal cliques in sparse graphs in near-optimal time. In ISAAC, 2010.Google ScholarGoogle Scholar
  28. U. Feige, G. Kortsarz, and D. Peleg. The dense k-subgraph problem. In Algorithmica, 29(3), 2001.Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. I. Finocchi, M. Finocchi, and E. G. Fusco. Counting small cliques in mapreduce. In ArXiv arXiv:1403.0734, 2014.Google ScholarGoogle Scholar
  30. E. Fratkin, B. T. Naughton, D. L. Brutlag, and S. Batzoglou. Motifcut: regulatory motifs finding with maximum density subgraphs. In Bioinformatics, vol. 22(14), pages 150--157, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. G. Gallo, M. D. Grigoriadis, and R. E. Tarjan. A fast parametric maximum flow algorithm and applications. In Journal of Computing, 18(1), 1989. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. D. Gibson, R. Kumar, and A. Tomkins. Discovering large dense subgraphs in massive graphs. In VLDB, pages 721--732, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. A. V. Goldberg. Finding a maximum density subgraph. Tech. report, UC Berkeley, 1984. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. A. V. Goldberg and R. E. Tarjan. A new approach to the maximum-flow problem. In Journal of the ACM (JACM), 35(4), pages 921--940, 1988. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. J. Håstad. Clique is hard to approximate within n1-ε. In Acta Mathematica, 182(1), 1999.Google ScholarGoogle ScholarCross RefCross Ref
  36. R. Jin, Y. Xiang, N. Ruan, and D. Fuhry. 3-hop: a high-compression indexing scheme for reachability query. In SIGMOD, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. D. S. Johnson and M. A. Trick. Cliques, coloring, and satisfiability: second DIMACS implementation challenge American Mathematical Soc., 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. R. Kannan and V. Vinay. Analyzing the structure of large graphs, manuscript, 1999.Google ScholarGoogle Scholar
  39. S. Khuller and B. Saha. On finding dense subgraphs. In ICALP, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. D. E. Knuth. Seminumerical algorithms. 2007.Google ScholarGoogle Scholar
  41. V. E. Lee, N. Ruan, R. Jin, and C. C. Aggarwal. A survey of algorithms for dense subgraph discovery. In Managing and Mining Graph Data, pages 303--336, Springer, 2010.Google ScholarGoogle ScholarCross RefCross Ref
  42. Y. T. Lee and A. Sidford. Path finding methods for linear programming: Solving linear programs in õ (vrank) iterations and faster algorithms for maximum flow. In FOCS, pages 424--433, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. F. T. Leighton and A. Moitra. Extensions and limits to vertex sparsification. In STOC, pages 47--56, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. J. Leskovec, D. Chakrabarti, J. Kleinberg, C. Faloutsos, and Z. Ghahramani. Kronecker graphs: An approach to modeling networks. In The Journal of Machine Learning Research, vol. 11, pages 985--1042, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. J. Leskovec, J. Kleinberg, and C. Faloutsos. Graphs over time: densification laws, shrinking diameters and possible explanations. In KDD, pages 177--187, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. K. Makino and T. Uno. New algorithms for enumerating all maximal cliques. In SWAT pages 260--272, 2004.Google ScholarGoogle ScholarCross RefCross Ref
  47. M. Mitzenmacher and E. Upfal. Probability and computing: Randomized algorithms and probabilistic analysis. Cambridge University Press, 2005. Google ScholarGoogle ScholarCross RefCross Ref
  48. A. Moitra. Approximation algorithms for multicommodity-type problems with guarantees independent of the graph size. In FOCS, pages 3--12, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. J. B. Orlin. A faster strongly polynomial time algorithm for submodular function minimization. In Mathematical Programming, vol. 118(2), pages 237--251, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. R. Pagh and C. E. Tsourakakis. Colorful triangle counting and a mapreduce implementation. In Information Processing Letters, vol. 112(7), pages 277--281, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. A. E. Sariyuce, C. Seshadhri, A. Pinar, and U. V. Catalyurek. Finding the hierarchy of dense subgraphs using nucleus decompositions. In WWW, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  52. D. A. Spielman and S.-H. Teng. Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems. In STOC, pages 81--90, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  53. M. Thorup and U. Zwick. Approximate distance oracles. In Journal of the ACM (JACM), vol. 52(1), pages 1--24, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  54. M. Thorup and U. Zwick. Spanners and emulators with sublinear distance errors. In SODA, pages 802--809, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  55. C.E. Tsourakakis, F. Bonchi, A. Gionis, F. Gullo, and M. Tsiarli. Denser than the densest subgraph: extracting optimal quasi-cliques with quality guarantees. In KDD, pages 104--112, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  56. C. E. Tsourakakis. Mathematical and Algorithmic Analysis of Network and Biological Data. PhD thesis, Carnegie Mellon University, 2013.Google ScholarGoogle Scholar
  57. C. E. Tsourakakis. The k-clique densest subgraph problem. In WWW, pages 1122--1132, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  58. C. E. Tsourakakis, M. N. Kolountzakis, and G. L. Miller. Triangle sparsifiers. In J. Graph Algorithms Appl., 15(6), pages 703--726, 2011.Google ScholarGoogle ScholarCross RefCross Ref
  59. T. Uno. An efficient algorithm for solving pseudo clique enumeration problem. In Algorithmica, 56(1), 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  60. N. Wang, J. Zhang, K.-L. Tan, and A. K. Tung. On triangulation-based dense neighborhood graph discovery. In VLDB, 4(2), pages 58--68, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Scalable Large Near-Clique Detection in Large-Scale Networks via Sampling

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Conferences
      KDD '15: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
      August 2015
      2378 pages
      ISBN:9781450336642
      DOI:10.1145/2783258

      Copyright © 2015 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 10 August 2015

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • research-article

      Acceptance Rates

      KDD '15 Paper Acceptance Rate160of819submissions,20%Overall Acceptance Rate1,133of8,635submissions,13%

      Upcoming Conference

      KDD '24

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader