DOI: 10.1145/2487575.2487645
research-article

Denser than the densest subgraph: extracting optimal quasi-cliques with quality guarantees

Published: 11 August 2013

ABSTRACT

Finding dense subgraphs is an important graph-mining task with many applications. Since directly optimizing edge density is not meaningful (even a single edge achieves the maximum density), research has focused on optimizing alternative density functions. A very popular such function is the average degree, whose maximization leads to the well-known notion of the densest subgraph. Surprisingly, however, densest subgraphs are typically large graphs with small edge density and large diameter.
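
The contrast between edge density and average degree can be made concrete with a small sketch. The helper names below (`from_edges`, `edge_density`, `average_degree_density`, `greedy_peeling`) are hypothetical, and graphs are assumed to be stored as a dict of neighbour sets. The peeling routine is the standard Charikar-style greedy 1/2-approximation for the densest subgraph. It is shown as an illustration of that classic technique, not necessarily the exact baseline used in the paper. It maximizes e[S]/|S|, which is half the average degree and therefore has the same maximizer.

```python
from itertools import combinations

def from_edges(edges):
    """Build an undirected adjacency dict {vertex: set(neighbours)} (hypothetical helper)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def induced_edges(adj, s):
    """e[S]: number of edges with both endpoints in the vertex set s."""
    return sum(len(adj[u] & s) for u in s) // 2

def edge_density(adj, nodes):
    """e[S] / C(|S|, 2): fraction of vertex pairs in S that are adjacent."""
    s = set(nodes)
    k = len(s)
    return induced_edges(adj, s) / (k * (k - 1) / 2) if k > 1 else 0.0

def average_degree_density(adj, nodes):
    """e[S] / |S|: half the average degree, so it has the same maximizer."""
    s = set(nodes)
    return induced_edges(adj, s) / len(s) if s else 0.0

def greedy_peeling(adj):
    """Charikar-style peeling for a non-empty graph: repeatedly delete a
    minimum-degree vertex and keep the intermediate set maximizing e[S]/|S|."""
    alive = set(adj)
    deg = {u: len(adj[u]) for u in adj}      # degrees inside the alive set
    edges = sum(deg.values()) // 2
    best_set, best = set(alive), edges / len(alive)
    while len(alive) > 1:
        u = min(alive, key=deg.__getitem__)   # O(n) scan; a bucket queue is faster
        alive.remove(u)
        edges -= deg[u]
        for v in adj[u]:
            if v in alive:
                deg[v] -= 1
        if edges / len(alive) > best:
            best_set, best = set(alive), edges / len(alive)
    return best_set, best

# A single edge already has the maximum edge density of 1.0, just like K4.
single = from_edges([(0, 1)])
k4 = from_edges(combinations(range(4), 2))
print(edge_density(single, [0, 1]), edge_density(k4, range(4)))  # 1.0 1.0
print(average_degree_density(single, [0, 1]), average_degree_density(k4, range(4)))  # 0.5 1.5

# K4 with a 3-edge path hanging off vertex 3: peeling strips the path first.
lollipop = from_edges(list(combinations(range(4), 2)) + [(3, 4), (4, 5), (5, 6)])
best_set, best = greedy_peeling(lollipop)
print(sorted(best_set), best)  # [0, 1, 2, 3] 1.5
```

Edge density alone cannot tell the single edge apart from K4, whereas e[S]/|S| rewards the larger clique. That is exactly why the densest-subgraph objective tends to favour large vertex sets.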

In this paper, we define a novel density function that yields subgraphs of much higher quality than densest subgraphs: the graphs found by our method are compact, dense, and have smaller diameter. We show that the proposed function can be derived from a general framework that includes other important density functions as special cases and for which we show interesting general theoretical properties. To optimize the proposed function, we provide an additive approximation algorithm and a local-search heuristic. Both algorithms are very efficient and scale well to large graphs.
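
The abstract does not state the new density function. The sketch below assumes an edge-surplus objective of the form f_α(S) = e[S] − α·C(|S|, 2), with α = 1/3 chosen purely for illustration, and optimizes it with a simple first-improvement local search. Neither the objective nor the search (hypothetical names `edge_surplus` and `local_search`) should be read as the paper's exact formulation or algorithm. The search relies on two incremental gains: adding v to S changes f_α by deg_S(v) − α·|S|, and removing v changes it by α·(|S| − 1) − deg_S(v).

```python
from itertools import combinations

def from_edges(edges):
    """Build an undirected adjacency dict {vertex: set(neighbours)} (hypothetical helper)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def edge_surplus(adj, nodes, alpha=1/3):
    """ASSUMED objective f_alpha(S) = e[S] - alpha * C(|S|, 2)."""
    s = set(nodes)
    e = sum(len(adj[u] & s) for u in s) // 2
    k = len(s)
    return e - alpha * (k * (k - 1) // 2)

def local_search(adj, seed, alpha=1/3, max_moves=10_000):
    """Hypothetical first-improvement local search on the assumed objective:
    add a vertex while that raises f_alpha, otherwise drop one if that raises
    f_alpha, and stop at a local optimum."""
    S = {seed}
    for _ in range(max_moves):
        # Adding v changes f by deg_S(v) - alpha*|S|. Only neighbours of S can
        # have a positive gain, because deg_S(v) = 0 for every other vertex.
        frontier = set().union(*(adj[u] for u in S)) - S
        add = next((v for v in sorted(frontier)
                    if len(adj[v] & S) - alpha * len(S) > 0), None)
        if add is not None:
            S.add(add)
            continue
        # Removing v changes f by alpha*(|S| - 1) - deg_S(v).
        drop = next((v for v in sorted(S)
                     if len(S) > 1 and alpha * (len(S) - 1) - len(adj[v] & S) > 0), None)
        if drop is None:
            break  # no improving move: local optimum reached
        S.remove(drop)
    return S, edge_surplus(adj, S, alpha)

# K4 on {0, 1, 2, 3} with a 3-edge path 3-4-5-6 attached.
lollipop = from_edges(list(combinations(range(4), 2)) + [(3, 4), (4, 5), (5, 6)])
S, f = local_search(lollipop, seed=0)
print(sorted(S), round(f, 6))  # [0, 1, 2, 3] 4.0
```

Under this assumed form, a larger α penalizes missing edges more heavily and pushes the search toward smaller, near-clique sets. Every accepted move strictly increases f_α, so the search cannot cycle, but it only guarantees a local optimum that depends on the seed.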

We evaluate our algorithms on real and synthetic datasets, and we also devise several application studies as variants of our original problem. Compared with the method that finds the subgraph of largest average degree, our algorithms return denser subgraphs with smaller diameter. Finally, we discuss interesting new research directions that our problem leaves open.
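
The comparison above rests on two measurements of a returned vertex set: the edge density of the induced subgraph and its diameter. The following is a minimal sketch of both, using hypothetical helpers (`induced_diameter`, `quality`) on a dict-of-sets graph. It runs one BFS per vertex, which is fine for the small subgraphs such methods return but quadratic in general. The example only shows how the measures behave on a toy graph; it does not reproduce any result from the paper.

```python
import math
from collections import deque
from itertools import combinations

def from_edges(edges):
    """Build an undirected adjacency dict {vertex: set(neighbours)} (hypothetical helper)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def induced_diameter(adj, nodes):
    """Longest shortest path inside the subgraph induced by `nodes`, via one
    BFS per vertex; returns math.inf if that subgraph is disconnected."""
    s = set(nodes)
    diameter = 0
    for src in s:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v in s and v not in dist:  # never leave the induced subgraph
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if len(dist) < len(s):
            return math.inf
        diameter = max(diameter, max(dist.values()))
    return diameter

def quality(adj, nodes):
    """(size, edge density, diameter) of the subgraph induced by `nodes`."""
    s = set(nodes)
    k = len(s)
    e = sum(len(adj[u] & s) for u in s) // 2
    density = e / (k * (k - 1) / 2) if k > 1 else 0.0
    return k, density, induced_diameter(adj, s)

# K4 on {0, 1, 2, 3} with a 3-edge path 3-4-5-6 attached.
lollipop = from_edges(list(combinations(range(4), 2)) + [(3, 4), (4, 5), (5, 6)])
print(quality(lollipop, range(4)))  # (4, 1.0, 1)
print(quality(lollipop, range(7)))  # 7 vertices, density 9/21, diameter 4
```

Reporting size, density, and diameter together makes the trade-off visible: the whole toy graph is larger, but it is sparser and has a longer diameter than its clique core.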

Published in

KDD '13: Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining
August 2013
1534 pages
ISBN: 9781450321747
DOI: 10.1145/2487575

Copyright © 2013 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery
New York, NY, United States

Acceptance Rates

KDD '13 Paper Acceptance Rate: 125 of 726 submissions, 17%. Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%.
