research-article

Denser than the densest subgraph: extracting optimal quasi-cliques with quality guarantees

Authors:
Charalampos Tsourakakis (Carnegie Mellon University, Pittsburgh, Pennsylvania, USA)
Francesco Bonchi (Yahoo! Research, Barcelona, Spain)
Aristides Gionis (Aalto University, Espoo, Finland)
Francesco Gullo (Yahoo! Research, Barcelona, Spain)
Maria Tsiarli (University of Pittsburgh, Pittsburgh, Pennsylvania, USA)

KDD '13: Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, August 2013, Pages 104–112, https://doi.org/10.1145/2487575.2487645

Published: 11 August 2013

ABSTRACT

Finding dense subgraphs is an important graph-mining task with many applications. Because directly optimizing edge density is not meaningful (even a single edge achieves maximum density), research has focused on optimizing alternative density functions. A very popular one among such functions is the average degree, whose maximization leads to the well-known densest-subgraph notion. Surprisingly, however, densest subgraphs are typically large graphs with small edge density and large diameter.
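The contrast between the two measures can be made concrete with a minimal Python sketch. The helper names (`induced_edges`, `edge_density`, `average_degree`) and the toy graph are illustrative assumptions, not taken from the paper. Edge density is taken here as the fraction of vertex pairs joined by an edge, and average degree as twice the number of induced edges divided by the number of vertices.

```python
from itertools import combinations


def induced_edges(edges, nodes):
    """Count the edges that have both endpoints inside `nodes`."""
    nodes = set(nodes)
    return sum(1 for u, v in edges if u in nodes and v in nodes)


def edge_density(edges, nodes):
    """Fraction of vertex pairs in `nodes` that are joined by an edge."""
    n = len(set(nodes))
    if n < 2:
        return 0.0
    return induced_edges(edges, nodes) / (n * (n - 1) / 2)


def average_degree(edges, nodes):
    """Average degree of the subgraph induced by `nodes`."""
    n = len(set(nodes))
    if n == 0:
        return 0.0
    return 2 * induced_edges(edges, nodes) / n


# Toy graph (hypothetical): a 4-clique on vertices 0..3, plus a separate
# complete bipartite graph between vertices 4..7 and vertices 8..11.
clique = list(combinations(range(4), 2))
bipartite = [(u, v) for u in range(4, 8) for v in range(8, 12)]
edges = clique + bipartite

candidates = {
    "single edge": [0, 1],
    "4-clique": list(range(4)),
    "bipartite block": list(range(4, 12)),
}
for name, nodes in candidates.items():
    print(name, edge_density(edges, nodes), average_degree(edges, nodes))
```

In this toy graph the single edge already reaches edge density 1, which is why edge density alone is not a useful objective. The bipartite block has the largest average degree of the three candidates, yet its edge density is well below that of the 4-clique, mirroring the behaviour of densest subgraphs described above.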

In this paper, we define a novel density function, which gives subgraphs of much higher quality than densest subgraphs: the graphs found by our method are compact and dense, and have smaller diameter. We show that the proposed function can be derived from a general framework, which includes other important density functions as special cases and for which we show interesting general theoretical properties. To optimize the proposed function we provide an additive approximation algorithm and a local-search heuristic. Both algorithms are very efficient and scale well to large graphs.
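The abstract does not spell out the form of the density function. As a rough illustration only, the sketch below assumes an edge-surplus objective, f_alpha(S) = e[S] - alpha * C(|S|, 2), where e[S] is the number of edges induced by S, with alpha = 1/3 as an assumed default. The `local_search` routine is a generic add/remove hill climber in the spirit of a local-search heuristic; it is not the authors' exact procedure, and all names and the toy graph are hypothetical.

```python
def edge_surplus(adj, S, alpha):
    """Assumed objective: induced edges minus alpha times the number of pairs."""
    e = sum(1 for u in S for v in adj[u] if v in S) // 2  # each edge seen twice
    k = len(S)
    return e - alpha * k * (k - 1) / 2


def local_search(adj, seed, alpha=1 / 3, max_iter=1000):
    """Greedy hill climbing: add or remove one vertex while the objective improves."""
    S = {seed}
    best = edge_surplus(adj, S, alpha)
    for _ in range(max_iter):
        improved = False
        # Try to add vertices that strictly increase the objective.
        for v in sorted(adj):
            if v not in S:
                value = edge_surplus(adj, S | {v}, alpha)
                if value > best:
                    S.add(v)
                    best = value
                    improved = True
        # Try to remove vertices that strictly increase the objective.
        for v in sorted(S):
            if len(S) > 1:
                value = edge_surplus(adj, S - {v}, alpha)
                if value > best:
                    S.remove(v)
                    best = value
                    improved = True
        if not improved:
            break
    return S


def build_adjacency(edges):
    """Build an undirected adjacency map from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


# Toy graph (hypothetical): a 5-clique on vertices 0..4 with a path 4-5-6-7 attached.
toy_edges = [(u, v) for u in range(5) for v in range(u + 1, 5)]
toy_edges += [(4, 5), (5, 6), (6, 7)]
adj = build_adjacency(toy_edges)

result = local_search(adj, seed=0)
print(sorted(result), edge_surplus(adj, result, 1 / 3))
```

Under these assumptions, alpha acts as a penalty on subgraph size: each added vertex must bring enough new edges to offset the extra vertex pairs it creates, which steers the search toward compact, near-clique vertex sets rather than large sparse ones.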

We evaluate our algorithms on real and synthetic datasets, and we also devise several application studies as variants of our original problem. When compared with the method that finds the subgraph of the largest average degree, our algorithms return denser subgraphs with smaller diameter. Finally, we discuss new interesting research directions that our problem leaves open.

Index Terms

Information systems > Information systems applications > Data mining

Published in
KDD '13: Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining
August 2013, 1534 pages
ISBN: 9781450321747
DOI: 10.1145/2487575
Editors: Rayid Ghani (University of Chicago), Ted E. Senator (SAIC), Paul Bradley (MethodCare, Inc.), Rajesh Parekh (Groupon), Jingrui He (Stevens Institute of Technology)
General Chairs: Robert L. Grossman (University of Chicago and Open Data Group), Ramasamy Uthurusamy (General Motors Corporation, retired)
Program Chairs: Inderjit S. Dhillon (University of Texas), Yehuda Koren (Google)
Copyright © 2013 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
dense subgraph
graph mining
quasi-clique
Acceptance Rates
KDD '13 paper acceptance rate: 125 of 726 submissions, 17%. Overall acceptance rate: 1,133 of 8,635 submissions, 13%.