ABSTRACT
Finding dense subgraphs is an important graph-mining task with many applications. Because directly optimizing edge density is not meaningful (even a single edge achieves the maximum density), research has focused on optimizing alternative density functions. A very popular one among these is the average degree, whose maximization leads to the well-known densest-subgraph notion. Surprisingly, however, densest subgraphs are typically large graphs with small edge density and large diameter.
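The contrast between edge density and average degree can be made concrete with a small sketch. The toy graph (a 4-clique next to a complete bipartite graph with four vertices on each side) and the helper names `edge_density` and `average_degree` are assumptions chosen for illustration; they do not come from the paper.

```python
from itertools import combinations, product

def induced_edge_count(edges, nodes):
    """Count the edges that have both endpoints inside `nodes`."""
    nodes = set(nodes)
    return sum(1 for u, v in edges if u in nodes and v in nodes)

def edge_density(edges, nodes):
    """Induced edges divided by the number of possible vertex pairs."""
    n = len(set(nodes))
    if n < 2:
        return 0.0
    return induced_edge_count(edges, nodes) / (n * (n - 1) / 2)

def average_degree(edges, nodes):
    """Average degree inside the induced subgraph: 2 * edges / vertices."""
    n = len(set(nodes))
    if n == 0:
        return 0.0
    return 2 * induced_edge_count(edges, nodes) / n

# Hypothetical toy graph: a 4-clique plus a disjoint complete bipartite K_{4,4}.
CLIQUE = [0, 1, 2, 3]
LEFT, RIGHT = [4, 5, 6, 7], [8, 9, 10, 11]
BIPARTITE = LEFT + RIGHT
EDGES = list(combinations(CLIQUE, 2)) + list(product(LEFT, RIGHT))

if __name__ == "__main__":
    for name, nodes in [("single edge", [0, 1]),
                        ("4-clique", CLIQUE),
                        ("K_{4,4}", BIPARTITE)]:
        print(name, edge_density(EDGES, nodes), average_degree(EDGES, nodes))
```

On this toy graph a single edge already reaches edge density 1, which is why maximizing edge density alone is uninformative. The bipartite part has the larger average degree (4 versus 3 for the clique) although its edge density is only 16/28, so an average-degree objective prefers the larger, sparser vertex set.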
In this paper, we define a novel density function that yields subgraphs of much higher quality than densest subgraphs: the graphs found by our method are compact and dense, and have smaller diameter. We show that the proposed function can be derived from a general framework, which includes other important density functions as special cases and for which we establish general theoretical properties. To optimize the proposed function, we provide an additive approximation algorithm and a local-search heuristic. Both algorithms are very efficient and scale well to large graphs.
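The abstract does not spell out the proposed density function or the steps of either algorithm. Purely as an illustration of what a local-search heuristic over a density objective can look like, the sketch below assumes an objective of the form "induced edges minus alpha times the number of possible vertex pairs". The objective, the names `edge_surplus` and `local_search`, the toy graph, and the value alpha = 1/3 are all assumptions made here, not the authors' implementation.

```python
from itertools import combinations

def build_adjacency(edges):
    """Undirected adjacency sets keyed by vertex."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def edge_surplus(adj, S, alpha):
    """Assumed objective: e[S] - alpha * C(|S|, 2)."""
    S = set(S)
    e = sum(1 for u in S for v in adj[u] if v in S) // 2
    return e - alpha * len(S) * (len(S) - 1) / 2

def local_search(adj, seed, alpha=1/3, max_iters=100):
    """Add or remove single vertices while the assumed objective improves."""
    S = {seed}
    for _ in range(max_iters):
        improved = False
        # Adding v changes the objective by deg_S(v) - alpha * |S|.
        for v in sorted(adj):
            if v not in S:
                deg_in = sum(1 for w in adj[v] if w in S)
                if deg_in > alpha * len(S):
                    S.add(v)
                    improved = True
        # Removing v changes the objective by alpha * (|S| - 1) - deg_S(v).
        for v in sorted(S):
            if len(S) > 1:
                deg_in = sum(1 for w in adj[v] if w in S)
                if deg_in < alpha * (len(S) - 1):
                    S.remove(v)
                    improved = True
        if not improved:
            break
    return S

# Hypothetical toy graph: a 4-clique {0,1,2,3} with a path 3-4-5-6 attached.
EDGES = list(combinations(range(4), 2)) + [(3, 4), (4, 5), (5, 6)]
ADJ = build_adjacency(EDGES)

if __name__ == "__main__":
    found = local_search(ADJ, seed=0)
    print(sorted(found), edge_surplus(ADJ, found, 1/3))
```

Started from vertex 0 on this toy graph, the search grows to the 4-clique and stops there, because attaching the path vertex 4 would add one edge but cost alpha * 4 in the penalty term. Every accepted move strictly increases the assumed objective, so the loop ends at a local optimum; `max_iters` is only a safeguard.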
We evaluate our algorithms on real and synthetic datasets, and we also devise several application studies as variants of our original problem. Compared with the method that finds the subgraph of largest average degree, our algorithms return denser subgraphs with smaller diameter. Finally, we discuss interesting new research directions that our problem leaves open.
Index Terms
- Denser than the densest subgraph: extracting optimal quasi-cliques with quality guarantees