ABSTRACT
Finding dense subgraphs is a fundamental graph-theoretic problem that lies at the heart of numerous graph-mining applications, ranging from finding communities in social networks, to detecting regulatory motifs in DNA, to identifying real-time stories in news. The problem has been studied extensively in theoretical computer science and, owing to its relevance to real-world applications, has recently attracted considerable attention in the data-mining community.
In this tutorial we aim to provide a comprehensive overview of (i) major algorithmic techniques for finding dense subgraphs in large graphs and (ii) graph-mining applications that rely on dense-subgraph extraction. We will present fundamental concepts and algorithms that date back to the 1980s, as well as the latest advances in the area, from both theoretical and practical points of view. We will motivate the problem of finding dense subgraphs by discussing how it can be used in real-world applications. We will discuss different density definitions and the complexity of the corresponding optimization problems. We will also present efficient algorithms for different density measures and under different computational models; specifically, we will focus on scalable streaming, distributed, and MapReduce algorithms. Finally, we will discuss problem variants and extensions, and provide pointers to future research directions.
Index Terms
- Dense Subgraph Discovery: KDD 2015 tutorial