ABSTRACT
Motivated by applications in community detection and dense subgraph discovery, we consider new clustering objectives in hypergraphs and bipartite graphs. These objectives are parameterized by one or more resolution parameters in order to enable diverse knowledge discovery in complex data.
For both hypergraph and bipartite objectives, we identify relevant parameter regimes that are equivalent to existing objectives and share their (polynomial-time) approximation algorithms. We first show that our parameterized hypergraph correlation clustering objective is related to higher-order notions of normalized cut and modularity in hypergraphs. It is further amenable to approximation algorithms via hyperedge expansion techniques.
Our parameterized bipartite correlation clustering objective generalizes standard unweighted bipartite correlation clustering, as well as the bicluster deletion problem. For a certain choice of parameters it is also related to our hypergraph objective. Although in general it is NP-hard, we highlight a parameter regime for the bipartite objective where the problem reduces to the bipartite matching problem and thus can be solved in polynomial time. For other parameter settings, we present several approximation algorithms using linear program rounding techniques. These results allow us to introduce the first constant-factor approximation for bicluster deletion, the task of removing a minimum number of edges to partition a bipartite graph into disjoint bi-cliques.
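To make the bicluster deletion objective concrete, the following minimal sketch scores a given candidate clustering: it returns the number of edges that must be removed, or `None` if some cluster is not a biclique. It is an illustration of the problem definition only, not one of the approximation algorithms from the paper; the function name `bicluster_deletion_cost`, its arguments, and the toy graph are all assumptions introduced here.

```python
from itertools import product


def bicluster_deletion_cost(left, right, edges, cluster_of):
    """Score a clustering for bicluster deletion (hypothetical helper).

    left, right : iterables of node labels on the two sides of the graph
    edges       : iterable of (left_node, right_node) pairs
    cluster_of  : dict mapping every node to a cluster label

    Returns the number of edges crossing clusters, or None when a cluster
    is missing an internal edge (so it is not a biclique and the clustering
    cannot be reached by deletions alone).
    """
    edge_set = set(edges)

    # Every left/right pair placed in the same cluster must be an edge.
    for u, v in product(left, right):
        if cluster_of[u] == cluster_of[v] and (u, v) not in edge_set:
            return None

    # Edges whose endpoints are in different clusters must be deleted.
    return sum(1 for u, v in edge_set if cluster_of[u] != cluster_of[v])


# Toy bipartite graph (assumed data): a and b are joined to both x and y,
# while c is joined only to y.
left = ["a", "b", "c"]
right = ["x", "y"]
edges = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y"), ("c", "y")]

# Keep {a, b, x, y} together and isolate c: only edge (c, y) is deleted.
clustering = {"a": 0, "b": 0, "x": 0, "y": 0, "c": 1}
cost = bicluster_deletion_cost(left, right, edges, clustering)

# Putting every node in one cluster fails because edge (c, x) is missing.
one_cluster = {node: 0 for node in left + right}
infeasible = bicluster_deletion_cost(left, right, edges, one_cluster)
```

Bicluster deletion asks for the feasible clustering with the smallest such cost; this helper only evaluates one candidate and does not search for the optimum.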
In several experiments, we highlight the flexibility of our framework and the diversity of results that can be obtained in different parameter settings. These include clustering bipartite graphs across a range of parameters, detecting motif-rich clusters in an email network and a food web, and forming clusters of retail products in a product review hypergraph that are highly correlated with known product categories.