ABSTRACT
It is common to find graphs with millions of nodes and billions of edges in, e.g., social networks. Queries on such graphs are often prohibitively expensive. This motivates us to propose query preserving graph compression, which compresses graphs relative to a class Λ of queries of the users' choice. We compute a small graph Gr from a graph G such that (a) for any query Q ∈ Λ, Q(G) = Q'(Gr), where Q' ∈ Λ can be efficiently computed from Q; and (b) any algorithm for computing Q(G) can be directly applied to evaluating Q' on Gr as is. That is, while we cannot lower the complexity of evaluating graph queries, we reduce data graphs while preserving the answers to all the queries in Λ. To verify the effectiveness of this approach, (1) we develop compression strategies for two classes of queries: reachability queries and graph pattern queries via (bounded) simulation. We show that graphs can be efficiently compressed via a reachability equivalence relation and graph bisimulation, respectively, while preserving query answers. (2) We provide techniques for maintaining the compressed graph Gr in response to changes ΔG to the original graph G. We show that the incremental maintenance problems are unbounded for the two classes of queries, i.e., their costs are not a function of the size of ΔG and changes in Gr. Nevertheless, we develop incremental algorithms that depend only on ΔG and Gr, independent of G, i.e., we do not have to decompress Gr to propagate the changes. (3) Using real-life data, we experimentally verify that our compression techniques reduce graphs on average by 95% for reachability and 57% for graph pattern matching, and that our incremental maintenance algorithms are efficient.
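To make the idea of compression via a reachability equivalence relation concrete, the following is a minimal sketch, not the algorithm developed in the paper. It assumes that two nodes are equivalent when they have exactly the same ancestors and the same descendants (via paths of at least one edge); the abstract does not spell out the definition, so this is an assumption. The function names `compress_for_reachability` and `reaches` are hypothetical, and the naive computation of ancestor and descendant sets is chosen for clarity rather than efficiency.

```python
from collections import defaultdict


def reachable_from(adj, source):
    """Return the nodes reachable from `source` by a path of at least one edge."""
    seen, stack = set(), list(adj.get(source, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(adj.get(node, ()))
    return seen


def compress_for_reachability(nodes, edges):
    """Merge nodes with identical (ancestors, descendants) sets.

    Assumed equivalence relation (illustrative): u ~ v iff they share the
    same ancestor set and the same descendant set.
    Returns (class_of, compressed_edges).
    """
    adj, radj = defaultdict(set), defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        radj[v].add(u)

    class_of, class_ids = {}, {}
    for n in nodes:
        signature = (
            frozenset(reachable_from(radj, n)),  # ancestors of n
            frozenset(reachable_from(adj, n)),   # descendants of n
        )
        # Nodes with the same signature share one class identifier.
        class_of[n] = class_ids.setdefault(signature, len(class_ids))

    # Every original edge becomes an edge between the endpoint classes.
    compressed_edges = {(class_of[u], class_of[v]) for u, v in edges}
    return class_of, compressed_edges


def reaches(edges, source, target):
    """Plain reachability query; the same code runs on G and on Gr."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
    return target in reachable_from(adj, source)


# Example: b1 and b2 have the same ancestors {a} and descendants {c}.
nodes = ["a", "b1", "b2", "c"]
edges = [("a", "b1"), ("a", "b2"), ("b1", "c"), ("b2", "c")]
class_of, compressed_edges = compress_for_reachability(nodes, edges)
print(class_of)                  # {'a': 0, 'b1': 1, 'b2': 1, 'c': 2}
print(sorted(compressed_edges))  # [(0, 1), (1, 2)]

# Query rewriting: Q(u, v) on G becomes Q'(class_of[u], class_of[v]) on Gr.
print(reaches(edges, "a", "c"))                                  # True
print(reaches(compressed_edges, class_of["a"], class_of["c"]))   # True
```

In this sketch the rewriting of Q into Q' is a dictionary lookup per query node, and `reaches` is applied unchanged to the compressed edge set, which mirrors properties (a) and (b) above under the stated assumption.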
Index Terms
- Query preserving graph compression