DOI: 10.1145/2213836.2213855
research-article

Query preserving graph compression

Published: 20 May 2012

ABSTRACT

It is common to find graphs with millions of nodes and billions of edges in, e.g., social networks. Queries on such graphs are often prohibitively expensive. This motivates us to propose query preserving graph compression, which compresses graphs relative to a class Λ of queries of users' choice. We compute a small graph Gr from a graph G such that (a) for any query Q ∈ Λ, Q(G) = Q'(Gr), where Q' ∈ Λ can be efficiently computed from Q; and (b) any algorithm for computing Q(G) can be directly applied to evaluating Q' on Gr as is. That is, while we cannot lower the complexity of evaluating graph queries, we reduce data graphs while preserving the answers to all the queries in Λ. To verify the effectiveness of this approach, (1) we develop compression strategies for two classes of queries: reachability and graph pattern queries via (bounded) simulation. We show that graphs can be efficiently compressed via a reachability equivalence relation and graph bisimulation, respectively, while preserving query answers. (2) We provide techniques for maintaining the compressed graph Gr in response to changes ΔG to the original graph G. We show that the incremental maintenance problems are unbounded for the two classes of queries, i.e., their costs are not a function of the size of ΔG and changes in Gr. Nevertheless, we develop incremental algorithms that depend only on ΔG and Gr, independent of G, i.e., we do not have to decompress Gr to propagate the changes. (3) Using real-life data, we experimentally verify that our compression techniques can reduce graphs by 95% on average for reachability and by 57% for graph pattern matching, and that our incremental maintenance algorithms are efficient.
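The idea of compressing via a reachability equivalence relation can be illustrated with a small sketch. It is not the paper's algorithm: it assumes, for illustration only, that two nodes are equivalent when they have the same set of ancestors and the same set of descendants (both taken over paths of at least one edge), and it computes these sets naively. The names `reaches`, `descendants`, `compress`, and `class_of` are hypothetical. The query rewriting from Q to Q' is just the node-to-class mapping, and the same `reaches` function runs unchanged on G and on Gr.

```python
from collections import deque


def reaches(graph, source, target):
    """Return True if target is reachable from source via at least one edge."""
    seen = set()
    queue = deque(graph.get(source, ()))
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        if node not in seen:
            seen.add(node)
            queue.extend(graph.get(node, ()))
    return False


def descendants(graph, source):
    """Return all nodes reachable from source via at least one edge."""
    seen = set()
    stack = list(graph.get(source, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, ()))
    return frozenset(seen)


def compress(graph):
    """Merge nodes that share the same ancestors and the same descendants.

    Assumed equivalence (illustrative, computed naively per node).
    Returns the quotient graph and the node -> class id mapping.
    """
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    reverse = {n: set() for n in nodes}
    for u, targets in graph.items():
        for v in targets:
            reverse[v].add(u)

    class_ids = {}  # (ancestors, descendants) signature -> class id
    class_of = {}   # node -> class id
    for n in nodes:
        signature = (descendants(reverse, n), descendants(graph, n))
        class_of[n] = class_ids.setdefault(signature, len(class_ids))

    # An edge between two classes exists if any members are connected in G.
    compressed = {c: set() for c in class_ids.values()}
    for u, targets in graph.items():
        for v in targets:
            compressed[class_of[u]].add(class_of[v])
    return compressed, class_of


# Toy graph: b and c are interchangeable; d and e form a cycle.
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"], "e": ["d"]}
compressed, class_of = compress(graph)

# Rewritten query Q': map both endpoints to their classes, then reuse `reaches`.
answer_on_g = reaches(graph, "a", "e")
answer_on_gr = reaches(compressed, class_of["a"], class_of["e"])
```

On this toy input the five nodes collapse into three classes, and every reachability query gives the same answer on both graphs. A practical implementation would avoid computing full ancestor and descendant sets for each node, which is quadratic in the worst case.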


Published in

SIGMOD '12: Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
May 2012
886 pages
ISBN: 9781450312479
DOI: 10.1145/2213836

Copyright © 2012 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

SIGMOD '12 paper acceptance rate: 48 of 289 submissions, 17%. Overall acceptance rate: 785 of 4,003 submissions, 20%.
