research-article

Efficient query processing on graph databases

Published: 23 April 2009

Abstract

We study the problem of processing subgraph queries on a database that consists of a set of graphs. The answer to a subgraph query is the set of graphs in the database that are supergraphs of the query. In this article, we propose an efficient index, FG*-index, to solve this problem.
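To make the problem statement concrete, the following is a minimal sketch of a subgraph query answered by brute force, with no index at all. The graph representation (a pair of a vertex-label dictionary and an edge list), the toy database, and the function names `is_subgraph` and `subgraph_query` are assumptions made for illustration and do not come from the article.

```python
from itertools import permutations

def is_subgraph(q, g):
    """True if q maps into g injectively, preserving labels and edges.

    Brute force over all injective vertex mappings; only usable on tiny graphs.
    Assumes undirected graphs without self-loops.
    """
    q_labels, q_edges = q
    g_labels, g_edges = g
    g_edge_set = {frozenset(e) for e in g_edges}
    q_vertices = list(q_labels)
    for image in permutations(g_labels, len(q_vertices)):
        m = dict(zip(q_vertices, image))
        labels_ok = all(q_labels[v] == g_labels[m[v]] for v in q_vertices)
        edges_ok = all(frozenset((m[a], m[b])) in g_edge_set for a, b in q_edges)
        if labels_ok and edges_ok:
            return True
    return False

def subgraph_query(q, db):
    """Ids of all database graphs that are supergraphs of q."""
    return [gid for gid, g in db.items() if is_subgraph(q, g)]

# Hypothetical toy database: (vertex labels, undirected edges).
DB = {
    "g1": ({0: "C", 1: "C", 2: "O"}, [(0, 1), (1, 2), (0, 2)]),  # triangle
    "g2": ({0: "C", 1: "C", 2: "N"}, [(0, 1), (1, 2)]),          # path C-C-N
    "g3": ({0: "C", 1: "O"}, [(0, 1)]),                          # single edge
}
query = ({0: "C", 1: "O"}, [(0, 1)])  # one C-O edge

print(subgraph_query(query, DB))  # ['g1', 'g3']
```

Every database graph is tested against the query here, which is exactly the cost an index is meant to avoid.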

The cost of processing a subgraph query using most existing indexes consists mainly of two parts: the index probing cost and the candidate verification cost. Index probing either finds the query in the index or finds the graphs from which a candidate answer set for the query can be generated. Candidate verification tests whether each graph in the candidate set is indeed a supergraph of the query. We design FG*-index to minimize these two costs as follows.
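The two costs can be seen in a generic filter-and-verify pipeline. The sketch below is not the FG*-index: it assumes a deliberately simple inverted index keyed on edge label pairs, and the names `edge_features`, `build_index`, `probe`, `verify`, and `process` are hypothetical. Probing yields a candidate set that may contain false positives, and verification removes them with a backtracking subgraph-isomorphism test.

```python
def edge_features(g):
    """Label pair of every edge; a necessary (not sufficient) condition for containment."""
    labels, edges = g
    return {tuple(sorted((labels[u], labels[v]))) for u, v in edges}

def build_index(db):
    """Inverted index: feature -> ids of the graphs containing it."""
    index = {}
    for gid, g in db.items():
        for f in edge_features(g):
            index.setdefault(f, set()).add(gid)
    return index

def probe(index, db, q):
    """Index probing: intersect the posting sets of the query's features."""
    candidates = set(db)
    for f in edge_features(q):
        candidates &= index.get(f, set())
    return candidates

def verify(q, g):
    """Candidate verification: backtracking, label-preserving, non-induced."""
    q_labels, q_edges = q
    g_labels, g_edges = g
    g_edge_set = {frozenset(e) for e in g_edges}
    order = list(q_labels)

    def extend(mapping):
        if len(mapping) == len(order):
            return True
        u = order[len(mapping)]
        for cand in g_labels:
            if cand in mapping.values() or g_labels[cand] != q_labels[u]:
                continue
            # Each query edge from u to an already mapped vertex must exist in g.
            ok = all(
                frozenset((cand, mapping[w])) in g_edge_set
                for a, b in q_edges
                for w in ((b,) if a == u else (a,) if b == u else ())
                if w in mapping
            )
            if ok:
                mapping[u] = cand
                if extend(mapping):
                    return True
                del mapping[u]
        return False

    return extend({})

def process(index, db, q):
    candidates = probe(index, db, q)
    answers = sorted(gid for gid in candidates if verify(q, db[gid]))
    return candidates, answers

# Hypothetical toy data.
DB = {
    "g1": ({0: "O", 1: "C", 2: "N"}, [(0, 1), (1, 2)]),                  # O-C-N
    "g2": ({0: "O", 1: "C", 2: "C", 3: "N"}, [(0, 1), (1, 2), (2, 3)]),  # O-C-C-N
    "g3": ({0: "C", 1: "C", 2: "O"}, [(0, 1), (1, 2)]),                  # C-C-O
}
INDEX = build_index(DB)
QUERY = ({0: "C", 1: "O", 2: "N"}, [(0, 1), (0, 2)])  # O and N on the same C

candidates, answers = process(INDEX, DB, QUERY)
print(sorted(candidates))  # ['g1', 'g2']  (g3 is filtered out, g2 is a false positive)
print(answers)             # ['g1']
```

In this toy run, probing prunes one graph cheaply, but verification still has to run on two candidates to find one answer; this residual work is the cost that verification-free query answering targets.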

FG*-index consists of three components: the FG-index, the feature-index, and the FAQ-index. First, the FG-index employs the concept of Frequent subGraph (FG) to allow the set of queries that are FGs to be answered without candidate verification. We call this set of queries FG-queries. We can enlarge the set of FG-queries so that more queries can be answered without candidate verification; however, a larger set of FG-queries implies a larger FG-index and hence the index probing cost also increases. We propose the feature-index to reduce the index probing cost. The feature-index uses features to filter false results that are matched in the FG-index, so that we can quickly find the truly matching graphs for a query. For processing non-FG-queries, we propose the FAQ-index, which is dynamically constructed from the set of Frequently Asked non-FG-Queries (FAQs). Using the FAQ-index, verification is not required for processing FAQs and only a small number of candidates need to be verified for processing non-FG-queries that are not frequently asked. Finally, a comprehensive set of experiments verifies that query processing using FG*-index is up to orders of magnitude more efficient than state-of-the-art indexes and it is also more scalable.
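The interplay between FG-queries, FAQs, and the remaining queries can be sketched on a toy model. This is an illustration under strong assumptions, not the data structures of the article: vertex labels are taken to be unique within a graph, so a graph is just a set of labeled edges and containment reduces to set inclusion; connectivity is ignored; the feature-index is omitted because a hash lookup stands in for index probing. The names `build_fg_index`, `ToyFGStarIndex`, `min_support`, and `faq_threshold` are hypothetical.

```python
from itertools import combinations

# ASSUMPTION: unique vertex labels per graph, so each graph is a set of edges
# and "q is a subgraph of g" becomes "q <= g". Real graphs need isomorphism tests.
DB = {
    "g1": frozenset({"A-B", "B-C", "C-D"}),
    "g2": frozenset({"A-B", "B-C"}),
    "g3": frozenset({"A-B", "C-D"}),
    "g4": frozenset({"B-C", "D-E"}),
}

def build_fg_index(db, min_support):
    """Map every frequent edge set to the ids of the graphs containing it."""
    support = {}
    for gid, edges in db.items():
        for size in range(1, len(edges) + 1):
            for sub in combinations(sorted(edges), size):
                support.setdefault(frozenset(sub), set()).add(gid)
    return {fg: ids for fg, ids in support.items() if len(ids) >= min_support}

class ToyFGStarIndex:
    def __init__(self, db, min_support, faq_threshold):
        self.db = db
        self.fg_index = build_fg_index(db, min_support)
        self.faq_index = {}   # frequently asked non-FG query -> stored answer set
        self.ask_count = {}   # how often each non-FG query has been asked
        self.faq_threshold = faq_threshold
        self.verified = 0     # number of candidate verifications performed

    def query(self, q):
        q = frozenset(q)
        if q in self.fg_index:   # FG-query: no verification
            return set(self.fg_index[q])
        if q in self.faq_index:  # FAQ: no verification
            return set(self.faq_index[q])
        # Other queries: candidates from the indexed subsets of q, then verify.
        candidates = set(self.db)
        for fg, ids in self.fg_index.items():
            if fg <= q:
                candidates &= ids
        answer = set()
        for gid in candidates:
            self.verified += 1
            if q <= self.db[gid]:
                answer.add(gid)
        self.ask_count[q] = self.ask_count.get(q, 0) + 1
        if self.ask_count[q] >= self.faq_threshold:
            self.faq_index[q] = answer  # promote to the FAQ-index
        return answer

idx = ToyFGStarIndex(DB, min_support=2, faq_threshold=2)
print(sorted(idx.query({"A-B", "B-C"})), idx.verified)  # ['g1', 'g2'] 0
print(sorted(idx.query({"B-C", "C-D"})), idx.verified)  # ['g1'] 1
print(sorted(idx.query({"B-C", "C-D"})), idx.verified)  # ['g1'] 2
print(sorted(idx.query({"B-C", "C-D"})), idx.verified)  # ['g1'] 2
```

The first query is frequent and is answered from the index alone. The second query is infrequent and needs one verification per ask until it has been asked `faq_threshold` times, after which its stored answer is returned directly. Lowering `min_support` enlarges the set of FG-queries at the price of a larger `fg_index`, which mirrors the trade-off described above.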

      • Published in

        ACM Transactions on Database Systems, Volume 34, Issue 1
        April 2009
        349 pages
        ISSN: 0362-5915
        EISSN: 1557-4644
        DOI: 10.1145/1508857

        Copyright © 2009 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 23 April 2009
        • Accepted: 1 August 2008
        • Revised: 1 July 2008
        • Received: 1 August 2007

        Qualifiers

        • research-article
        • Research
        • Refereed
