research-article

Efficient query processing on graph databases

Authors:
James Cheng
Nanyang Technological University, Singapore

Yiping Ke
The Chinese University of Hong Kong, New Territories, Hong Kong

Wilfred Ng
The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong

ACM Transactions on Database Systems, Volume 34, Issue 1, Article No. 2, pp. 1–48
https://doi.org/10.1145/1508857.1508859

Published: 23 April 2009

Abstract

We study the problem of processing subgraph queries on a database that consists of a set of graphs. The answer to a subgraph query is the set of graphs in the database that are supergraphs of the query. In this article, we propose an efficient index, FG*-index, to solve this problem.

The cost of processing a subgraph query with most existing indexes consists mainly of two parts: the index probing cost and the candidate verification cost. Index probing either finds the query in the index or finds the graphs from which a candidate answer set for the query can be generated. Candidate verification tests whether each graph in the candidate set is indeed a supergraph of the query. FG*-index is designed to minimize both costs.
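
The two costs described above can be illustrated with a minimal filter-and-verify sketch in Python. Everything in it is an assumption made for illustration and is not the authors' implementation: the `Graph` class, the toy edge-label features, the inverted index, and the brute-force `is_subgraph` test, which takes exponential time and is suitable only for tiny graphs.

```python
from itertools import permutations


class Graph:
    """Hypothetical vertex-labeled, undirected graph."""

    def __init__(self, labels, edges):
        self.labels = dict(labels)                  # vertex -> label
        self.edges = {frozenset(e) for e in edges}  # undirected edges

    def features(self):
        # Toy feature set: one sorted label pair per edge.
        return {tuple(sorted(self.labels[v] for v in e)) for e in self.edges}


def is_subgraph(q, g):
    """Candidate verification: brute-force subgraph isomorphism test."""
    qv = list(q.labels)
    for image in permutations(g.labels, len(qv)):
        m = dict(zip(qv, image))  # injective mapping from q's vertices into g
        if all(q.labels[v] == g.labels[m[v]] for v in qv) and all(
            frozenset(m[v] for v in e) in g.edges for e in q.edges
        ):
            return True
    return False


def build_index(db):
    """Inverted index: feature -> ids of the graphs that contain it."""
    index = {}
    for gid, g in db.items():
        for f in g.features():
            index.setdefault(f, set()).add(gid)
    return index


def answer(q, db, index):
    # Index probing: intersect the posting lists of the query's features.
    candidates = set(db)
    for f in q.features():
        candidates &= index.get(f, set())
    # Candidate verification: keep only true supergraphs of q.
    answers = {gid for gid in candidates if is_subgraph(q, db[gid])}
    return candidates, answers


db = {
    "g1": Graph({1: "A", 2: "B", 3: "C"}, [(1, 2), (2, 3), (1, 3)]),
    "g2": Graph({1: "A", 2: "B", 3: "B", 4: "C"}, [(1, 2), (3, 4)]),
    "g3": Graph({1: "A", 2: "A"}, [(1, 2)]),
}
query = Graph({1: "A", 2: "B", 3: "C"}, [(1, 2), (2, 3)])  # path A-B-C
candidates, answers = answer(query, db, build_index(db))
print(sorted(candidates), sorted(answers))  # ['g1', 'g2'] ['g1']
```

In this toy run, `g2` passes the feature filter because it contains both an A-B edge and a B-C edge, but it is rejected during verification because the two edges do not share a vertex. Such false candidates are what make the verification cost significant.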

FG*-index consists of three components: the FG-index, the feature-index, and the FAQ-index. First, the FG-index employs the concept of Frequent subGraph (FG) to allow the set of queries that are FGs to be answered without candidate verification. We call this set of queries FG-queries. We can enlarge the set of FG-queries so that more queries can be answered without candidate verification; however, a larger set of FG-queries implies a larger FG-index and hence the index probing cost also increases. We propose the feature-index to reduce the index probing cost. The feature-index uses features to filter false results that are matched in the FG-index, so that we can quickly find the truly matching graphs for a query. For processing non-FG-queries, we propose the FAQ-index, which is dynamically constructed from the set of Frequently Asked non-FG-Queries (FAQs). Using the FAQ-index, verification is not required for processing FAQs and only a small number of candidates need to be verified for processing non-FG-queries that are not frequently asked. Finally, a comprehensive set of experiments verifies that query processing using FG*-index is up to orders of magnitude more efficient than state-of-the-art indexes and it is also more scalable.
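
The interplay between FG-queries and FAQs can be sketched as follows. This is a heavily simplified illustration under stated assumptions, not the FG*-index itself: each graph is modeled as a set of labeled edges with unique vertex labels, so that the subgraph test reduces to a subset test; the names `mine_frequent`, `ToyFGStarIndex`, `min_support`, and `faq_threshold` are hypothetical; the candidate generation step is one plausible choice; and the feature-index is omitted.

```python
from collections import Counter
from itertools import combinations


def mine_frequent(db, min_support):
    """Map each edge set contained in >= min_support graphs to those graphs."""
    support = {}
    for gid, edges in db.items():
        for k in range(1, len(edges) + 1):
            for sub in combinations(sorted(edges), k):
                support.setdefault(frozenset(sub), set()).add(gid)
    return {fg: gids for fg, gids in support.items() if len(gids) >= min_support}


class ToyFGStarIndex:
    def __init__(self, db, min_support, faq_threshold=2):
        self.db = db
        self.fg = mine_frequent(db, min_support)  # FG -> exact answer set
        self.faq = {}                             # frequent non-FG query -> answer set
        self.asked = Counter()                    # how often each non-FG query was seen
        self.faq_threshold = faq_threshold

    def query(self, q):
        q = frozenset(q)
        if q in self.fg:   # FG-query: answered without verification
            return set(self.fg[q]), "FG-index"
        if q in self.faq:  # FAQ: answered without verification
            return set(self.faq[q]), "FAQ-index"
        # Assumed candidate generation: intersect the answer sets of
        # every FG that is contained in the query.
        candidates = set(self.db)
        for fg, gids in self.fg.items():
            if fg <= q:
                candidates &= gids
        # Verification is a subset test in this toy model only.
        answers = {gid for gid in candidates if q <= self.db[gid]}
        self.asked[q] += 1
        if self.asked[q] >= self.faq_threshold:
            self.faq[q] = answers  # promote the query to an FAQ
        return answers, "verified"


AB, BC, CD = ("A", "B"), ("B", "C"), ("C", "D")
db = {
    "g1": frozenset({AB, BC, CD}),
    "g2": frozenset({AB, BC}),
    "g3": frozenset({AB, CD}),
    "g4": frozenset({BC, CD}),
}
idx = ToyFGStarIndex(db, min_support=2)
print(idx.query({AB, BC}))      # answered from the FG-index
print(idx.query({AB, BC, CD}))  # not an FG: candidates are verified
```

Repeating the second query illustrates the dynamic FAQ behavior: once it has been asked `faq_threshold` times, later calls return the cached answer set without any verification.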

Index Terms

Efficient query processing on graph databases
1. Information systems
  1. Data management systems
    1. Database management system engines
      1. Database query processing
2. Theory of computation
  1. Theory and algorithms for application domains
    1. Database theory
      1. Database query processing and optimization (theory)

Published in

ACM Transactions on Database Systems Volume 34, Issue 1
April 2009
349 pages
ISSN: 0362-5915
EISSN: 1557-4644
DOI: 10.1145/1508857

Copyright © 2009 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 23 April 2009
- Accepted: 1 August 2008
- Revised: 1 July 2008
- Received: 1 August 2007
Author Tags
Graph databases
frequent subgraphs
graph indexing
graph query processing
Qualifiers
- research-article
- Research
- Refereed