Abstract
We study the problem of processing subgraph queries on a database that consists of a set of graphs. The answer to a subgraph query is the set of graphs in the database that are supergraphs of the query. In this article, we propose an efficient index, FG*-index, to solve this problem.
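To make the problem statement concrete, the following is a minimal brute-force sketch of what "the answer to a subgraph query" means. It is not the FG*-index and uses no index at all. The graph representation (a vertex-to-label map plus a set of undirected edges) and the names `is_subgraph`, `answer`, and `make_graph` are assumptions made for illustration only.

```python
from itertools import permutations


def make_graph(labels, edges):
    """Hypothetical representation: (vertex -> label, set of undirected edges)."""
    return labels, {frozenset(e) for e in edges}


def is_subgraph(query, graph):
    """Return True if `query` is subgraph-isomorphic to `graph`.

    Brute force over all injective vertex mappings; only suitable for tiny graphs.
    """
    q_labels, q_edges = query
    g_labels, g_edges = graph
    q_nodes = list(q_labels)
    # Try every injective assignment of query vertices to graph vertices.
    for image in permutations(g_labels, len(q_nodes)):
        mapping = dict(zip(q_nodes, image))
        # Vertex labels must be preserved by the mapping.
        if any(q_labels[v] != g_labels[mapping[v]] for v in q_nodes):
            continue
        # Every query edge must map onto an edge of the data graph.
        if all(frozenset(mapping[v] for v in e) in g_edges for e in q_edges):
            return True
    return False


def answer(query, database):
    """The answer set: ids of all database graphs that are supergraphs of the query."""
    return {gid for gid, g in database.items() if is_subgraph(query, g)}


database = {
    "g1": make_graph({1: "C", 2: "C", 3: "O"}, [(1, 2), (2, 3)]),
    "g2": make_graph({1: "C", 2: "O"}, [(1, 2)]),
    "g3": make_graph({1: "C", 2: "N"}, [(1, 2)]),
}
query = make_graph({"a": "C", "b": "O"}, [("a", "b")])
print(sorted(answer(query, database)))  # ['g1', 'g2']
```

Each call to `is_subgraph` is a subgraph isomorphism test, which is expensive in general; an index is meant to avoid running it against every graph in the database.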
For most existing indexes, the cost of processing a subgraph query consists mainly of two parts: the index probing cost and the candidate verification cost. Index probing either finds the query itself in the index or finds the indexed graphs from which a candidate answer set for the query can be generated. Candidate verification tests whether each graph in the candidate set is indeed a supergraph of the query. We design FG*-index to minimize these two costs as follows.
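The two cost components can be illustrated with a generic filter-and-verify sketch. This is not the design of any specific index named here. It assumes, purely for simplicity, that vertex labels are unique within each graph, so a graph can be stored as a set of label-pair edges and verification reduces to edge-set inclusion. The names `build_label_index`, `probe`, `verify`, and `process` are hypothetical.

```python
from collections import defaultdict


def edge(u, v):
    """An undirected edge between two (assumed unique) vertex labels."""
    return frozenset((u, v))


def build_label_index(database):
    """Inverted index: vertex label -> ids of the graphs that contain it."""
    index = defaultdict(set)
    for gid, edges in database.items():
        for e in edges:
            for label in e:
                index[label].add(gid)
    return index


def probe(index, query, all_ids):
    """Index probing: keep only graphs that contain every label used by the query."""
    candidates = set(all_ids)
    for e in query:
        for label in e:
            candidates &= index.get(label, set())
    return candidates


def verify(query, graph):
    """Candidate verification (edge-set inclusion under the unique-label assumption)."""
    return query <= graph


def process(query, database, index):
    candidates = probe(index, query, database.keys())
    answers = {gid for gid in candidates if verify(query, database[gid])}
    return candidates, answers


database = {
    "g1": frozenset({edge("A", "B"), edge("B", "C")}),
    "g2": frozenset({edge("A", "C"), edge("B", "C")}),
    "g3": frozenset({edge("A", "B")}),
}
index = build_label_index(database)
candidates, answers = process(frozenset({edge("A", "C")}), database, index)
print(sorted(candidates), sorted(answers))  # ['g1', 'g2'] ['g2']
```

Here probing discards `g3` cheaply, but `g1` survives as a false candidate and is rejected only by verification. Total cost depends on both how expensive the probe is and how many candidates it leaves behind.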
FG*-index consists of three components: the FG-index, the feature-index, and the FAQ-index. First, the FG-index employs the concept of Frequent subGraph (FG) to allow the set of queries that are FGs to be answered without candidate verification. We call this set of queries FG-queries. We can enlarge the set of FG-queries so that more queries can be answered without candidate verification; however, a larger set of FG-queries implies a larger FG-index, and hence a higher index probing cost. We propose the feature-index to reduce the index probing cost. The feature-index uses features to filter out false results that are matched in the FG-index, so that we can quickly find the truly matching graphs for a query. For processing non-FG-queries, we propose the FAQ-index, which is dynamically constructed from the set of Frequently Asked non-FG-Queries (FAQs). Using the FAQ-index, verification is not required for processing FAQs, and only a small number of candidates need to be verified for processing non-FG-queries that are not frequently asked. Finally, a comprehensive set of experiments verifies that query processing using FG*-index is up to orders of magnitude more efficient than state-of-the-art indexes and is also more scalable.
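The interplay between verification-free FG-queries, cached FAQs, and verified non-FG-queries can be sketched roughly as below. This is a loose illustration under strong assumptions, not the FG*-index itself: vertex labels are assumed unique within a graph (so graphs are sets of label-pair edges), only single-edge frequent subgraphs are "mined", the feature-index is omitted, and the class name `ToyFGIndex`, the `min_support` and `faq_threshold` parameters, and the promotion rule are all hypothetical.

```python
from collections import Counter


def edge(u, v):
    return frozenset((u, v))


def build_fg_answers(database, min_support):
    """Stand-in for frequent subgraph mining: single-edge subgraphs only."""
    support = {}
    for gid, g in database.items():
        for e in g:
            support.setdefault(frozenset({e}), set()).add(gid)
    return {fg: ids for fg, ids in support.items() if len(ids) >= min_support}


class ToyFGIndex:
    def __init__(self, database, min_support=2, faq_threshold=2):
        self.database = database
        self.fg_answers = build_fg_answers(database, min_support)
        self.faq_answers = {}          # built dynamically from repeated queries
        self.asked = Counter()
        self.faq_threshold = faq_threshold
        self.verifications = 0         # counts candidate verification tests

    def query(self, q):
        if q in self.fg_answers:       # FG-query: answer stored, no verification
            return self.fg_answers[q]
        if q in self.faq_answers:      # FAQ: answer cached, no verification
            return self.faq_answers[q]
        # Non-FG-query: any answer must contain every FG that q contains,
        # so intersect the stored answer sets to get candidates.
        candidates = set(self.database)
        for fg, ids in self.fg_answers.items():
            if fg <= q:
                candidates &= ids
        answers = set()
        for gid in candidates:
            self.verifications += 1
            if q <= self.database[gid]:
                answers.add(gid)
        self.asked[q] += 1
        if self.asked[q] >= self.faq_threshold:
            self.faq_answers[q] = answers  # promote to FAQ
        return answers


database = {
    "g1": frozenset({edge("A", "B"), edge("B", "C")}),
    "g2": frozenset({edge("A", "B"), edge("A", "C")}),
    "g3": frozenset({edge("A", "B"), edge("B", "C"), edge("C", "D")}),
}
idx = ToyFGIndex(database)
fg_query = frozenset({edge("A", "B")})
non_fg_query = frozenset({edge("A", "B"), edge("B", "C")})
print(sorted(idx.query(fg_query)), idx.verifications)
for _ in range(3):
    result = idx.query(non_fg_query)
print(sorted(result), idx.verifications)
```

In this toy run the FG-query costs no verification, the first two non-FG-queries verify only the two surviving candidates each, and the third repetition is served from the FAQ cache.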
Index Terms
- Efficient query processing on graph databases