gStore: a graph-based SPARQL query engine

Zou, Lei; Özsu, M. Tamer; Chen, Lei; Shen, Xuchuan; Huang, Ruizhe; Zhao, Dongyan

doi:10.1007/s00778-013-0337-7

gStore: a graph-based SPARQL query engine

Regular Paper
Published: 28 September 2013

Volume 23, pages 565–590, (2014)
Cite this article

The VLDB Journal Aims and scope Submit manuscript

Lei Zou¹,
M. Tamer Özsu²,
Lei Chen³,
Xuchuan Shen¹,
Ruizhe Huang¹ &
…
Dongyan Zhao¹

3273 Accesses
139 Citations
3 Altmetric
Explore all metrics

Abstract

We address efficient processing of SPARQL queries over RDF datasets. The proposed techniques, incorporated into the gStore system, handle, in a uniform and scalable manner, SPARQL queries with wildcards and aggregate operators over dynamic RDF datasets. Our approach is graph based. We store RDF data as a large graph and also represent a SPARQL query as a query graph. Thus, the query answering problem is converted into a subgraph matching problem. To achieve efficient and scalable query processing, we develop an index, together with effective pruning rules and efficient search algorithms. We propose techniques that use this infrastructure to answer aggregation queries. We also propose an effective maintenance algorithm to handle online updates over RDF repositories. Extensive experiments confirm the efficiency and effectiveness of our solutions.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

A survey of RDF stores & SPARQL engines for querying knowledge graphs

Article 13 November 2021

Processing SPARQL queries over distributed RDF graphs

Article 04 January 2016

$$\textsf {GQA}_{\textsf {RDF}}$$ : A Graph-Based Approach Towards Efficient SPARQL Query Answering

Notes

The literature on aggregation and aggregate queries frequently refer to these attributes as dimensions. We follow the same convention.
Note that, this is not necessarily $O_n$, since node identifiers are arbitrarily assigned to help with presentation.
Although dimension list is a set, when the order is important, we specify them as a list enclosed in ( ).
We revised the original 14 to remove type reasoning that gStore does not currently support; the resulting queries return larger result sets since there is no filtering as a result of type reasoning. For completeness, these are included in Online Supplements as Table 15.

References

Abadi, D.J., Marcus, A., Madden, S., Hollenbach, K.J.: Scalable semantic web data management using vertical partitioning. In: Proceedings of the 33rd International Conference on Very Large Data Bases, pp. 411–422 (2007)
Abadi, D.J., Marcus, A., Madden, S., Hollenbach, K.: SW-Store: a vertically partitioned DBMS for semantic web data management. VLDB J. 18(2), 385–406 (2009)
Article Google Scholar
Atre, M., Chaoji, V., Zaki, M.J., Hendler, J.A.: Matrix “bit” loaded: a scalable lightweight join query processor for RDF data. In: Proceedings of the 19th International World Wide Web Conference, pp. 41–50 (2010)
Bernstein, P.A., Chiu, D.-M.W.: Using semi-joins to solve relational queries. J. ACM 28(1), 25–40 (1981)
Article MATH MathSciNet Google Scholar
Bönström, V., Hinze, A., Schweppe, H.: Storing RDF as a graph. In: Proceedings of the 1st Latin American Web Congress, pp. 27–36 (2003)
Broekstra, J., Kampman, A., van Harmelen, F.: Sesame: a generic architecture for storing and querying RDF and RDF schema. In: Proceedings of the 1st International Semantic Web Conference, pp. 54–68 (2002)
Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms. The MIT Press, Cambridge (2001)
MATH Google Scholar
Deppisch, U.: S-tree: a dynamic balanced signature index for office retrieval. In: Proceedings of the 9th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 77–87 (1986)
Faloutsos, C., Christodoulakis, S.: Signature files: an access method for documents and its analytical performance evaluation. ACM Trans. Inf. Syst. 2(4), 267–288 (1984)
Article Google Scholar
Gravano, L., Ipeirotis, P.G., Koudas, N., Srivastava, D.: Text joins in an RDBMS for web data integration. In: Proceedings of the 12th International World Wide Web Conference, pp. 90–101 (2003)
Gravano, L., Ipeirotis, P.G., Jagadish, H.V., Koudas, N., Muthukrishnan, S., Pietarinen, L., Srivastava, D.: Using $q$-grams in a DBMS for approximate string processing. IEEE Data Eng. Bull. 24(4), 28–34 (2001)
Google Scholar
Guo, Y., Pan, Z., Heflin, J.: LUBM: a benchmark for OWL knowledge base systems. J. Web Semant. 3(2–3), 158–182 (2005)
Article Google Scholar
Gupta, A., Dallan Quass, V.H.: Aggregate-query processing in data warehousing environments. In: Proceedings of the 21st International Conference on Very Large Data Bases, pp. 358–369 (1995)
Harth, A., Umbrich, J., Hogan, A., Decker, S.: YARS2: a federated repository for querying graph structured data from the web. In: Proceedings of the 6th International Semantic Web Conference, pp. 211–224 (2007)
Hoffart, J., Suchanek, F.M., Berberich, K., Kelham, E.L., de Melo, G., Weikum, G.: YAGO2: exploring and querying world knowledge in time, space, context, and many languages. In: Proceedings of the 20th International World Wide Web Conference, pp. 229–232 (2011)
Hung, E., Deng, Y., Subrahmanian, V.S.: RDF aggregate queries and views. In: Proceedings of the 21st International Conference on Data Engineering, pp. 717–728 (2005)
Johnson, T., Shasha, D.: B-trees with inserts and deletes: why free-at-empty is better than merge-at-half. J. Comput. Syst. Sci. 47(1), 45–76 (1993)
Article MATH MathSciNet Google Scholar
Kitagawa, H., Ishikawa, Y.: False drop analysis of set retrieval with signature files. IEICE Trans. Inf. Syst. E80–D(6), 1–12 (1997)
Google Scholar
Neumann, T., Weikum, G.: Scalable join processing on very large RDF graphs. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 627–640 (2009)
Neumann, T., Weikum, G.: RDF-3X: a RISC-style engine for RDF. Proc. VLDB Endow. 1(1), 647–659 (2008)
Article Google Scholar
Neumann, T., Weikum, G.: The RDF-3X engine for scalable management of RDF data. VLDB J. 19(1), 91–113 (2010)
Article Google Scholar
Neumann, T., Weikum, G.: x-RDF-3x: Fast querying, high update rates, and consistency for RDF databases. Proc. VLDB Endow. 1(1), 256–263 (2010)
Article Google Scholar
Pérez, J., Arenas, M., Gutierrez, C.: Semantics and complexity of SPARQL. ACM Trans. Database Syst. 34(3), 16:1–16:45 (2009)
Google Scholar
Seid, D.Y., Mehrotra, S.: Grouping and aggregate queries over semantic web databases. In: Proceedings of the International Conference on Semantic Computing, pp. 775–782 (2007)
Shasha, D., Wang, J.T.-L., Giugno, R.: Algorithmics and applications of tree and graph searching. In: Proceedings of the 21st ACM Symposium on Principles of Database Systems, pp. 39–52 (2002)
Stocker, M., Seaborne, A., Bernstein, A., Kiefer, C., Reynolds, D.: SPARQL basic graph pattern optimization using selectivity estimation. In: Proceedings of the 17th International World Wide Web Conference, pp. 595–604 (2008)
Tousidou, E., Nanopoulos, A., Manolopoulos, Y.: Improved methods for signature-tree construction. Comput. J. 43(4), 301–314 (2000)
Article MATH Google Scholar
Tousidou, E., Bozanis, P., Manolopoulos, Y.: Signature-based structures for objects with set-valued attributes. Inf. Syst. 27(2), 93–121 (2002)
Article MATH Google Scholar
Udrea, O., Pugliese, A., Subrahmanian, V.S.: GRIN: a graph based RDF index. In: Proceedings of the 22nd National Conference on Artificial Intelligence, pp. 1465–1470 (2007)
Weiss, C., Karras, P., Bernstein, A.: Hexastore: sextuple indexing for semantic web data management. Proc. VLDB Endow. 1(1), 1008–1019 (2008)
Article Google Scholar
Wilkinson, K., Sayers, C. , Kuno, H.A., Reynolds, D.: Efficient RDF storage and retrieval in Jena2. In: Proceedings of the 1st Inter national Workshop on Semantic Web and Databases, pp. 131–150 (2003)
Yan, Y., Wang, C., Zhou, A., Qian, W., Ma, L., Pan, Y.: Efficient indices using graph partitioning in RDF triple stores. In: Proceedings of the 25th International Conference on Data Engineering, pp. 1263–1266 (2009)
Yan, X., Yu, P.S., Han, J.: Graph indexing: a frequent structure-based approach. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 335–346 (2004)
Yuan, P., Liu, P., Jin, H., Zhang, W., Liu, L.: TripleBit: a fast and compact system for large scale RDF data. Proc. VLDB Endow. 6(7), 517–528 (2013)
Article Google Scholar

Download references

Acknowledgments

Lei Zou’s work was supported by National Science Foundation of China (NSFC) under Grant No. 61370055 and by CCF-Tencent Open Research Fund. M. Tamer Özsu’s work was supported by Natural Sciences and Engineering Research Council (NSERC) of Canada under a Discovery Grant. Lei Chen’s work was supported in part by the Hong Kong RGC Project M-HKUST602/12, National Grand Fundamental Research 973 Program of China under Grant No. 2012-CB316200, Microsoft Research Asia Grant, and a Google Faculty Award. Dongyan Zhao was supported by NSFC under Grant No. 61272344 and China 863 Project under Grant No. 2012AA011101.

Author information

Authors and Affiliations

Institute of Computer Science and Technology, Peking University, Beijing, China
Lei Zou, Xuchuan Shen, Ruizhe Huang & Dongyan Zhao
David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, ON, Canada
M. Tamer Özsu
Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong, China
Lei Chen

Authors

Lei Zou
View author publications
You can also search for this author in PubMed Google Scholar
M. Tamer Özsu
View author publications
You can also search for this author in PubMed Google Scholar
Lei Chen
View author publications
You can also search for this author in PubMed Google Scholar
Xuchuan Shen
View author publications
You can also search for this author in PubMed Google Scholar
Ruizhe Huang
View author publications
You can also search for this author in PubMed Google Scholar
Dongyan Zhao
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to M. Tamer Özsu.

Additional information

Extended version of paper “gStore: Answering SPARQL Queries via Subgraph Matching” that was presented at 2011 VLDB Conference.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 200 KB)

Rights and permissions

Reprints and permissions

About this article

Cite this article

Zou, L., Özsu, M.T., Chen, L. et al. gStore: a graph-based SPARQL query engine. The VLDB Journal 23, 565–590 (2014). https://doi.org/10.1007/s00778-013-0337-7

Download citation

Received: 21 December 2012
Revised: 07 August 2013
Accepted: 26 August 2013
Published: 28 September 2013
Issue Date: August 2014
DOI: https://doi.org/10.1007/s00778-013-0337-7

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

gStore: a graph-based SPARQL query engine

Abstract

Access this article

Similar content being viewed by others

A survey of RDF stores & SPARQL engines for querying knowledge graphs

Processing SPARQL queries over distributed RDF graphs

$$\textsf {GQA}_{\textsf {RDF}}$$ : A Graph-Based Approach Towards Efficient SPARQL Query Answering

Notes

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Additional information

Electronic supplementary material

Supplementary material 1 (pdf 200 KB)

Rights and permissions

About this article

Cite this article

Keywords

Navigation

gStore: a graph-based SPARQL query engine

Abstract

Access this article

Similar content being viewed by others

A survey of RDF stores & SPARQL engines for querying knowledge graphs

Processing SPARQL queries over distributed RDF graphs

$$\textsf {GQA}_{\textsf {RDF}}$$ : A Graph-Based Approach Towards Efficient SPARQL Query Answering

Notes

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Additional information

Electronic supplementary material

Supplementary material 1 (pdf 200 KB)

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation