gStore: a graph-based SPARQL query engine

  • Regular Paper
The VLDB Journal

Abstract

We address efficient processing of SPARQL queries over RDF datasets. The proposed techniques, incorporated into the gStore system, handle, in a uniform and scalable manner, SPARQL queries with wildcards and aggregate operators over dynamic RDF datasets. Our approach is graph based. We store RDF data as a large graph and also represent a SPARQL query as a query graph. Thus, the query answering problem is converted into a subgraph matching problem. To achieve efficient and scalable query processing, we develop an index, together with effective pruning rules and efficient search algorithms. We propose techniques that use this infrastructure to answer aggregation queries. We also propose an effective maintenance algorithm to handle online updates over RDF repositories. Extensive experiments confirm the efficiency and effectiveness of our solutions.
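
To make the graph-based formulation concrete, the following minimal sketch stores a handful of RDF triples as an edge-labelled directed graph, treats a SPARQL basic graph pattern as a small query graph, and enumerates its matches by naive backtracking. It is an illustration only and not gStore's algorithm: it uses no index and no pruning rules, and it ignores wildcards, aggregation, and updates. The dataset, the names `TRIPLES`, `QUERY`, `build_graph`, `bind`, and `match`, and the triple-pattern encoding are all assumptions made for this example.

```python
from collections import defaultdict

# Toy RDF dataset as (subject, predicate, object) triples. All names are made up.
TRIPLES = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "worksAt", "acme"),
    ("carol", "worksAt", "acme"),
]

# Query graph: each pattern is one labelled edge; terms starting with "?" are variables.
# Roughly: SELECT ?x ?y ?z WHERE { ?x knows ?y . ?y worksAt ?z }
QUERY = [("?x", "knows", "?y"), ("?y", "worksAt", "?z")]


def build_graph(triples):
    """Store triples as an adjacency list: subject -> [(predicate, object), ...]."""
    out_edges = defaultdict(list)
    for s, p, o in triples:
        out_edges[s].append((p, o))
    return out_edges


def bind(term, value, binding):
    """Return a binding consistent with term == value, or None if they conflict."""
    if term.startswith("?"):
        if term in binding:
            return binding if binding[term] == value else None
        extended = dict(binding)
        extended[term] = value
        return extended
    return binding if term == value else None


def match(graph, patterns, binding=None):
    """Yield every variable binding that maps all query edges onto data edges.

    Two different variables may bind to the same node, as in SPARQL basic
    graph pattern matching. The search is exhaustive (no index, no pruning).
    """
    binding = {} if binding is None else binding
    if not patterns:
        yield binding
        return
    (s, p, o), rest = patterns[0], patterns[1:]
    for subj in list(graph):
        b1 = bind(s, subj, binding)
        if b1 is None:
            continue
        for pred, obj in graph[subj]:
            b2 = bind(p, pred, b1)
            if b2 is None:
                continue
            b3 = bind(o, obj, b2)
            if b3 is None:
                continue
            yield from match(graph, rest, b3)


if __name__ == "__main__":
    for result in match(build_graph(TRIPLES), QUERY):
        print(result)  # {'?x': 'bob', '?y': 'carol', '?z': 'acme'}
```

The exhaustive scan over all subjects and edges is what makes this sketch impractical on large graphs; the index, pruning rules, and search algorithms mentioned in the abstract are aimed at that cost.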

Notes

  1. The literature on aggregation and aggregate queries frequently refers to these attributes as dimensions. We follow the same convention.

  2. Note that this is not necessarily \(O_n\), since node identifiers are assigned arbitrarily to aid presentation.

  3. Although the dimension list is a set, when the order is important we specify it as a list enclosed in ( ).

  4. We revised the original 14 queries to remove type reasoning, which gStore does not currently support; the resulting queries return larger result sets because no filtering occurs as a result of type reasoning. For completeness, these are included in Online Supplements as Table 15.

Acknowledgments

Lei Zou’s work was supported by the National Science Foundation of China (NSFC) under Grant No. 61370055 and by the CCF-Tencent Open Research Fund. M. Tamer Özsu’s work was supported by the Natural Sciences and Engineering Research Council (NSERC) of Canada under a Discovery Grant. Lei Chen’s work was supported in part by the Hong Kong RGC Project M-HKUST602/12, the National Grand Fundamental Research 973 Program of China under Grant No. 2012-CB316200, a Microsoft Research Asia Grant, and a Google Faculty Award. Dongyan Zhao was supported by NSFC under Grant No. 61272344 and by the China 863 Project under Grant No. 2012AA011101.

Author information

Corresponding author

Correspondence to M. Tamer Özsu.

Additional information

Extended version of the paper “gStore: Answering SPARQL Queries via Subgraph Matching” that was presented at the 2011 VLDB Conference.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 200 KB)

About this article

Cite this article

Zou, L., Özsu, M.T., Chen, L. et al. gStore: a graph-based SPARQL query engine. The VLDB Journal 23, 565–590 (2014). https://doi.org/10.1007/s00778-013-0337-7

  • DOI: https://doi.org/10.1007/s00778-013-0337-7
