
R3F: RDF triple filtering method for efficient SPARQL query processing

World Wide Web

Abstract

With the rapid growth in the amount of graph-structured Resource Description Framework (RDF) data, SPARQL query processing has received significant attention. The most important part of SPARQL query processing is its method of subgraph pattern matching. For this, most RDF stores use relation-based approaches, which can produce a vast number of redundant intermediate results during query evaluation. In order to address this problem, we propose an RDF Triple Filtering (R3F) method that exploits the graph-structural information of RDF data. We design a path-based index called the RDF Path index (RP-index) to efficiently provide filter data for the triple filtering. We also propose a relational operator called the RDF Filter (RFLT) that can conduct the triple filtering with little overhead compared to the original query processing. Through comprehensive experiments on large-scale RDF datasets, we demonstrate that R3F can effectively and efficiently reduce the number of redundant intermediate results and improve the query performance.
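The filtering idea in the abstract can be illustrated with a toy sketch. This is not the paper's RP-index or RFLT operator, whose details are not given here. It assumes a simplified path index that maps each predicate path to the set of nodes reachable at the end of that path, and the names `build_path_index`, `filter_triples`, and the sample data are hypothetical.

```python
from collections import defaultdict

def build_path_index(triples, max_len=2):
    """Toy path index (an assumption, not the paper's RP-index):
    maps each predicate path of length <= max_len to the set of
    nodes found at the END of some path with those predicates."""
    out_edges = defaultdict(list)
    frontier = defaultdict(set)
    for s, p, o in triples:
        out_edges[s].append((p, o))
        frontier[(p,)].add(o)          # paths of length 1

    index = {}
    for length in range(1, max_len + 1):
        next_frontier = defaultdict(set)
        for path, nodes in frontier.items():
            index[path] = set(nodes)
            if length == max_len:
                continue               # do not extend beyond max_len
            for n in nodes:
                for p, o in out_edges.get(n, ()):
                    next_frontier[path + (p,)].add(o)
        frontier = next_frontier
    return index

def filter_triples(triples, predicate, incoming_path, index):
    """Keep triples with `predicate` whose subject has `incoming_path`
    leading into it; others cannot take part in the join."""
    allowed = index.get(incoming_path, set())
    return [t for t in triples if t[1] == predicate and t[0] in allowed]

# Hypothetical data for the pattern: ?x advisor ?y . ?y worksFor ?z
triples = [
    ("alice", "advisor", "bob"),
    ("bob", "worksFor", "mit"),
    ("carol", "worksFor", "kaist"),
    ("dave", "advisor", "erin"),
]
index = build_path_index(triples, max_len=2)
candidates = filter_triples(triples, "worksFor", ("advisor",), index)
print(candidates)
```

In this sketch the triple for `carol` is discarded before any join runs, because no `advisor` edge leads into `carol`. Without the filter, a relational plan would scan both `worksFor` triples and reject one only at join time.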



Author information

Corresponding author

Correspondence to Kisung Kim.

About this article

Cite this article

Kim, K., Moon, B. & Kim, HJ. R3F: RDF triple filtering method for efficient SPARQL query processing. World Wide Web 18, 317–357 (2015). https://doi.org/10.1007/s11280-013-0253-1

  • DOI: https://doi.org/10.1007/s11280-013-0253-1
