Learning-based SPARQL query performance modeling and prediction

Zhang, Wei Emma; Sheng, Quan Z.; Qin, Yongrui; Taylor, Kerry; Yao, Lina

doi:10.1007/s11280-017-0498-1

Published: 24 October 2017

World Wide Web, volume 21, pages 1015–1035 (2018)

Wei Emma Zhang ORCID: orcid.org/0000-0002-0406-5974¹,
Quan Z. Sheng¹,
Yongrui Qin²,
Kerry Taylor³ &
Lina Yao⁴

Abstract

One of the challenges of managing an RDF database is predicting the performance of SPARQL queries before they are executed. Performance characteristics, such as execution time and memory usage, can help data consumers identify unexpectedly long-running queries before they start and estimate the system workload for query scheduling. Extensive work addresses this performance prediction problem for traditional SQL queries, but it is not directly applicable to SPARQL queries. In this paper, we adopt machine learning techniques to predict the performance of SPARQL queries. Our work focuses on modeling the features of a SPARQL query as a vector representation. Our feature modeling method does not depend on knowledge of the underlying systems or the structure of the underlying data, but only on the nature of SPARQL queries. We then use these features to train prediction models. We propose a two-step prediction process and consider performance in both cold and warm stages. Evaluations are performed on real-world SPARQL queries, whose execution times range from milliseconds to hours. The results demonstrate that the proposed approach can effectively predict SPARQL query performance and outperforms state-of-the-art approaches.
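
The abstract does not list the actual features or the learning models, so the following is only a minimal sketch of the general idea: turn the text of a query into a numeric vector without consulting the data or the engine, then predict execution time from previously observed queries. The keyword set, the distinct-variable count, and the use of k-nearest-neighbour regression are all assumptions made for illustration, not the authors' method.

```python
import math
import re

# Hypothetical feature set (an assumption, not taken from the paper):
# occurrence counts of a few SPARQL keywords.
KEYWORDS = ["SELECT", "DISTINCT", "OPTIONAL", "UNION", "FILTER", "LIMIT"]


def query_to_vector(query):
    """Map a SPARQL query string to a feature vector.

    Uses only the query text: keyword counts followed by the number of
    distinct variables. No knowledge of the data or the engine is needed.
    """
    text = query.upper()
    counts = [float(len(re.findall(r"\b" + k + r"\b", text))) for k in KEYWORDS]
    distinct_vars = len(set(re.findall(r"\?\w+", query)))
    return counts + [float(distinct_vars)]


def knn_predict(train, query_vec, k=3):
    """Predict a value as the mean over the k nearest training vectors.

    `train` is a list of (feature_vector, observed_execution_time) pairs;
    distance is Euclidean.
    """
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], query_vec))[:k]
    return sum(t for _, t in nearest) / len(nearest)


if __name__ == "__main__":
    q = ("SELECT DISTINCT ?s WHERE { ?s ?p ?o . "
         "OPTIONAL { ?s <http://example.org/name> ?n } FILTER(?o > 5) } LIMIT 10")
    print(query_to_vector(q))
```

In such a setup, each logged query would be paired with its measured execution time to form the training set, and a new query would be vectorised the same way before calling `knn_predict`.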

Author information

Authors and Affiliations

Department of Computing, Macquarie University, Sydney, Australia
Wei Emma Zhang & Quan Z. Sheng
School of Computing and Engineering, University of Huddersfield, Huddersfield, UK
Yongrui Qin
Research School of Computer Science, Australian National University, Canberra, Australia
Kerry Taylor
School of Computer Science and Engineering, The University of New South Wales, Kensington, Australia
Lina Yao

Corresponding author

Correspondence to Wei Emma Zhang.

About this article

Cite this article

Zhang, W.E., Sheng, Q.Z., Qin, Y. et al. Learning-based SPARQL query performance modeling and prediction. World Wide Web 21, 1015–1035 (2018). https://doi.org/10.1007/s11280-017-0498-1

Published: 24 October 2017
Issue Date: July 2018
DOI: https://doi.org/10.1007/s11280-017-0498-1
