research-article

An end-to-end learning-based cost estimator

Authors:
Ji Sun, Tsinghua University
Guoliang Li, Tsinghua University

Proceedings of the VLDB Endowment, Volume 13, Issue 3, pp. 307–319, https://doi.org/10.14778/3368289.3368296

Published: 01 November 2019

Abstract

Cost and cardinality estimation is vital to a query optimizer because it guides query plan selection. However, traditional empirical cost and cardinality estimation techniques cannot provide high-quality estimates, because they may not effectively capture the correlations between multiple tables. Recently, the database community has shown that learning-based cardinality estimation outperforms the empirical methods. However, existing learning-based methods have several limitations. First, they focus on estimating cardinality and cannot estimate cost. Second, they are either too heavy or struggle to represent complicated structures, e.g., complex predicates.

To address these challenges, we propose an effective end-to-end learning-based cost estimation framework based on a tree-structured model, which can estimate both cost and cardinality simultaneously. We propose effective feature extraction and encoding techniques that consider both queries and physical operations, and we embed these features into our tree-structured model. We also propose an effective method to encode string values, which improves the generalization ability for predicate matching. As it is prohibitively expensive to enumerate all string values, we design a pattern-based method, which selects patterns to cover string values and uses the patterns to embed string values. We conducted experiments on real-world datasets, and the results showed that our method outperformed the baselines.
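
The pattern-based string encoding can be illustrated with a minimal sketch. The abstract does not say how patterns are generated or selected, so everything here is an assumption made for illustration: patterns are taken to be short prefixes, selection is a plain greedy set cover, and the names `candidate_patterns`, `select_patterns`, `encode`, `budget`, and `max_len` are hypothetical, not the authors' API.

```python
from typing import Dict, List, Set


def candidate_patterns(values: List[str], max_len: int = 3) -> Dict[str, Set[str]]:
    """Map each prefix pattern (1..max_len characters) to the values it matches.

    Assumption: prefixes stand in for whatever pattern language the paper uses.
    """
    covers: Dict[str, Set[str]] = {}
    for value in values:
        for k in range(1, min(max_len, len(value)) + 1):
            covers.setdefault(value[:k], set()).add(value)
    return covers


def select_patterns(values: List[str], budget: int, max_len: int = 3) -> List[str]:
    """Greedy set cover: repeatedly pick the pattern that covers the most
    still-uncovered values, until the budget is spent or nothing new is covered."""
    covers = candidate_patterns(values, max_len)
    if not covers:
        return []
    uncovered = set(values)
    chosen: List[str] = []
    while uncovered and len(chosen) < budget:
        # Ties are broken by longer pattern, then by string order, so the
        # result is deterministic.
        best = max(covers, key=lambda p: (len(covers[p] & uncovered), len(p), p))
        gain = covers[best] & uncovered
        if not gain:
            break
        chosen.append(best)
        uncovered -= gain
    return chosen


def encode(value: str, patterns: List[str]) -> List[int]:
    """Multi-hot vector with a 1 for every selected pattern the value starts with."""
    return [1 if value.startswith(p) else 0 for p in patterns]
```

With the toy values `["Dino", "Dina", "Star", "Stan", "Zed"]` and `budget=2`, this sketch selects `["Sta", "Din"]`. An unseen string such as `"Dinner"` is then encoded as `[0, 1]`, which shows in miniature how a small pattern set can represent values that were never enumerated.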

Index Terms

An end-to-end learning-based cost estimator
1. Information systems
  1. Data management systems
    1. Database management system engines
      1. Database query processing
2. Theory of computation
  1. Theory and algorithms for application domains
    1. Database theory
      1. Database query processing and optimization (theory)

Index terms have been assigned to the content through auto-classification.

Published in
Proceedings of the VLDB Endowment, Volume 13, Issue 3
November 2019
195 pages
ISSN: 2150-8097
Editors: Magdalena Balazinska (University of Washington), Xiaofang Zhou (University of Queensland)
Publisher: VLDB Endowment
Qualifiers: research-article