research-article

An end-to-end learning-based cost estimator

Published: 01 November 2019

Abstract

Cost and cardinality estimation is vital to the query optimizer because it guides query plan selection. However, traditional empirical cost and cardinality estimation techniques cannot provide high-quality estimates, because they may not effectively capture the correlation between multiple tables. Recently, the database community has shown that learning-based cardinality estimation outperforms the empirical methods. However, existing learning-based methods have several limitations. First, they focus on estimating cardinality but cannot estimate cost. Second, they are either too heavyweight or struggle to represent complicated structures, e.g., complex predicates.

To address these challenges, we propose an effective end-to-end learning-based cost estimation framework based on a tree-structured model, which can estimate both cost and cardinality simultaneously. We propose effective feature extraction and encoding techniques that consider both queries and physical operations, and we embed these features into our tree-structured model. We also propose an effective method to encode string values, which improves the generalization ability for predicate matching. As it is prohibitively expensive to enumerate all string values, we design a pattern-based method, which selects patterns to cover string values and utilizes the patterns to embed string values. We conducted experiments on real-world datasets, and the experimental results showed that our method outperformed the baselines.
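
The idea of selecting patterns to cover string values can be illustrated with a small sketch. The abstract does not say how patterns are generated or chosen, so everything below is an assumption made for illustration: candidate patterns are simply prefixes of up to `max_len` characters, selection uses a generic greedy set-cover heuristic under a hypothetical `budget`, and the function names `candidate_patterns`, `select_patterns`, and `encode` are invented here. It is not the authors' method.

```python
from typing import Dict, List, Set


def candidate_patterns(values: List[str], max_len: int = 3) -> Dict[str, Set[int]]:
    """Map each candidate pattern to the indices of the values it matches.

    Assumption: a pattern is a prefix of at most `max_len` characters.
    """
    cover: Dict[str, Set[int]] = {}
    for idx, value in enumerate(values):
        for k in range(1, min(max_len, len(value)) + 1):
            cover.setdefault(value[:k], set()).add(idx)
    return cover


def select_patterns(values: List[str], budget: int, max_len: int = 3) -> List[str]:
    """Greedily pick at most `budget` patterns (generic set-cover heuristic).

    Each round takes the pattern that covers the most still-uncovered values.
    """
    cover = candidate_patterns(values, max_len)
    uncovered = set(range(len(values)))
    chosen: List[str] = []
    while uncovered and len(chosen) < budget:
        # Tie-break on pattern length, then on the pattern text, so the
        # result is deterministic and favors more specific patterns.
        best = max(cover, key=lambda p: (len(cover[p] & uncovered), len(p), p))
        gained = cover[best] & uncovered
        if not gained:
            break
        chosen.append(best)
        uncovered -= gained
    return chosen


def encode(value: str, patterns: List[str]) -> List[int]:
    """Multi-hot vector: 1 where the value starts with a selected pattern."""
    return [1 if value.startswith(p) else 0 for p in patterns]


if __name__ == "__main__":
    titles = ["Dinosaur", "Dinner", "Dino", "Star", "Stamp", "Alien"]
    patterns = select_patterns(titles, budget=2)
    print(patterns)
    # A string absent from the sample still gets a usable encoding.
    print(encode("Dingo", patterns))
```

Because the encoding depends only on which selected patterns a string matches, a value that never appeared in the sample can still be represented, which is the kind of generalization the abstract refers to. A learned embedding would take the place of the multi-hot vector in a real model.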

Published in: Proceedings of the VLDB Endowment, Volume 13, Issue 3, November 2019, 195 pages
ISSN: 2150-8097
Publisher: VLDB Endowment
