Skip to main content
Log in

Smooth Scan: robust access path selection without cardinality estimation

  • Regular Paper
  • Published:
The VLDB Journal Aims and scope Submit manuscript

Abstract

Query optimizers depend heavily on statistics representing column distributions to create good query plans. In many cases, though, statistics are outdated or nonexistent, and the process of refreshing statistics is very expensive, especially for ad hoc workloads on ever bigger data. This results in suboptimal plans that severely hurt performance. The core of the problem is the fixed decision on the type of physical operators that comprise a query plan. This paper makes a case for continuous adaptation and morphing of physical operators throughout their lifetime, by adjusting their behavior in accordance with the observed statistical properties of the data at run time. We demonstrate the benefits of the new paradigm by designing and implementing an adaptive access path operator called Smooth Scan, which morphs continuously within the space of index access and full table scan. Smooth Scan behaves similarly to an index scan for low selectivity; if selectivity increases, however, Smooth Scan progressively morphs its behavior toward a sequential scan. As a result, a system with Smooth Scan requires no optimization decisions on the access paths up front. Additionally, by depending only on the result distribution and eschewing statistics and cardinality estimates altogether, Smooth Scan ensures repeatable execution across multiple query invocations. Smooth Scan implemented in PostgreSQL demonstrates robust, near-optimal performance on micro-benchmarks and real-life workloads, while being statistics oblivious at the same time.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13
Fig. 14
Fig. 15
Fig. 16
Fig. 17
Fig. 18
Fig. 19
Fig. 20
Fig. 21
Fig. 22
Fig. 23
Fig. 24
Fig. 25
Fig. 26

Similar content being viewed by others

Notes

  1. We use the term cardinality value as the value derived from the optimizer’s estimate on result selectivity, i.e., \(card\_value=|T|*{selectivity}\), where |T| is the number of tuples in a relation and selectivity is a selectivity factor [73].

  2. 128 KB block requests were consecutive up to 16 MB of size, which is the maximum expansion region of Smooth Scan.

References

  1. Abdel Kader, R., Boncz, P., Manegold, S., van Keulen, M.: ROX: run-time optimization of XQueries. In: SIGMOD (2009)

  2. Aboulnaga, A., Chaudhuri, S.: Self-tuning histograms: building histograms without looking at data. In: SIGMOD (1999)

  3. Amsaleg, L., Franklin, M.J., Tomasic, A., Urhan, T.: Scrambling query plans to cope with unexpected delays. In: DIS (1996)

  4. Antoshenkov, G.: Dynamic query optimization in Rdb/VMS. In: ICDE (1993)

  5. Antoshenkov, G., Ziauddin, M.: Query processing and optimization in Oracle Rdb. PVLDB 5(4), 229–237 (1996)

    Google Scholar 

  6. Avnur, R., Hellerstein, J.M.: Eddies: continuously adaptive query processing. In: SIGMOD (2000)

  7. Babcock, B., Chaudhuri, S.: Towards a robust query optimizer: a principled and practical approach. In: SIGMOD (2005)

  8. Babu, S., Bizarro, P.: Adaptive query processing in the looking glass. In: CIDR (2005)

  9. Babu, S., Bizarro, P., DeWitt, D.: Proactive re-optimization. In: SIGMOD (2005)

  10. Barber, R., Bendel, P., Czech, M., Draese, O., Ho, F., Hrle, N., Idreos, S., Kim, M., Koeth, O., Lee, J., Li, T.T., Lohman, G.M., Morfonios, K., Müller, R., Murthy, K., Pandis, I., Qiao, L., Raman, V., Szabo, S., Sidle, R., Stolze, K.: Blink: not your father’s database! In: BIRTE (2011)

  11. Bellamkonda, S., Li, H.G., Jagtap, U., Zhu, Y., Liang, V., Cruanes, T.: Adaptive and big data scale parallel execution in oracle. PVLDB 6(11), 1102–1113 (2013)

    Google Scholar 

  12. Bizarro, P., Babu, S., DeWitt, D., Widom, J.: Content-based routing: different plans for different data. In: PVLDB, pp. 757–768 (2005)

  13. Boncz, P.A., Neumann, T., Erling, O.: TPC-H analyzed: hidden messages and lessons learned from an influential benchmark. In: TPCTC (2013)

  14. Borovica, R., Alagiannis, I., Ailamaki, A.: Automated physical designers: what you see is (not) what you get. In: DBTest (2012)

  15. Borovica-Gajic, R.: Toward timely, predictable and cost-effective data analytics. Ph.D. thesis (2016)

  16. Borovica-Gajic, R., Appuswamy, R., Ailamaki, A.: Cheap data analytics using cold storage devices. PVLDB 9(12), 1029–1040 (2016)

    Google Scholar 

  17. Borovica-Gajic, R., Graefe, G., Lee, A.: Robust performance in database query processing (dagstuhl seminar 17222). Dagstuhl Rep. 7(5), 169–180 (2017)

    Google Scholar 

  18. Borovica-Gajic, R., Idreos, S., Ailamaki, A., Zukowski, M., Fraser, C.: Smooth scan: statistics-oblivious access paths. In: ICDE (2015)

  19. Bouganim, L., Fabret, F., Mohan, C., Valduriez, P.: Dynamic query scheduling in data integration systems. In: ICDE (2000)

  20. Bruno, N., Chaudhuri, S.: Efficient creation of statistics over query expressions. In: ICDE (2003)

  21. Bruno, N., Chaudhuri, S., Gravano, L.: STHoles: A multidimensional workload-aware histogram. In: SIGMOD (2001)

  22. Cao, L., Rundensteiner, E.A.: High performance stream query processing with correlation-aware partitioning. PVLDB 7(4), 265–276 (2013)

    Google Scholar 

  23. Cao, L.B.P.: Web caching and Zipf-like distributions: evidence and implications. In: INFOCOM (1999)

  24. Chaudhuri, S.: Query optimizers: time to rethink the contract? In: SIGMOD (2009)

  25. Chaudhuri, S., Narasayya, V., Ramamurthy, R.: A pay-as-you-go framework for query execution feedback. PVLDB 1(1), 1141–1152 (2008)

    Google Scholar 

  26. Chaudhuri, S., Narasayya, V.R.: Automating statistics management for query optimizers. In: ICDE (2000)

  27. Chen, C.M., Roussopoulos, N.: Adaptive selectivity estimation using query feedback. In: SIGMOD (1994)

  28. Chen, S., Ailamaki, A., Gibbons, P.B., Mowry, T.C.: Inspector joins. In: VLDB (2005)

  29. Christodoulakis, S.: Implications of certain assumptions in database performance evaluation. TODS 9(2), 163–186 (1984)

    Article  MathSciNet  MATH  Google Scholar 

  30. Chu, F.: Least expected cost query optimization: What can we expect. In: Proceedings of the ACM Symposim on Principles of Database Systems, pp. 293–302 (2002)

  31. Cole, R.L., Graefe, G.: Optimization of dynamic query evaluation plans. In: SIGMOD (1994)

  32. Curino, C., Jones, E., Popa, R.A., Malviya, N., Wu, E., Madden, S., Balakrishnan, H., Zeldovich, N.: Relational cloud: a database service for the cloud. In: CIDR (2011)

  33. D., H., Darera, P.N., Haritsa, J.R.: On the production of anorexic plan diagrams. In: VLDB (2007)

  34. Harish, D., Darera, P.N., Haritsa, J.R.: Identifying robust plans through plan diagram reduction. PVLDB 1(1), 1124–1140 (2008)

    Google Scholar 

  35. Deshpande, A., Ives, Z., Raman, V.: Adaptive query processing. Found. Trends Databases 1(1), 1–140 (2007). https://doi.org/10.1561/1900000001

    Article  Google Scholar 

  36. DeWitt, D.J., Naughton, J.F., Burger, J.: Nested loops revisited. In: PDIS (1993)

  37. Dutt, A., Haritsa, J.: Plan bouquets: query processing without selectivity estimation. In: SIGMOD (2014)

  38. Dutt, A., Narasayya, V.R., Chaudhuri, S.: Leveraging re-costing for online optimization of parameterized queries with guarantees. In: SIGMOD (2017)

  39. Elhemali, M., Galindo-Legaria, C.A., Grabs, T., Joshi, M.M.: Execution strategies for SQL subqueries. In: SIGMOD (2007)

  40. Eurviriyanukul, K., Paton, N.W., Fernandes, A.A.A., Lynden, S.J.: Adaptive join processing in pipelined plans. In: EDBT (2010)

  41. Graefe, G.: Modern B-tree techniques. Found. Trends Databases 3(4), 203–402 (2011)

    Article  Google Scholar 

  42. Graefe, G.: New algorithms for join and grouping operations. Comput. Sci. 27(1), 3–27 (2012)

    Google Scholar 

  43. Graefe, G., Guy, W., Kuno, H.A., Paulley, G.N.: Robust query processing (dagstuhl seminar 12321). Dagstuhl Rep. 2(8), 1–15 (2012)

    Google Scholar 

  44. Graefe, G., König, A.C., Kuno, H.A., Markl, V., Sattler, K.U.: Robust query processing (dagstuhl seminar 10381). In: Robust Query Processing (2011)

  45. Graefe, G., Kuno, H.A., Wiener, J.L.: Visualizing the robustness of query execution. In: CIDR (2009)

  46. Graefe, G., Ward, K.: Dynamic query evaluation plans. In: SIGMOD (1989)

  47. Gray, J.: Tape is dead, disk is tape, flash is disk. RAM locality is king, Presented at CIDR (2007)

  48. Haas, P.J., Hellerstein, J.M.: Ripple joins for online aggregation. In: SIGMOD (1999)

  49. Harris, L.: Stock price clustering and discreteness. Rev. Financ. Stud. 4(3), 389–415 (1991)

    Article  Google Scholar 

  50. Hellerstein, J.M., Franklin, M.J., Chandrasekaran, S., Deshpande, A., Hildrum, K., Madden, S., Raman, V., Shah, M.A.: Adaptive query processing: technology in evolution. IEEE Data Eng. Bull. 23, 2000 (2000)

    Google Scholar 

  51. Herodotou, H., Babu, S.: Xplus: a SQL-tuning-aware query optimizer. PVLDB 3(1–2), 1149–1160 (2010)

    Google Scholar 

  52. IBM: Managing big data for smart grids and smart meters. White paper. http://goo.gl/n1Ijtd (2012)

  53. Ilyas, I.F., Markl, V., Haas, P., Brown, P., Aboulnaga, A.: CORDS: automatic discovery of correlations and soft functional dependencies. In: SIGMOD (2004)

  54. Ioannidis, Y.E.: Query optimization. ACM Comput. Surv. 28(1), 121–123 (1996)

    Article  Google Scholar 

  55. Ioannidis, Y.E., Ng, R.T., Shim, K., Sellis, T.K.: Parametric query optimization. PVLDB 6(2), 132–151 (1997)

    Google Scholar 

  56. Ives, Z.G.: Efficient Query Processing for Data Integration. University of Washington, Seattle (2002)

    Google Scholar 

  57. Ives, Z.G., Florescu, D., Friedman, M., Levy, A., Weld, D.S.: An adaptive query execution system for data integration. In: SIGMOD (1999)

  58. Ives, Z.G., Halevy, A.Y., Weld, D.S.: Adapting to source properties in processing data integration queries. In: SIGMOD (2004)

  59. Kabra, N., DeWitt, D.J.: Efficient mid-query re-optimization of sub-optimal query execution plans. In: SIGMOD (1998)

  60. Kester, M.S., Athanassoulis, M., Idreos, S.: Access path selection in main-memory optimized data systems: should I scan or should I probe? In: SIGMOD (2017)

  61. Leis, V., Gubichev, A., Mirchev, A., Boncz, P.A., Kemper, A., Neumann, T.: How good are query optimizers, really? PVLDB 9(3), 204–215 (2015)

    Google Scholar 

  62. Li, Q., Shao, M., Markl, V., Beyer, K.S., Colby, L.S., Lohman, G.M.: Adaptively reordering joins during query execution. In: ICDE (2007)

  63. Lohman, G.: Is query optimization a “solved” problem? In: ACM SIGMOD Blog (2014)

  64. Mackert, L., Lohman, G.: R* optimizer validation and performance evaluation for local queries. In: SIGMOD (1986)

  65. Markl, V., Raman, V., Simmen, D., Lohman, G., Pirahesh, H., Cilimdzic, M.: Robust query processing through progressive optimization. In: SIGMOD (2004)

  66. Müller, I., Sanders, P., Lacurie, A., Lehner, W., Färber, F.: Cache-efficient aggregation: hashing is sorting. In: SIGMOD (2015)

  67. Nehme, R.V., Rundensteiner, E.A., Bertino, E.: Self-tuning query mesh for adaptive multi-route query processing. In: EDBT, pp. 803–814 (2009)

  68. Neumann, T., Galindo-Legaria, C.A.: Taking the edge off cardinality estimation errors using incremental execution. In: DBIS (2013)

  69. Ramakrishnan, R., Gehrke, J.: Database Management Systems, 3rd edn. McGraw-Hill, New York (2003)

    MATH  Google Scholar 

  70. Raman, V., Deshpande, A., Hellerstein, J.M.: Using state modules for adaptive query processing. In: ICDE (2003)

  71. Schindler, J.: I/O characteristics of NoSQL databases. PVLDB 5(12), 2020–2021 (2012)

    Google Scholar 

  72. Schindler, J.: Profiling and analyzing the I/O performance of NoSQL DBs. In: SIGMETRICS (2013)

  73. Selinger, P.G., Astrahan, M.M., Chamberlin, D.D., Lorie, R.A., Price, T.G.: Access path selection in a relational database management system. In: SIGMOD (1979)

  74. Srivastava, U., Haas, P.J., Markl, V., Kutsch, M., Tran, T.M.: ISOMER: consistent histogram construction using query feedback. In: ICDE (2006)

  75. Stillger, M., Lohman, G.M., Markl, V., Kandil, M.: LEO–DB2’s learning optimizer. In: VLDB (2001)

  76. TPC: TPC-H benchmark. http://www.tpc.org/tpch/

  77. Urhan, T., Franklin, M.J., Amsaleg, L.: Cost-based query scrambling for initial delays. In: SIGMOD (1998)

  78. Viglas, S., Naughton, J.F., Burger, J.: Maximizing the output rate of multi-way join queries over streaming information sources. In: VLDB (2003)

  79. Wilschut, A., Apers, P.: Dataflow query execution in a parallel main-memory environment. In: PDIS (1991)

  80. Wong, E., Youssefi, K.: Decomposition—a strategy for query processing. ACM Trans. Database Syst. 1(3), 223–241 (1976)

    Article  Google Scholar 

  81. Yin, S., Hameurlain, A., Morvan, F.: Robust query optimization methods with respect to estimation errors: a survey. SIGMOD Rec. 44(3), 25–36 (2015)

    Article  Google Scholar 

  82. Zhu, Y., Rundensteiner, E.A., Heineman, G.T.: Dynamic plan migration for continuous queries over data streams. In: SIGMOD (2004)

Download references

Acknowledgements

We would like to thank the organizers of the Dagstuhl seminar 12321 on “Robust query processing” for the inspirational sessions and the introduction of robustness issues in query processing. In particular, we thank Goetz Graefe for his support throughout this work.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Renata Borovica-Gajic.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (xlsx 6482 KB)

Supplementary material 2 (pptx 7283 KB)

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Borovica-Gajic, R., Idreos, S., Ailamaki, A. et al. Smooth Scan: robust access path selection without cardinality estimation. The VLDB Journal 27, 521–545 (2018). https://doi.org/10.1007/s00778-018-0507-8

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00778-018-0507-8

Keywords

Navigation