Smooth Scan: robust access path selection without cardinality estimation

Borovica-Gajic, Renata; Idreos, Stratos; Ailamaki, Anastasia; Zukowski, Marcin; Fraser, Campbell

doi:10.1007/s00778-018-0507-8

Smooth Scan: robust access path selection without cardinality estimation

Regular Paper
Published: 29 May 2018

Volume 27, pages 521–545, (2018)
Cite this article

The VLDB Journal Aims and scope Submit manuscript

Renata Borovica-Gajic ORCID: orcid.org/0000-0003-3503-4123¹,
Stratos Idreos²,
Anastasia Ailamaki³,
Marcin Zukowski⁴ &
…
Campbell Fraser⁵

575 Accesses
11 Citations
4 Altmetric
Explore all metrics

Abstract

Query optimizers depend heavily on statistics representing column distributions to create good query plans. In many cases, though, statistics are outdated or nonexistent, and the process of refreshing statistics is very expensive, especially for ad hoc workloads on ever bigger data. This results in suboptimal plans that severely hurt performance. The core of the problem is the fixed decision on the type of physical operators that comprise a query plan. This paper makes a case for continuous adaptation and morphing of physical operators throughout their lifetime, by adjusting their behavior in accordance with the observed statistical properties of the data at run time. We demonstrate the benefits of the new paradigm by designing and implementing an adaptive access path operator called Smooth Scan, which morphs continuously within the space of index access and full table scan. Smooth Scan behaves similarly to an index scan for low selectivity; if selectivity increases, however, Smooth Scan progressively morphs its behavior toward a sequential scan. As a result, a system with Smooth Scan requires no optimization decisions on the access paths up front. Additionally, by depending only on the result distribution and eschewing statistics and cardinality estimates altogether, Smooth Scan ensures repeatable execution across multiple query invocations. Smooth Scan implemented in PostgreSQL demonstrates robust, near-optimal performance on micro-benchmarks and real-life workloads, while being statistics oblivious at the same time.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Fig. 12

Stratified random sampling from streaming and stored data

Article 23 October 2020

Trong Duc Nguyen, Ming-Hung Shih, … Bojian Xu

MongoDB Vs PostgreSQL: A comparative study on performance aspects

Article Open access 05 June 2020

Antonios Makris, Konstantinos Tserpes, … Dimosthenis Anagnostopoulos

A survey on the evolution of stream processing systems

Article Open access 22 November 2023

Marios Fragkoulis, Paris Carbone, … Asterios Katsifodimos

Notes

We use the term cardinality value as the value derived from the optimizer’s estimate on result selectivity, i.e., \(card\_value=|T|*{selectivity}\), where |T| is the number of tuples in a relation and selectivity is a selectivity factor [73].
128 KB block requests were consecutive up to 16 MB of size, which is the maximum expansion region of Smooth Scan.

References

Abdel Kader, R., Boncz, P., Manegold, S., van Keulen, M.: ROX: run-time optimization of XQueries. In: SIGMOD (2009)
Aboulnaga, A., Chaudhuri, S.: Self-tuning histograms: building histograms without looking at data. In: SIGMOD (1999)
Amsaleg, L., Franklin, M.J., Tomasic, A., Urhan, T.: Scrambling query plans to cope with unexpected delays. In: DIS (1996)
Antoshenkov, G.: Dynamic query optimization in Rdb/VMS. In: ICDE (1993)
Antoshenkov, G., Ziauddin, M.: Query processing and optimization in Oracle Rdb. PVLDB 5(4), 229–237 (1996)
Google Scholar
Avnur, R., Hellerstein, J.M.: Eddies: continuously adaptive query processing. In: SIGMOD (2000)
Babcock, B., Chaudhuri, S.: Towards a robust query optimizer: a principled and practical approach. In: SIGMOD (2005)
Babu, S., Bizarro, P.: Adaptive query processing in the looking glass. In: CIDR (2005)
Babu, S., Bizarro, P., DeWitt, D.: Proactive re-optimization. In: SIGMOD (2005)
Barber, R., Bendel, P., Czech, M., Draese, O., Ho, F., Hrle, N., Idreos, S., Kim, M., Koeth, O., Lee, J., Li, T.T., Lohman, G.M., Morfonios, K., Müller, R., Murthy, K., Pandis, I., Qiao, L., Raman, V., Szabo, S., Sidle, R., Stolze, K.: Blink: not your father’s database! In: BIRTE (2011)
Bellamkonda, S., Li, H.G., Jagtap, U., Zhu, Y., Liang, V., Cruanes, T.: Adaptive and big data scale parallel execution in oracle. PVLDB 6(11), 1102–1113 (2013)
Google Scholar
Bizarro, P., Babu, S., DeWitt, D., Widom, J.: Content-based routing: different plans for different data. In: PVLDB, pp. 757–768 (2005)
Boncz, P.A., Neumann, T., Erling, O.: TPC-H analyzed: hidden messages and lessons learned from an influential benchmark. In: TPCTC (2013)
Borovica, R., Alagiannis, I., Ailamaki, A.: Automated physical designers: what you see is (not) what you get. In: DBTest (2012)
Borovica-Gajic, R.: Toward timely, predictable and cost-effective data analytics. Ph.D. thesis (2016)
Borovica-Gajic, R., Appuswamy, R., Ailamaki, A.: Cheap data analytics using cold storage devices. PVLDB 9(12), 1029–1040 (2016)
Google Scholar
Borovica-Gajic, R., Graefe, G., Lee, A.: Robust performance in database query processing (dagstuhl seminar 17222). Dagstuhl Rep. 7(5), 169–180 (2017)
Google Scholar
Borovica-Gajic, R., Idreos, S., Ailamaki, A., Zukowski, M., Fraser, C.: Smooth scan: statistics-oblivious access paths. In: ICDE (2015)
Bouganim, L., Fabret, F., Mohan, C., Valduriez, P.: Dynamic query scheduling in data integration systems. In: ICDE (2000)
Bruno, N., Chaudhuri, S.: Efficient creation of statistics over query expressions. In: ICDE (2003)
Bruno, N., Chaudhuri, S., Gravano, L.: STHoles: A multidimensional workload-aware histogram. In: SIGMOD (2001)
Cao, L., Rundensteiner, E.A.: High performance stream query processing with correlation-aware partitioning. PVLDB 7(4), 265–276 (2013)
Google Scholar
Cao, L.B.P.: Web caching and Zipf-like distributions: evidence and implications. In: INFOCOM (1999)
Chaudhuri, S.: Query optimizers: time to rethink the contract? In: SIGMOD (2009)
Chaudhuri, S., Narasayya, V., Ramamurthy, R.: A pay-as-you-go framework for query execution feedback. PVLDB 1(1), 1141–1152 (2008)
Google Scholar
Chaudhuri, S., Narasayya, V.R.: Automating statistics management for query optimizers. In: ICDE (2000)
Chen, C.M., Roussopoulos, N.: Adaptive selectivity estimation using query feedback. In: SIGMOD (1994)
Chen, S., Ailamaki, A., Gibbons, P.B., Mowry, T.C.: Inspector joins. In: VLDB (2005)
Christodoulakis, S.: Implications of certain assumptions in database performance evaluation. TODS 9(2), 163–186 (1984)
Article MathSciNet MATH Google Scholar
Chu, F.: Least expected cost query optimization: What can we expect. In: Proceedings of the ACM Symposim on Principles of Database Systems, pp. 293–302 (2002)
Cole, R.L., Graefe, G.: Optimization of dynamic query evaluation plans. In: SIGMOD (1994)
Curino, C., Jones, E., Popa, R.A., Malviya, N., Wu, E., Madden, S., Balakrishnan, H., Zeldovich, N.: Relational cloud: a database service for the cloud. In: CIDR (2011)
D., H., Darera, P.N., Haritsa, J.R.: On the production of anorexic plan diagrams. In: VLDB (2007)
Harish, D., Darera, P.N., Haritsa, J.R.: Identifying robust plans through plan diagram reduction. PVLDB 1(1), 1124–1140 (2008)
Google Scholar
Deshpande, A., Ives, Z., Raman, V.: Adaptive query processing. Found. Trends Databases 1(1), 1–140 (2007). https://doi.org/10.1561/1900000001
Article Google Scholar
DeWitt, D.J., Naughton, J.F., Burger, J.: Nested loops revisited. In: PDIS (1993)
Dutt, A., Haritsa, J.: Plan bouquets: query processing without selectivity estimation. In: SIGMOD (2014)
Dutt, A., Narasayya, V.R., Chaudhuri, S.: Leveraging re-costing for online optimization of parameterized queries with guarantees. In: SIGMOD (2017)
Elhemali, M., Galindo-Legaria, C.A., Grabs, T., Joshi, M.M.: Execution strategies for SQL subqueries. In: SIGMOD (2007)
Eurviriyanukul, K., Paton, N.W., Fernandes, A.A.A., Lynden, S.J.: Adaptive join processing in pipelined plans. In: EDBT (2010)
Graefe, G.: Modern B-tree techniques. Found. Trends Databases 3(4), 203–402 (2011)
Article Google Scholar
Graefe, G.: New algorithms for join and grouping operations. Comput. Sci. 27(1), 3–27 (2012)
Google Scholar
Graefe, G., Guy, W., Kuno, H.A., Paulley, G.N.: Robust query processing (dagstuhl seminar 12321). Dagstuhl Rep. 2(8), 1–15 (2012)
Google Scholar
Graefe, G., König, A.C., Kuno, H.A., Markl, V., Sattler, K.U.: Robust query processing (dagstuhl seminar 10381). In: Robust Query Processing (2011)
Graefe, G., Kuno, H.A., Wiener, J.L.: Visualizing the robustness of query execution. In: CIDR (2009)
Graefe, G., Ward, K.: Dynamic query evaluation plans. In: SIGMOD (1989)
Gray, J.: Tape is dead, disk is tape, flash is disk. RAM locality is king, Presented at CIDR (2007)
Haas, P.J., Hellerstein, J.M.: Ripple joins for online aggregation. In: SIGMOD (1999)
Harris, L.: Stock price clustering and discreteness. Rev. Financ. Stud. 4(3), 389–415 (1991)
Article Google Scholar
Hellerstein, J.M., Franklin, M.J., Chandrasekaran, S., Deshpande, A., Hildrum, K., Madden, S., Raman, V., Shah, M.A.: Adaptive query processing: technology in evolution. IEEE Data Eng. Bull. 23, 2000 (2000)
Google Scholar
Herodotou, H., Babu, S.: Xplus: a SQL-tuning-aware query optimizer. PVLDB 3(1–2), 1149–1160 (2010)
Google Scholar
IBM: Managing big data for smart grids and smart meters. White paper. http://goo.gl/n1Ijtd (2012)
Ilyas, I.F., Markl, V., Haas, P., Brown, P., Aboulnaga, A.: CORDS: automatic discovery of correlations and soft functional dependencies. In: SIGMOD (2004)
Ioannidis, Y.E.: Query optimization. ACM Comput. Surv. 28(1), 121–123 (1996)
Article Google Scholar
Ioannidis, Y.E., Ng, R.T., Shim, K., Sellis, T.K.: Parametric query optimization. PVLDB 6(2), 132–151 (1997)
Google Scholar
Ives, Z.G.: Efficient Query Processing for Data Integration. University of Washington, Seattle (2002)
Google Scholar
Ives, Z.G., Florescu, D., Friedman, M., Levy, A., Weld, D.S.: An adaptive query execution system for data integration. In: SIGMOD (1999)
Ives, Z.G., Halevy, A.Y., Weld, D.S.: Adapting to source properties in processing data integration queries. In: SIGMOD (2004)
Kabra, N., DeWitt, D.J.: Efficient mid-query re-optimization of sub-optimal query execution plans. In: SIGMOD (1998)
Kester, M.S., Athanassoulis, M., Idreos, S.: Access path selection in main-memory optimized data systems: should I scan or should I probe? In: SIGMOD (2017)
Leis, V., Gubichev, A., Mirchev, A., Boncz, P.A., Kemper, A., Neumann, T.: How good are query optimizers, really? PVLDB 9(3), 204–215 (2015)
Google Scholar
Li, Q., Shao, M., Markl, V., Beyer, K.S., Colby, L.S., Lohman, G.M.: Adaptively reordering joins during query execution. In: ICDE (2007)
Lohman, G.: Is query optimization a “solved” problem? In: ACM SIGMOD Blog (2014)
Mackert, L., Lohman, G.: R* optimizer validation and performance evaluation for local queries. In: SIGMOD (1986)
Markl, V., Raman, V., Simmen, D., Lohman, G., Pirahesh, H., Cilimdzic, M.: Robust query processing through progressive optimization. In: SIGMOD (2004)
Müller, I., Sanders, P., Lacurie, A., Lehner, W., Färber, F.: Cache-efficient aggregation: hashing is sorting. In: SIGMOD (2015)
Nehme, R.V., Rundensteiner, E.A., Bertino, E.: Self-tuning query mesh for adaptive multi-route query processing. In: EDBT, pp. 803–814 (2009)
Neumann, T., Galindo-Legaria, C.A.: Taking the edge off cardinality estimation errors using incremental execution. In: DBIS (2013)
Ramakrishnan, R., Gehrke, J.: Database Management Systems, 3rd edn. McGraw-Hill, New York (2003)
MATH Google Scholar
Raman, V., Deshpande, A., Hellerstein, J.M.: Using state modules for adaptive query processing. In: ICDE (2003)
Schindler, J.: I/O characteristics of NoSQL databases. PVLDB 5(12), 2020–2021 (2012)
Google Scholar
Schindler, J.: Profiling and analyzing the I/O performance of NoSQL DBs. In: SIGMETRICS (2013)
Selinger, P.G., Astrahan, M.M., Chamberlin, D.D., Lorie, R.A., Price, T.G.: Access path selection in a relational database management system. In: SIGMOD (1979)
Srivastava, U., Haas, P.J., Markl, V., Kutsch, M., Tran, T.M.: ISOMER: consistent histogram construction using query feedback. In: ICDE (2006)
Stillger, M., Lohman, G.M., Markl, V., Kandil, M.: LEO–DB2’s learning optimizer. In: VLDB (2001)
TPC: TPC-H benchmark. http://www.tpc.org/tpch/
Urhan, T., Franklin, M.J., Amsaleg, L.: Cost-based query scrambling for initial delays. In: SIGMOD (1998)
Viglas, S., Naughton, J.F., Burger, J.: Maximizing the output rate of multi-way join queries over streaming information sources. In: VLDB (2003)
Wilschut, A., Apers, P.: Dataflow query execution in a parallel main-memory environment. In: PDIS (1991)
Wong, E., Youssefi, K.: Decomposition—a strategy for query processing. ACM Trans. Database Syst. 1(3), 223–241 (1976)
Article Google Scholar
Yin, S., Hameurlain, A., Morvan, F.: Robust query optimization methods with respect to estimation errors: a survey. SIGMOD Rec. 44(3), 25–36 (2015)
Article Google Scholar
Zhu, Y., Rundensteiner, E.A., Heineman, G.T.: Dynamic plan migration for continuous queries over data streams. In: SIGMOD (2004)

Download references

Acknowledgements

We would like to thank the organizers of the Dagstuhl seminar 12321 on “Robust query processing” for the inspirational sessions and the introduction of robustness issues in query processing. In particular, we thank Goetz Graefe for his support throughout this work.

Author information

Authors and Affiliations

School of Computing and Information Systems, The University of Melbourne, Melbourne, VIC, Australia
Renata Borovica-Gajic
School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, USA
Stratos Idreos
School of Computer and Communication Sciences, Ecole Polytechnique Federale de Lausanne and RAW Labs, Lausanne, VD, Switzerland
Anastasia Ailamaki
Snowflake Computing, San Mateo, CA, USA
Marcin Zukowski
Google Inc., Seattle, WA, USA
Campbell Fraser

Authors

Renata Borovica-Gajic
View author publications
You can also search for this author in PubMed Google Scholar
Stratos Idreos
View author publications
You can also search for this author in PubMed Google Scholar
Anastasia Ailamaki
View author publications
You can also search for this author in PubMed Google Scholar
Marcin Zukowski
View author publications
You can also search for this author in PubMed Google Scholar
Campbell Fraser
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Renata Borovica-Gajic.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (xlsx 6482 KB)

Supplementary material 2 (pptx 7283 KB)

Rights and permissions

Reprints and permissions

About this article

Cite this article

Borovica-Gajic, R., Idreos, S., Ailamaki, A. et al. Smooth Scan: robust access path selection without cardinality estimation. The VLDB Journal 27, 521–545 (2018). https://doi.org/10.1007/s00778-018-0507-8

Download citation

Received: 03 July 2017
Revised: 11 May 2018
Accepted: 15 May 2018
Published: 29 May 2018
Issue Date: August 2018
DOI: https://doi.org/10.1007/s00778-018-0507-8

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Smooth Scan: robust access path selection without cardinality estimation

Abstract

Access this article

Similar content being viewed by others

Stratified random sampling from streaming and stored data

MongoDB Vs PostgreSQL: A comparative study on performance aspects

A survey on the evolution of stream processing systems

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Electronic supplementary material

Supplementary material 1 (xlsx 6482 KB)

Supplementary material 2 (pptx 7283 KB)

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Smooth Scan: robust access path selection without cardinality estimation

Abstract

Access this article

Similar content being viewed by others

Stratified random sampling from streaming and stored data

MongoDB Vs PostgreSQL: A comparative study on performance aspects

A survey on the evolution of stream processing systems

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Electronic supplementary material

Supplementary material 1 (xlsx 6482 KB)

Supplementary material 2 (pptx 7283 KB)

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation