Abstract
Query optimizers depend heavily on statistics representing column distributions to create good query plans. In many cases, though, statistics are outdated or nonexistent, and the process of refreshing statistics is very expensive, especially for ad hoc workloads on ever bigger data. This results in suboptimal plans that severely hurt performance. The core of the problem is the fixed decision on the type of physical operators that comprise a query plan. This paper makes a case for continuous adaptation and morphing of physical operators throughout their lifetime, by adjusting their behavior in accordance with the observed statistical properties of the data at run time. We demonstrate the benefits of the new paradigm by designing and implementing an adaptive access path operator called Smooth Scan, which morphs continuously within the space of index access and full table scan. Smooth Scan behaves similarly to an index scan for low selectivity; if selectivity increases, however, Smooth Scan progressively morphs its behavior toward a sequential scan. As a result, a system with Smooth Scan requires no optimization decisions on the access paths up front. Additionally, by depending only on the result distribution and eschewing statistics and cardinality estimates altogether, Smooth Scan ensures repeatable execution across multiple query invocations. Smooth Scan implemented in PostgreSQL demonstrates robust, near-optimal performance on micro-benchmarks and real-life workloads, while being statistics oblivious at the same time.
Similar content being viewed by others
Notes
We use the term cardinality value as the value derived from the optimizer’s estimate on result selectivity, i.e., \(card\_value=|T|*{selectivity}\), where |T| is the number of tuples in a relation and selectivity is a selectivity factor [73].
128 KB block requests were consecutive up to 16 MB of size, which is the maximum expansion region of Smooth Scan.
References
Abdel Kader, R., Boncz, P., Manegold, S., van Keulen, M.: ROX: run-time optimization of XQueries. In: SIGMOD (2009)
Aboulnaga, A., Chaudhuri, S.: Self-tuning histograms: building histograms without looking at data. In: SIGMOD (1999)
Amsaleg, L., Franklin, M.J., Tomasic, A., Urhan, T.: Scrambling query plans to cope with unexpected delays. In: DIS (1996)
Antoshenkov, G.: Dynamic query optimization in Rdb/VMS. In: ICDE (1993)
Antoshenkov, G., Ziauddin, M.: Query processing and optimization in Oracle Rdb. PVLDB 5(4), 229–237 (1996)
Avnur, R., Hellerstein, J.M.: Eddies: continuously adaptive query processing. In: SIGMOD (2000)
Babcock, B., Chaudhuri, S.: Towards a robust query optimizer: a principled and practical approach. In: SIGMOD (2005)
Babu, S., Bizarro, P.: Adaptive query processing in the looking glass. In: CIDR (2005)
Babu, S., Bizarro, P., DeWitt, D.: Proactive re-optimization. In: SIGMOD (2005)
Barber, R., Bendel, P., Czech, M., Draese, O., Ho, F., Hrle, N., Idreos, S., Kim, M., Koeth, O., Lee, J., Li, T.T., Lohman, G.M., Morfonios, K., Müller, R., Murthy, K., Pandis, I., Qiao, L., Raman, V., Szabo, S., Sidle, R., Stolze, K.: Blink: not your father’s database! In: BIRTE (2011)
Bellamkonda, S., Li, H.G., Jagtap, U., Zhu, Y., Liang, V., Cruanes, T.: Adaptive and big data scale parallel execution in oracle. PVLDB 6(11), 1102–1113 (2013)
Bizarro, P., Babu, S., DeWitt, D., Widom, J.: Content-based routing: different plans for different data. In: PVLDB, pp. 757–768 (2005)
Boncz, P.A., Neumann, T., Erling, O.: TPC-H analyzed: hidden messages and lessons learned from an influential benchmark. In: TPCTC (2013)
Borovica, R., Alagiannis, I., Ailamaki, A.: Automated physical designers: what you see is (not) what you get. In: DBTest (2012)
Borovica-Gajic, R.: Toward timely, predictable and cost-effective data analytics. Ph.D. thesis (2016)
Borovica-Gajic, R., Appuswamy, R., Ailamaki, A.: Cheap data analytics using cold storage devices. PVLDB 9(12), 1029–1040 (2016)
Borovica-Gajic, R., Graefe, G., Lee, A.: Robust performance in database query processing (dagstuhl seminar 17222). Dagstuhl Rep. 7(5), 169–180 (2017)
Borovica-Gajic, R., Idreos, S., Ailamaki, A., Zukowski, M., Fraser, C.: Smooth scan: statistics-oblivious access paths. In: ICDE (2015)
Bouganim, L., Fabret, F., Mohan, C., Valduriez, P.: Dynamic query scheduling in data integration systems. In: ICDE (2000)
Bruno, N., Chaudhuri, S.: Efficient creation of statistics over query expressions. In: ICDE (2003)
Bruno, N., Chaudhuri, S., Gravano, L.: STHoles: A multidimensional workload-aware histogram. In: SIGMOD (2001)
Cao, L., Rundensteiner, E.A.: High performance stream query processing with correlation-aware partitioning. PVLDB 7(4), 265–276 (2013)
Cao, L.B.P.: Web caching and Zipf-like distributions: evidence and implications. In: INFOCOM (1999)
Chaudhuri, S.: Query optimizers: time to rethink the contract? In: SIGMOD (2009)
Chaudhuri, S., Narasayya, V., Ramamurthy, R.: A pay-as-you-go framework for query execution feedback. PVLDB 1(1), 1141–1152 (2008)
Chaudhuri, S., Narasayya, V.R.: Automating statistics management for query optimizers. In: ICDE (2000)
Chen, C.M., Roussopoulos, N.: Adaptive selectivity estimation using query feedback. In: SIGMOD (1994)
Chen, S., Ailamaki, A., Gibbons, P.B., Mowry, T.C.: Inspector joins. In: VLDB (2005)
Christodoulakis, S.: Implications of certain assumptions in database performance evaluation. TODS 9(2), 163–186 (1984)
Chu, F.: Least expected cost query optimization: What can we expect. In: Proceedings of the ACM Symposim on Principles of Database Systems, pp. 293–302 (2002)
Cole, R.L., Graefe, G.: Optimization of dynamic query evaluation plans. In: SIGMOD (1994)
Curino, C., Jones, E., Popa, R.A., Malviya, N., Wu, E., Madden, S., Balakrishnan, H., Zeldovich, N.: Relational cloud: a database service for the cloud. In: CIDR (2011)
D., H., Darera, P.N., Haritsa, J.R.: On the production of anorexic plan diagrams. In: VLDB (2007)
Harish, D., Darera, P.N., Haritsa, J.R.: Identifying robust plans through plan diagram reduction. PVLDB 1(1), 1124–1140 (2008)
Deshpande, A., Ives, Z., Raman, V.: Adaptive query processing. Found. Trends Databases 1(1), 1–140 (2007). https://doi.org/10.1561/1900000001
DeWitt, D.J., Naughton, J.F., Burger, J.: Nested loops revisited. In: PDIS (1993)
Dutt, A., Haritsa, J.: Plan bouquets: query processing without selectivity estimation. In: SIGMOD (2014)
Dutt, A., Narasayya, V.R., Chaudhuri, S.: Leveraging re-costing for online optimization of parameterized queries with guarantees. In: SIGMOD (2017)
Elhemali, M., Galindo-Legaria, C.A., Grabs, T., Joshi, M.M.: Execution strategies for SQL subqueries. In: SIGMOD (2007)
Eurviriyanukul, K., Paton, N.W., Fernandes, A.A.A., Lynden, S.J.: Adaptive join processing in pipelined plans. In: EDBT (2010)
Graefe, G.: Modern B-tree techniques. Found. Trends Databases 3(4), 203–402 (2011)
Graefe, G.: New algorithms for join and grouping operations. Comput. Sci. 27(1), 3–27 (2012)
Graefe, G., Guy, W., Kuno, H.A., Paulley, G.N.: Robust query processing (dagstuhl seminar 12321). Dagstuhl Rep. 2(8), 1–15 (2012)
Graefe, G., König, A.C., Kuno, H.A., Markl, V., Sattler, K.U.: Robust query processing (dagstuhl seminar 10381). In: Robust Query Processing (2011)
Graefe, G., Kuno, H.A., Wiener, J.L.: Visualizing the robustness of query execution. In: CIDR (2009)
Graefe, G., Ward, K.: Dynamic query evaluation plans. In: SIGMOD (1989)
Gray, J.: Tape is dead, disk is tape, flash is disk. RAM locality is king, Presented at CIDR (2007)
Haas, P.J., Hellerstein, J.M.: Ripple joins for online aggregation. In: SIGMOD (1999)
Harris, L.: Stock price clustering and discreteness. Rev. Financ. Stud. 4(3), 389–415 (1991)
Hellerstein, J.M., Franklin, M.J., Chandrasekaran, S., Deshpande, A., Hildrum, K., Madden, S., Raman, V., Shah, M.A.: Adaptive query processing: technology in evolution. IEEE Data Eng. Bull. 23, 2000 (2000)
Herodotou, H., Babu, S.: Xplus: a SQL-tuning-aware query optimizer. PVLDB 3(1–2), 1149–1160 (2010)
IBM: Managing big data for smart grids and smart meters. White paper. http://goo.gl/n1Ijtd (2012)
Ilyas, I.F., Markl, V., Haas, P., Brown, P., Aboulnaga, A.: CORDS: automatic discovery of correlations and soft functional dependencies. In: SIGMOD (2004)
Ioannidis, Y.E.: Query optimization. ACM Comput. Surv. 28(1), 121–123 (1996)
Ioannidis, Y.E., Ng, R.T., Shim, K., Sellis, T.K.: Parametric query optimization. PVLDB 6(2), 132–151 (1997)
Ives, Z.G.: Efficient Query Processing for Data Integration. University of Washington, Seattle (2002)
Ives, Z.G., Florescu, D., Friedman, M., Levy, A., Weld, D.S.: An adaptive query execution system for data integration. In: SIGMOD (1999)
Ives, Z.G., Halevy, A.Y., Weld, D.S.: Adapting to source properties in processing data integration queries. In: SIGMOD (2004)
Kabra, N., DeWitt, D.J.: Efficient mid-query re-optimization of sub-optimal query execution plans. In: SIGMOD (1998)
Kester, M.S., Athanassoulis, M., Idreos, S.: Access path selection in main-memory optimized data systems: should I scan or should I probe? In: SIGMOD (2017)
Leis, V., Gubichev, A., Mirchev, A., Boncz, P.A., Kemper, A., Neumann, T.: How good are query optimizers, really? PVLDB 9(3), 204–215 (2015)
Li, Q., Shao, M., Markl, V., Beyer, K.S., Colby, L.S., Lohman, G.M.: Adaptively reordering joins during query execution. In: ICDE (2007)
Lohman, G.: Is query optimization a “solved” problem? In: ACM SIGMOD Blog (2014)
Mackert, L., Lohman, G.: R* optimizer validation and performance evaluation for local queries. In: SIGMOD (1986)
Markl, V., Raman, V., Simmen, D., Lohman, G., Pirahesh, H., Cilimdzic, M.: Robust query processing through progressive optimization. In: SIGMOD (2004)
Müller, I., Sanders, P., Lacurie, A., Lehner, W., Färber, F.: Cache-efficient aggregation: hashing is sorting. In: SIGMOD (2015)
Nehme, R.V., Rundensteiner, E.A., Bertino, E.: Self-tuning query mesh for adaptive multi-route query processing. In: EDBT, pp. 803–814 (2009)
Neumann, T., Galindo-Legaria, C.A.: Taking the edge off cardinality estimation errors using incremental execution. In: DBIS (2013)
Ramakrishnan, R., Gehrke, J.: Database Management Systems, 3rd edn. McGraw-Hill, New York (2003)
Raman, V., Deshpande, A., Hellerstein, J.M.: Using state modules for adaptive query processing. In: ICDE (2003)
Schindler, J.: I/O characteristics of NoSQL databases. PVLDB 5(12), 2020–2021 (2012)
Schindler, J.: Profiling and analyzing the I/O performance of NoSQL DBs. In: SIGMETRICS (2013)
Selinger, P.G., Astrahan, M.M., Chamberlin, D.D., Lorie, R.A., Price, T.G.: Access path selection in a relational database management system. In: SIGMOD (1979)
Srivastava, U., Haas, P.J., Markl, V., Kutsch, M., Tran, T.M.: ISOMER: consistent histogram construction using query feedback. In: ICDE (2006)
Stillger, M., Lohman, G.M., Markl, V., Kandil, M.: LEO–DB2’s learning optimizer. In: VLDB (2001)
TPC: TPC-H benchmark. http://www.tpc.org/tpch/
Urhan, T., Franklin, M.J., Amsaleg, L.: Cost-based query scrambling for initial delays. In: SIGMOD (1998)
Viglas, S., Naughton, J.F., Burger, J.: Maximizing the output rate of multi-way join queries over streaming information sources. In: VLDB (2003)
Wilschut, A., Apers, P.: Dataflow query execution in a parallel main-memory environment. In: PDIS (1991)
Wong, E., Youssefi, K.: Decomposition—a strategy for query processing. ACM Trans. Database Syst. 1(3), 223–241 (1976)
Yin, S., Hameurlain, A., Morvan, F.: Robust query optimization methods with respect to estimation errors: a survey. SIGMOD Rec. 44(3), 25–36 (2015)
Zhu, Y., Rundensteiner, E.A., Heineman, G.T.: Dynamic plan migration for continuous queries over data streams. In: SIGMOD (2004)
Acknowledgements
We would like to thank the organizers of the Dagstuhl seminar 12321 on “Robust query processing” for the inspirational sessions and the introduction of robustness issues in query processing. In particular, we thank Goetz Graefe for his support throughout this work.
Author information
Authors and Affiliations
Corresponding author
Electronic supplementary material
Below is the link to the electronic supplementary material.
Rights and permissions
About this article
Cite this article
Borovica-Gajic, R., Idreos, S., Ailamaki, A. et al. Smooth Scan: robust access path selection without cardinality estimation. The VLDB Journal 27, 521–545 (2018). https://doi.org/10.1007/s00778-018-0507-8
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00778-018-0507-8