research-article

Building efficient query engines in a high-level language

Authors:
Yannis Klonatos, École Polytechnique Fédérale de Lausanne (EPFL)
Christoph Koch, École Polytechnique Fédérale de Lausanne (EPFL)
Tiark Rompf, École Polytechnique Fédérale de Lausanne (EPFL)
Hassan Chafi, Oracle Labs

Proceedings of the VLDB Endowment, Volume 7, Issue 10, pp. 853–864. https://doi.org/10.14778/2732951.2732959

Published: 01 June 2014

Abstract

In this paper we argue that it is time for a radical rethinking of database system design. Developers should be able to leverage high-level programming languages without having to pay a price in efficiency. To realize our vision of abstraction without regret, we present LegoBase, a query engine written in the high-level programming language Scala. The key technique for regaining efficiency is generative programming: the Scala code that constitutes the query engine, despite its high-level appearance, is actually a program generator that emits specialized, low-level C code. We show how the combination of high-level and generative programming makes it easy to implement a wide spectrum of optimizations that are difficult to achieve with existing low-level query compilers, and how it can continuously optimize the query engine.

We evaluate our approach with the TPC-H benchmark and show that (a) with all optimizations enabled, our architecture significantly outperforms a commercial in-memory database system as well as an existing query compiler; (b) these performance improvements require just a few hundred lines of high-level code instead of the complicated low-level code required by existing query compilers; and (c) the compilation overhead is low compared to the overall execution time, making our approach usable in practice for efficiently compiling query engines.
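
The generative-programming idea described in the abstract, high-level code acting as a program generator that emits specialized C, can be illustrated with a minimal sketch. It is written in Python rather than Scala and is not taken from LegoBase. The names `Filter` and `generate_c` are hypothetical, and the `lineitem` columns are borrowed from TPC-H and typed as `long` purely for simplicity. The sketch only shows how a constant known at query-compile time can be baked into the emitted C loop.

```python
from dataclasses import dataclass

# Comparison operators the hypothetical generator is willing to emit.
ALLOWED_OPS = {"<", "<=", ">", ">=", "==", "!="}


@dataclass(frozen=True)
class Filter:
    """A predicate `column op value` whose constant is known before execution."""
    column: str
    op: str
    value: int


def generate_c(table, columns, predicate, sum_column):
    """Emit C source for: SELECT SUM(sum_column) FROM table WHERE predicate.

    Assumption for illustration: every column is a C `long`.
    """
    if predicate.op not in ALLOWED_OPS:
        raise ValueError(f"unsupported operator: {predicate.op}")
    if predicate.column not in columns or sum_column not in columns:
        raise ValueError("predicate and aggregate must use known columns")

    fields = "\n".join(f"    long {name};" for name in columns)
    # The predicate is inlined as literal C text, so the emitted loop
    # contains no interpretation overhead for operators or constants.
    condition = f"rows[i].{predicate.column} {predicate.op} {predicate.value}"
    return (
        f"struct {table}_row {{\n"
        f"{fields}\n"
        f"}};\n"
        f"\n"
        f"long query(const struct {table}_row *rows, long n) {{\n"
        f"    long sum = 0;\n"
        f"    for (long i = 0; i < n; i++) {{\n"
        f"        if ({condition}) {{\n"
        f"            sum += rows[i].{sum_column};\n"
        f"        }}\n"
        f"    }}\n"
        f"    return sum;\n"
        f"}}\n"
    )


c_source = generate_c(
    table="lineitem",
    columns=["l_quantity", "l_extendedprice"],
    predicate=Filter("l_quantity", "<", 24),
    sum_column="l_extendedprice",
)
print(c_source)
```

The generator itself stays short and high-level, while the string it prints is a plain C function specialized to one query. A real system would hand that source to a C compiler; that step is omitted here.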

Published in
Proceedings of the VLDB Endowment, Volume 7, Issue 10
June 2014
146 pages
ISSN: 2150-8097
Editors: H. V. Jagadish (University of Michigan), Aoying Zhou (East China Normal University, China)
Publisher: VLDB Endowment