research-article

Building efficient query engines in a high-level language

Published: 01 June 2014

Abstract

In this paper, we argue that it is time for a radical rethinking of database systems design. Developers should be able to leverage high-level programming languages without having to pay a price in efficiency. To realize our vision of abstraction without regret, we present LegoBase, a query engine written in the high-level programming language Scala. The key technique to regain efficiency is to apply generative programming: the Scala code that constitutes the query engine, despite its high-level appearance, is actually a program generator that emits specialized, low-level C code. We show how the combination of high-level and generative programming makes it easy to implement a wide spectrum of optimizations that are difficult to achieve with existing low-level query compilers, and how it can continuously optimize the query engine.
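To make the idea of a query engine that is "actually a program generator" concrete, here is a minimal sketch in Scala. It is not LegoBase's implementation and does not reproduce any staging library's API: the types `Exp`, `Col`, `Lit`, `Bin`, `Op`, `Scan`, `Filter`, the functions `produce` and `genSum`, and the column names `price`, `qty` and row count `n_rows` are all hypothetical. The operator tree is ordinary high-level Scala, but evaluating it yields C source in which the scan, the filter and the aggregation are fused into a single loop.

```scala
// Hypothetical staged expressions: each node describes C code instead of computing a value.
sealed trait Exp {
  def toC: String
  // Operators build expression trees, so staged code reads like ordinary Scala arithmetic.
  def *(r: Exp): Exp = Bin("*", this, r)
  def <(r: Exp): Exp = Bin("<", this, r)
}
// A column reference prints as `name[i]`: the column's value in row `i` of the emitted loop.
final case class Col(name: String) extends Exp { def toC: String = s"$name[i]" }
final case class Lit(v: Int) extends Exp { def toC: String = v.toString }
final case class Bin(op: String, l: Exp, r: Exp) extends Exp {
  def toC: String = s"(${l.toC} $op ${r.toC})"
}

// Hypothetical high-level query operators (a tiny subset: scan and filter).
sealed trait Op
final case class Scan(rowCount: String) extends Op
final case class Filter(pred: Exp, child: Op) extends Op

// Walks the operator tree at generation time. `consume` is the C statement to run for
// each qualifying row; each operator wraps it, so the result is one fused loop with no
// operator objects or virtual calls left in the generated code.
def produce(op: Op, consume: String): String = op match {
  case Scan(n)             => s"for (long i = 0; i < $n; i++) { $consume }"
  case Filter(pred, child) => produce(child, s"if (${pred.toC}) { $consume }")
}

// Emits a complete C function computing SUM(agg) over the rows produced by `plan`.
def genSum(agg: Exp, plan: Op): String =
  Seq(
    "double query(void) {",
    "  double acc = 0;",
    "  " + produce(plan, s"acc += ${agg.toC};"),
    "  return acc;",
    "}"
  ).mkString("\n")

// Example: SUM(price * qty) over rows with qty < 24 (column names are made up).
val plan  = Filter(Col("qty") < Lit(24), Scan("n_rows"))
val cCode = genSum(Col("price") * Col("qty"), plan)
println(cCode)
```

Running it prints a function `double query(void)` whose body is the single loop `for (long i = 0; i < n_rows; i++) { if ((qty[i] < 24)) { acc += (price[i] * qty[i]); } }`. The Scala-level abstractions (operator objects, expression trees, pattern matching) exist only at generation time and leave no trace in the emitted C, which is the sense in which the abstraction comes without a runtime price. A real engine must also handle types, joins, data layout and invoking a C compiler, all of which this sketch omits.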

We evaluate our approach with the TPC-H benchmark and show that (a) with all optimizations enabled, our architecture significantly outperforms a commercial in-memory database system as well as an existing query compiler; (b) these performance improvements require programming just a few hundred lines of high-level code, rather than the complicated low-level code required by existing query compilers; and, finally, (c) the compilation overhead is low compared to the overall execution time, which makes our approach usable in practice for efficiently compiling query engines.

Published in

Proceedings of the VLDB Endowment, Volume 7, Issue 10
June 2014
146 pages
ISSN: 2150-8097

Publisher

VLDB Endowment