Hardware-oblivious parallelism for in-memory column-stores

Published: 01 July 2013

Abstract

The multi-core architectures of today's computer systems make parallelism a necessity for performance-critical applications. Writing such applications in a generic, hardware-oblivious manner is a challenging problem: current database systems thus rely on labor-intensive and error-prone manual tuning to exploit the full potential of modern parallel hardware architectures like multi-core CPUs and graphics cards. We propose an alternative design for a parallel database engine, based on a single set of hardware-oblivious operators, which are compiled down to the actual hardware at runtime. This design reduces the development overhead for parallel database engines while achieving performance competitive with hand-tuned systems.

We provide a proof of concept for this design by integrating operators written using the parallel programming framework OpenCL into the open-source database MonetDB. Following this approach, we achieve efficient yet highly portable parallel code without the need for optimization by hand. We evaluated our implementation against MonetDB using TPC-H derived queries and observed performance that rivals that of MonetDB's query execution on the CPU and surpasses it on the GPU. In addition, we show that the same set of operators runs nearly unchanged on a GPU, demonstrating the feasibility of our approach.

Published in

Proceedings of the VLDB Endowment, Volume 6, Issue 9, July 2013 (180 pages)

Publisher: VLDB Endowment
