skip to main content
10.1145/3365109.3368789acmconferencesArticle/Chapter ViewAbstractPublication PagesbdcatConference Proceedingsconference-collections
research-article

GPU Acceleration of Range Queries over Large Data Sets

Published:02 December 2019Publication History

ABSTRACT

Data management systems commonly use bitmap indices to increase the efficiency of querying scientific data. Bitmaps are usually highly compressible and can be queried directly using fast hardware-supported bitwise logical operations. The processing of bitmap queries is inherently parallel in structure, which suggests they could benefit from concurrent computer systems. In particular, bitmap-range queries offer a highly parallel computational problem, and the hardware features of graphics processing units (GPUs) offer an alluring platform for accelerating their execution. In this paper, we present three GPU algorithms and one CPU based algorithm for the parallel execution of bitmap-range queries. We show that in 95% of our tests, using real and synthetic data, the GPU algorithms greatly outperform the parallel CPU algorithm. For these tests, the GPU algorithms provide up to 87.7× speedup and an average speedup of 30.22× over the parallel CPU algorithm. In addition to enhancing performance, augmenting traditional bitmap query systems with GPUs to offload bitmap query processing allows the CPU to process other requests.

References

  1. Witold Andrzejewski and Robert Wrembel. 2010. GPU-WAH: Applying GPUs to compressing bitmap indexes with word aligned hybrid. In International Conference on Database and Expert Systems Applications . Springer, Berlin,Heidelberg, 315--329.Google ScholarGoogle ScholarCross RefCross Ref
  2. Witold Andrzejewski and Robert Wrembel. 2011. GPU-PLWAH: GPU-based implementation of the PLWAH algorithm for compressing bitmaps. Control and cybernetics , Vol. 40 (2011), 627--650.Google ScholarGoogle Scholar
  3. Gennady Antoshenkov. 1995. Byte-aligned bitmap compression. In Proceedings DCC'95 Data Compression Conference . IEEE, 476.Google ScholarGoogle ScholarCross RefCross Ref
  4. Peter Bailey, Joseph Myre, Stuart DC Walsh, David J Lilja, and Martin O Saar. 2009. Accelerating lattice Boltzmann fluid flow simulations using graphics processors. In 2009 International Conference on Parallel Processing. IEEE, 550--557.Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Peter Bakkum and Kevin Skadron. 2010. Accelerating SQL Database Operations on a GPU with CUDA. In Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units (GPGPU-3). ACM, New York, NY, USA, 94--103. https://doi.org/10.1145/1735688.1735706Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Nathan Bell and Jared Hoberock. 2012. Thrust: A productivity-oriented library for CUDA . In GPU computing gems Jade edition . Elsevier, 359--371.Google ScholarGoogle ScholarCross RefCross Ref
  7. Samy Chambi, Daniel Lemire, Owen Kaser, and Robert Godin. 2016. Better Bitmap Performance with Roaring Bitmaps. Softw. Pract. Exper. , Vol. 46, 5 (May 2016), 709--719.Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Jerry Chou, Mark Howison, Brian Austin, Kesheng Wu, Ji Qiang, E. Wes Bethel, Arie Shoshani, Oliver Rübel, Prabhat, and Rob D. Ryne. 2011. Parallel Index and Query for Large Scale Data Analysis. In International Conference for High Performance Computing, Networking, Storage and Analysis (SC '11). 30:1--30:11.Google ScholarGoogle Scholar
  9. Alessandro Colantonio and Roberto Di Pietro. 2010. Concise: Compressed 'n' Composable Integer Set. Inform. Process. Lett. , Vol. 110, 16 (2010), 644--650.Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Fabian Corrales, David Chiu, and Jason Sawin. 2011. Variable Length Compression for Bitmap Indices. In Database and Expert Systems Applications , , Abdelkader Hameurlain, Stephen W. Liddle, Klaus-Dieter Schewe, and Xiaofang Zhou (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 381--395.Google ScholarGoogle Scholar
  11. C CUDA. 2019. Best practice guide. https://docs.nvidia.com/cuda/cuda-c-best-practices-guideGoogle ScholarGoogle Scholar
  12. Leonardo Dagum and Ramesh Menon. 1998. OpenMP: An industry-standard API for shared-memory programming. Computing in Science & Engineering , Vol. 5, 1 (1998), 46--55.Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Franccois Deliège and Torben Bach Pedersen. 2010. Position List Word Aligned Hybrid: Optimizing Space and Performance for Compressed Bitmaps. In International Conference on Extending Database Technology (EDBT '10). 228--239.Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Bin Dong, Surendra Byna, and Kesheng Wu. 2014. Parallel query evaluation as a Scientific Data Service. 2014 IEEE International Conference on Cluster Computing (CLUSTER) (2014), 194--202.Google ScholarGoogle ScholarCross RefCross Ref
  15. Jack Dongarra, Mark Gates, Azzam Haidar, Jakub Kurzak, Piotr Luszczek, Stanimire Tomov, and Ichitaro Yamazaki. 2014. Accelerating Numerical Dense Linear Algebra Calculations with GPUs . Numerical Computations with GPUs (2014), 1--26.Google ScholarGoogle Scholar
  16. Francesco Fusco, Marc Ph. Stoecklin, and Michail Vlachos. 2010. Net-Fli: On-the-fly Compression, Archiving and Indexing of Streaming Network Traffic. VLDB , Vol. 3, 2 (2010), 1382--1393.Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Francesco Fusco, Michail Vlachos, Xenofontas Dimitropoulos, and Luca Deri. 2013. Indexing Million of Packets Per Second Using GPUs. In Proceedings of the 2013 Conference on Internet Measurement Conference (IMC '13). 327--332.Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Luke J. Gosink, Kesheng Wu, E. Wes Bethel, John D. Owens, and Kenneth I. Joy. 2009. Data Parallel Bin-Based Indexing for Answering Queries on Multi-core Architectures. In Scientific and Statistical Database Management , , Marianne Winslett (Ed.). 110--129.Google ScholarGoogle Scholar
  19. Gheorghi Guzun, Guadalupe Canahuate, David Chiu, and Jason Sawin. 2014. A tunable compression framework for bitmap indices. In 2014 IEEE 30th International Conference on Data Engineering. IEEE, 484--495.Google ScholarGoogle ScholarCross RefCross Ref
  20. S. Haas, T. Karnagel , O. Arnold, E. Laux , B. Schlegel, G. Fettweis, and W. Lehner. 2016. HW/SW-database-codesign for compressed bitmap index processing. In 2016 IEEE 27th International Conference on Application-specific Systems, Architectures and Processors (ASAP). 50--57.Google ScholarGoogle Scholar
  21. Max Heimel and Volker Markl. 2012. In Proceedings of VLDB . 33--44.Google ScholarGoogle Scholar
  22. Jinwoong Kim, Sul-Gi Kim, and Beomseok Nam. 2013. Parallel Multi-dimensional Range Query Processing with R-trees on GPU. J. Parallel Distrib. Comput. , Vol. 73, 8 (2013), 1195--1207.Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Victor W Lee, Changkyu Kim, Jatin Chhugani, Michael Deisher, Daehyun Kim, Anthony D Nguyen, Nadathur Satish, Mikhail Smelyanskiy, Srinivas Chennupaty, Per Hammarlund, et almbox. 2010. Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU . ACM SIGARCH computer architecture news , Vol. 38, 3 (2010), 451--460.Google ScholarGoogle Scholar
  24. M. Lichman. 2013. UCI Machine Learning Repository. http://archive.ics.uci.edu/mlGoogle ScholarGoogle Scholar
  25. Stuart Lloyd. 1982. Least squares quantization in PCM. IEEE transactions on information theory , Vol. 28, 2 (1982), 129--137.Google ScholarGoogle Scholar
  26. Duane Merrill. 2016. Cub: Cuda unbound. URL: http://nvlabs. github. io/cub (2016).Google ScholarGoogle Scholar
  27. Joseph Myre, Stuart DC Walsh, D Lilja, and Martin O Saar. 2011. Performance analysis of single-phase, multiphase, and multicomponent lattice-Boltzmann fluid flow simulations on GPU clusters. Concurrency and Computation: Practice and Experience , Vol. 23, 4 (2011), 332--350.Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. X. Nguyen, T. Hoang , H. Nguyen, K. Inoue, and C. Pham. 2018. An FPGA-Based Hardware Accelerator for Energy-Efficient Bitmap Index Creation. IEEE Access , Vol. 6 (2018), 16046--16059.Google ScholarGoogle ScholarCross RefCross Ref
  29. Ray P. Norris. 2010. Data Challenges for Next-generation Radio Telescopes. In Proceedings of the 2010 Sixth IEEE International Conference on e-Science Workshops (E-SCIENCEW '10). IEEE, 21--24.Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Murat Sariyar, Andreas Borg, and Klaus Pommerening. 2011. Controlling false match rates in record linkage using extreme value theory. Journal of Biomedical Informatics , Vol. 44, 4 (2011), 648--654.Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Y. Su, G. Agrawal , and J. Woodring. 2012. Indexing and Parallel Query Processing Support for Visualizing Climate Datasets. In 2012 41st International Conference on Parallel Processing. IEEE, 249--258.Google ScholarGoogle Scholar
  32. Stanimire Tomov, Jack Dongarra, and Marc Baboulin. 2010a. Towards dense linear algebra for hybrid GPU accelerated manycore systems. Parallel Comput. , Vol. 36, 5--6 (2010), 232--240.Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Stanimire Tomov, Rajib Nath, Hatem Ltaief, and Jack Dongarra. 2010b. Dense linear algebra solvers for multicore with GPU accelerators. In 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW). IEEE, 1--8.Google ScholarGoogle ScholarCross RefCross Ref
  34. Nhat-Phuong Tran, Myungho Lee, and Dong Hoon Choi. 2015. Memory-efficient parallelization of 3D lattice Boltzmann flow solver on a GPU. In 2015 IEEE 22nd International Conference on High Performance Computing (HiPC). IEEE, 315--324.Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Sebastiaan J. van Schaik and Oege de Moor. 2011. A Memory Efficient Reachability Data Structure Through Bit Vector Compression. In Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data (SIGMOD '11). 913--924.Google ScholarGoogle Scholar
  36. Stuart DC Walsh, Martin O Saar, Peter Bailey, and David J Lilja. 2009. Accelerating geoscience and engineering system simulations on graphics hardware. Computers & Geosciences , Vol. 35, 12 (2009), 2353--2364.Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Nicolas Weber and Michael Goesele. 2017. MATOG: array layout auto-tuning for CUDA . ACM Transactions on Architecture and Code Optimization (TACO) , Vol. 14, 3 (2017), 28.Google ScholarGoogle Scholar
  38. Kesheng Wu, Ekow J Otoo, and Arie Shoshani. 2002. Compressing bitmap indexes for faster search operations. In Proceedings 14th International Conference on Scientific and Statistical Database Management. IEEE, 99--108.Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. Kesheng Wu, Ekow J. Otoo, and Arie Shoshani. 2006. Optimizing bitmap indices with efficient compression. ACM Trans. Database Syst. , Vol. 31, 1 (2006), 1--38.Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. K. Wu, E. J. Otoo, A. Shoshani, and H. Nordberg. 2001. Notes on design and implementation of compressed bit vectors . Technical Report LBNL/PUB-3161. Lawrence Berkeley National Laboratory.Google ScholarGoogle Scholar
  41. Beytullah Yildiz, Kesheng Wu, Suren Byna, and Arie Shoshani. 2019. Parallel membership queries on very large scientific data sets using bitmap indexes. Concurrency and Computation: Practice and Experience (2019), e5157.Google ScholarGoogle Scholar

Index Terms

  1. GPU Acceleration of Range Queries over Large Data Sets

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in
      • Published in

        cover image ACM Conferences
        BDCAT '19: Proceedings of the 6th IEEE/ACM International Conference on Big Data Computing, Applications and Technologies
        December 2019
        174 pages
        ISBN:9781450370165
        DOI:10.1145/3365109

        Copyright © 2019 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 2 December 2019

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • research-article

        Acceptance Rates

        Overall Acceptance Rate27of93submissions,29%

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader