ABSTRACT
Data management systems commonly use bitmap indices to increase the efficiency of querying scientific data. Bitmaps are usually highly compressible and can be queried directly using fast hardware-supported bitwise logical operations. Bitmap query processing is inherently parallel in structure, which suggests that it could benefit from parallel hardware. In particular, bitmap-range queries are a highly parallel computational problem, and the hardware features of graphics processing units (GPUs) make them an attractive platform for accelerating these queries. In this paper, we present three GPU algorithms and one CPU-based algorithm for the parallel execution of bitmap-range queries. We show that in 95% of our tests, which use real and synthetic data, the GPU algorithms greatly outperform the parallel CPU algorithm. For these tests, the GPU algorithms provide up to an 87.7× speedup and an average speedup of 30.22× over the parallel CPU algorithm. Beyond improving performance, offloading bitmap query processing to GPUs frees the CPU in a traditional bitmap query system to process other requests.
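To make the idea of a bitmap-range query concrete, the following is a minimal sketch and not the paper's GPU or CPU algorithms. It assumes an uncompressed, equality-encoded index in which each bin has one bit vector, stored here as a Python integer, and answers a range query by ORing the bit vectors of the bins in the range. The function names and the sample column are hypothetical.

```python
from functools import reduce
from operator import or_


def build_bitmap_index(values, num_bins):
    """Return one bit vector (a Python int) per bin; bit i is set if row i is in that bin."""
    bitmaps = [0] * num_bins
    for row, bin_id in enumerate(values):
        bitmaps[bin_id] |= 1 << row
    return bitmaps


def range_query(bitmaps, lo, hi):
    """OR together the bit vectors of bins lo..hi (inclusive)."""
    return reduce(or_, bitmaps[lo:hi + 1], 0)


def matching_rows(result):
    """Decode a result bit vector into a sorted list of row numbers."""
    rows = []
    row = 0
    while result:
        if result & 1:
            rows.append(row)
        result >>= 1
        row += 1
    return rows


# Hypothetical binned column: one bin id per row (assumption for illustration).
values = [0, 3, 1, 2, 3, 1, 0, 2]
index = build_bitmap_index(values, num_bins=4)

# Rows whose value falls in bins 1..2.
hits = matching_rows(range_query(index, 1, 2))
```

Because bitwise OR is associative and commutative, the bit vectors in the range can be combined in any grouping, for example pairwise in a tree, which is one reason this kind of query maps naturally onto parallel hardware.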