GPU Acceleration of Range Queries over Large Data Sets

Authors:
Mitchell Nelson, University of St. Thomas, St. Paul, MN, USA
Zachary Sorenson, University of St. Thomas, St. Paul, MN, USA
Joseph M. Myre, University of St. Thomas, St. Paul, MN, USA
Jason Sawin, University of St. Thomas, St. Paul, MN, USA
David Chiu, University of Puget Sound, Tacoma, WA, USA

BDCAT '19: Proceedings of the 6th IEEE/ACM International Conference on Big Data Computing, Applications and Technologies, December 2019, Pages 11–20
https://doi.org/10.1145/3365109.3368789

Published: 02 December 2019

ABSTRACT

Data management systems commonly use bitmap indices to increase the efficiency of querying scientific data. Bitmaps are usually highly compressible and can be queried directly using fast hardware-supported bitwise logical operations. The processing of bitmap queries is inherently parallel in structure, which suggests they could benefit from concurrent computer systems. In particular, bitmap-range queries offer a highly parallel computational problem, and the hardware features of graphics processing units (GPUs) offer an alluring platform for accelerating their execution. In this paper, we present three GPU algorithms and one CPU-based algorithm for the parallel execution of bitmap-range queries. We show that in 95% of our tests, using real and synthetic data, the GPU algorithms greatly outperform the parallel CPU algorithm. For these tests, the GPU algorithms provide up to 87.7× speedup and an average speedup of 30.22× over the parallel CPU algorithm. In addition to enhancing performance, augmenting traditional bitmap query systems with GPUs to offload bitmap query processing allows the CPU to process other requests.
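
To make the notion of a bitmap-range query concrete, the following is a minimal sketch and not the paper's method. It assumes an uncompressed, equality-encoded index with one bit vector per value bin, whereas the paper's author tags indicate WAH-compressed bitmaps processed on a GPU. The names `build_bitmap_index`, `range_query`, and `matching_rows`, and the sample data, are hypothetical. The point it illustrates is that a range query reduces to a bitwise OR over the bit vectors of the bins in the range, and because OR is associative, that reduction can be split across many workers.

```python
from functools import reduce
from operator import or_

def build_bitmap_index(values, bin_edges):
    """Build one bit vector (a Python int) per bin.

    Bit r of bitmaps[b] is set when row r's value lies in
    [bin_edges[b], bin_edges[b + 1]). Uncompressed by assumption.
    """
    bitmaps = [0] * (len(bin_edges) - 1)
    for row, value in enumerate(values):
        for b in range(len(bitmaps)):
            if bin_edges[b] <= value < bin_edges[b + 1]:
                bitmaps[b] |= 1 << row
                break
    return bitmaps

def range_query(bitmaps, first_bin, last_bin):
    """OR together the bit vectors of bins first_bin..last_bin (inclusive)."""
    return reduce(or_, bitmaps[first_bin:last_bin + 1], 0)

def matching_rows(result):
    """List the row numbers whose bit is set in the result vector."""
    return [r for r in range(result.bit_length()) if (result >> r) & 1]

# Hypothetical data: six rows, three bins covering [0,10), [10,20), [20,30).
values = [3, 17, 8, 25, 12, 1]
bin_edges = [0, 10, 20, 30]
bitmaps = build_bitmap_index(values, bin_edges)

# Rows whose value falls in [10, 30), i.e. bins 1 and 2.
rows = matching_rows(range_query(bitmaps, 1, 2))
```

Each output bit depends only on the same bit position in the input vectors, so the work can be divided by position as well as by bin. This independence is the kind of structure the abstract refers to as inherently parallel.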

Index Terms

1. Computing methodologies
  1. Parallel computing methodologies
    1. Parallel algorithms
2. Information systems
  1. Data management systems
    1. Database management system engines
      1. Database query processing
        Query optimization

Published in
BDCAT '19: Proceedings of the 6th IEEE/ACM International Conference on Big Data Computing, Applications and Technologies
December 2019
174 pages
ISBN: 9781450370165
DOI: 10.1145/3365109
General Chairs:
Kenneth Johnson, Auckland University of Technology, New Zealand
Josef Spillner, Zurich University of Applied Sciences, Switzerland
Program Chairs:
Xinghui Zhao, Washington State University, USA
Olga Datskova, CERN, Switzerland
Blesson Varghese, Queen's University Belfast, UK
Copyright © 2019 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
bitmap indices
gpu
range queries
wah compression
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 27 of 93 submissions, 29%