
Efficient parallel lists intersection and index compression algorithms using graphics processing units

Authors:
Naiyong Ao (Nankai University), Fan Zhang (Nankai University), Di Wu (Nankai University), Douglas S. Stones (Monash University), Gang Wang (Nankai University), Xiaoguang Liu (Nankai University), Jing Liu (Nankai University), Sheng Lin (Nankai University)

Proceedings of the VLDB Endowment, Volume 4, Issue 8, pp 470–481, https://doi.org/10.14778/2002974.2002975

Published: 01 May 2011

Abstract

Major web search engines answer thousands of queries per second requesting information about billions of web pages. The data sizes and query loads are growing at an exponential rate. To manage the heavy workload, we consider techniques for utilizing a Graphics Processing Unit (GPU). We investigate new approaches to improve two important operations of search engines: lists intersection and index compression.

For lists intersection, we develop techniques for efficient implementation of the binary search algorithm for parallel computation. We inspect some representative real-world datasets and find that a sufficiently long inverted list has an overall linear rate of increase. Based on this observation, we propose Linear Regression and Hash Segmentation techniques for contracting the search range. For index compression, the traditional d-gap based compression schemata are not well-suited for parallel computation, so we propose a Linear Regression Compression schema which has an inherent parallel structure. We further discuss how to efficiently intersect the compressed lists on a GPU. Our experimental results show significant improvements in the query processing throughput on several datasets.
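The abstract gives no implementation details, so the following is only a minimal sequential sketch in Python of the two ideas it names, not the authors' GPU code. It assumes non-empty, strictly increasing lists of integer document IDs, and every function name (`fit_line`, `build_model`, `contracted_search`, `intersect`, `lr_compress`, `lr_decode_at`) is hypothetical. A least-squares line is fitted to (position, docID) pairs. For intersection, the extreme residuals bound the positions where a target can occur, so binary search runs over a contracted range. For compression, only the line and per-element offsets are stored, so any element can be decoded without touching its neighbours, unlike d-gap coding, where each value depends on the one before it.

```python
import math
from bisect import bisect_left


def fit_line(docids):
    """Least-squares fit docid ~ a * i + b over positions i = 0..n-1."""
    n = len(docids)
    mean_i = (n - 1) / 2
    mean_d = sum(docids) / n
    var_i = sum((i - mean_i) ** 2 for i in range(n))
    cov = sum((i - mean_i) * (d - mean_d) for i, d in enumerate(docids))
    a = cov / var_i if var_i else 0.0
    return a, mean_d - a * mean_i


def build_model(docids):
    """Line parameters plus the smallest and largest residual."""
    a, b = fit_line(docids)
    res = [d - (a * i + b) for i, d in enumerate(docids)]
    return a, b, min(res), max(res)


def contracted_search(docids, model, target):
    """Binary search restricted to the positions the model allows."""
    a, b, rmin, rmax = model
    n = len(docids)
    lo, hi = 0, n
    if a > 0:
        # If docids[i] == target, then (target - rmax - b) / a <= i
        # <= (target - rmin - b) / a; widen by one slot for rounding.
        lo = max(0, math.floor((target - rmax - b) / a) - 1)
        hi = min(n, math.ceil((target - rmin - b) / a) + 2)
    if lo >= hi:
        return False
    pos = bisect_left(docids, target, lo, hi)
    return pos < hi and docids[pos] == target


def intersect(short, long_):
    """Probe every element of the short list into the long list.
    Each probe is independent, so on a GPU it could map to one thread."""
    if not long_:
        return []
    model = build_model(long_)
    return [d for d in short if contracted_search(long_, model, d)]


def lr_compress(docids):
    """Store the line and non-negative offsets from its floor."""
    a, b = fit_line(docids)
    deltas = [d - math.floor(a * i + b) for i, d in enumerate(docids)]
    base = min(deltas)
    offsets = [x - base for x in deltas]
    bits = max(offsets).bit_length()  # fixed width needed per offset
    return a, b, base, bits, offsets


def lr_decode_at(comp, i):
    """Decode element i alone; no prefix sum over earlier elements."""
    a, b, base, _bits, offsets = comp
    return math.floor(a * i + b) + base + offsets[i]
```

In this sketch the closer a list is to a straight line, the narrower the residual band becomes, which shrinks both the search range and the number of bits per offset. A real implementation would also pack the offsets into `bits`-wide fields rather than keep them in a Python list.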


Index Terms

Efficient parallel lists intersection and index compression algorithms using graphics processing units
1. Computing methodologies
  1. Computer graphics
    1. Graphics systems and interfaces
      1. Graphics processors
2. Information systems
  1. Data management systems
    1. Database design and models
      1. Physical data models
  2. Information retrieval
    1. Document representation
    2. Search engine architectures and scalability
      1. Search engine indexing

Published in

Proceedings of the VLDB Endowment, Volume 4, Issue 8
May 2011
58 pages
ISSN: 2150-8097
Publisher: VLDB Endowment