Abstract
The unique characteristics of scientific data and queries cause traditional indexing techniques to perform poorly on scientific workloads, occupy excessive space, or both. Refinements of bitmap indexes have been proposed previously as a solution to this problem. In this article, we describe the difficulties we encountered in deploying bitmap indexes with scientific data and queries from two real-world domains. In particular, previously proposed methods of binning, encoding, and compressing bitmap vectors either were quite slow for processing the large-range query conditions our scientists used, or required excessive storage space. Nor could the indexes easily be built or used on parallel platforms. In this article, we show how to solve these problems through the use of multi-resolution, parallelizable bitmap indexes, which support a fine-grained trade-off between storage requirements and query performance. Our experiments with large data sets from two scientific domains show that multi-resolution, parallelizable bitmap indexes occupy an acceptable amount of storage while improving range query performance by roughly a factor of 10, compared to a single-resolution bitmap index of reasonable size.
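The trade-off described above can be illustrated with a minimal sketch. It is not the authors' implementation: the class name, the two-level layout, the bin widths, and the use of Python integers as uncompressed bit vectors are all assumptions made for illustration, and compression and parallel construction are omitted. The idea shown is that a large-range query reads wide (low-resolution) bins for the interior of the range, narrow (high-resolution) bins near the boundaries, and checks raw values only in the partially covered edge bins.

```python
from collections import defaultdict


class MultiResBitmapIndex:
    """Hypothetical two-level binned bitmap index over non-negative integers.

    Fine bins have width `fine`; coarse bins have width `fine * ratio`.
    Each bitmap is a Python int in which bit r is set when row r falls in the bin.
    """

    def __init__(self, values, fine=10, ratio=10):
        self.values = list(values)
        self.fine = fine
        self.ratio = ratio
        self.fine_bits = defaultdict(int)
        self.coarse_bits = defaultdict(int)
        for row, v in enumerate(self.values):
            b = v // fine
            self.fine_bits[b] |= 1 << row
            self.coarse_bits[b // ratio] |= 1 << row

    def range_query(self, lo, hi):
        """Return (row ids with lo <= value < hi, number of bitmaps read)."""
        if lo >= hi:
            return [], 0
        result, reads = 0, 0
        first_full = -(-lo // self.fine)  # first fine bin fully inside the range
        end_full = hi // self.fine        # one past the last fully covered fine bin
        b = first_full
        while b < end_full:
            if b % self.ratio == 0 and b + self.ratio <= end_full:
                # A whole coarse bin fits: one read replaces `ratio` fine reads.
                result |= self.coarse_bits.get(b // self.ratio, 0)
                b += self.ratio
            else:
                result |= self.fine_bits.get(b, 0)
                b += 1
            reads += 1
        # Edge bins are only partly covered, so their rows need a candidate
        # check against the raw values.
        edge_bins = set()
        if lo % self.fine:
            edge_bins.add(lo // self.fine)
        if hi % self.fine:
            edge_bins.add(hi // self.fine)
        for e in edge_bins:
            bits = self.fine_bits.get(e, 0)
            reads += 1
            row = 0
            while bits:
                if bits & 1 and lo <= self.values[row] < hi:
                    result |= 1 << row
                bits >>= 1
                row += 1
        rows = [r for r in range(len(self.values)) if (result >> r) & 1]
        return rows, reads


if __name__ == "__main__":
    data = [3, 15, 27, 99, 100, 150, 199, 200, 250, 999, 1000, 1005]
    index = MultiResBitmapIndex(data, fine=10, ratio=10)
    print(index.range_query(15, 1000))
```

In this sketch, raising `ratio` or adding further levels would cut the number of bitmaps read for wide ranges at the cost of storing more bitmaps, which is the storage-versus-performance knob the abstract refers to.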
Index Terms
- Multi-resolution bitmap indexes for scientific data