ABSTRACT
Column-stores have gained popularity as a promising physical design alternative. Each attribute of a relation is physically stored as a separate column, allowing queries to load only the required attributes. The overhead incurred is on-the-fly tuple reconstruction for multi-attribute queries. Each tuple reconstruction is a join of two columns based on tuple IDs, making it a significant cost component. The ultimate physical design is to have multiple presorted copies of each base table such that tuples are already appropriately organized in multiple different orders across the various columns. However, this requires the ability to predict the workload, idle time to prepare, and infrequent updates.
In this paper, we propose a novel design, partial sideways cracking, that minimizes the tuple reconstruction cost in a self-organizing way. It achieves performance similar to using presorted data, but without requiring the heavy initial presorting step itself. Instead, it handles dynamic, unpredictable workloads with no idle time and frequent updates. Auxiliary dynamic data structures, called cracker maps, provide a direct mapping between pairs of attributes used together in queries for tuple reconstruction. A map is continuously physically reorganized as an integral part of query evaluation, providing faster and reduced data access for future queries. To enable flexible and self-organizing behavior in storage-limited environments, maps are materialized only partially as demanded by the workload. Each map is a collection of separate chunks that are individually reorganized, dropped or recreated as needed. We implemented partial sideways cracking in an open-source column-store. A detailed experimental analysis demonstrates that it brings significant performance benefits for multi-attribute queries.
Index Terms
- Self-organizing tuple reconstruction in column-stores