ABSTRACT
Over the last decades, improvements in CPU speed have outpaced improvements in main memory and disk access rates by orders of magnitude, enabling the use of data compression techniques to improve the performance of database systems. Previous work describes the benefits of compression for numerical attributes, where data is stored in compressed format on disk. Despite the abundance of string-valued attributes in relational schemas, there is little work on compression for string attributes in a database context. Moreover, none of the previous work suitably addresses the role of the query optimizer: during query execution, data is either eagerly decompressed when it is read into main memory, or it lazily stays compressed in main memory and is decompressed only on demand.
In this paper, we present an effective approach for database compression based on lightweight, attribute-level compression techniques. We propose a Hierarchical Dictionary Encoding strategy that intelligently selects the most effective compression method for string-valued attributes. We show that eager and lazy decompression strategies produce sub-optimal plans for queries involving compressed string attributes. We then formalize the problem of compression-aware query optimization and propose one provably optimal and two fast heuristic algorithms for selecting a query plan for relational schemas with compressed attributes; our algorithms can easily be integrated into existing cost-based query optimizers. Experiments using TPC-H data demonstrate the impact of our string compression methods and show the importance of compression-aware query optimization. Our approach results in up to an order of magnitude speedup over existing approaches.
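To make the terms above concrete, here is a minimal sketch of plain (flat) dictionary encoding for a string attribute, together with an equality selection evaluated in an eager and a lazy style. It is an illustration only, not the paper's Hierarchical Dictionary Encoding or its optimizer: the function names, the sample column, and the choice to evaluate the predicate directly on codes in the lazy case are all assumptions made for this example.

```python
from typing import Dict, List, Tuple


def dictionary_encode(values: List[str]) -> Tuple[List[int], List[str]]:
    """Replace each string with a small integer code; return (codes, dictionary)."""
    dictionary = sorted(set(values))  # code i corresponds to dictionary[i]
    code_of: Dict[str, int] = {s: i for i, s in enumerate(dictionary)}
    return [code_of[v] for v in values], dictionary


def select_eager(codes: List[int], dictionary: List[str], target: str) -> List[int]:
    """Eager style: decompress every value first, then compare strings."""
    decoded = [dictionary[c] for c in codes]
    return [row for row, s in enumerate(decoded) if s == target]


def select_lazy(codes: List[int], dictionary: List[str], target: str) -> List[int]:
    """Lazy style (assumed variant): keep the column compressed and compare codes."""
    if target not in dictionary:
        return []
    target_code = dictionary.index(target)  # translate the constant once
    return [row for row, c in enumerate(codes) if c == target_code]


# Hypothetical sample column, loosely modeled on a shipping-mode attribute.
ship_modes = ["AIR", "RAIL", "AIR", "TRUCK", "RAIL", "AIR"]
codes, dictionary = dictionary_encode(ship_modes)

eager_rows = select_eager(codes, dictionary, "RAIL")
lazy_rows = select_lazy(codes, dictionary, "RAIL")
print(codes, dictionary, eager_rows, lazy_rows)
```

Both functions return the same row positions; they differ only in when decompression happens. Which choice is cheaper depends on the operators that follow, which is the kind of decision the abstract assigns to a compression-aware optimizer.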