Abstract
Move-to-Front, Distance Coding and Inversion Frequencies are three somewhat related techniques used to process the output of the Burrows-Wheeler Transform. In this paper we analyze these techniques from the point of view of how effective they are in the task of compressing low-entropy strings, that is, strings which have many regularities and are therefore highly compressible. This is a non-trivial task since many compressors have non-constant overheads that become non-negligible when the input string is highly compressible.
Because of the properties of the Burrows-Wheeler transform, being locally optimal ensures an algorithm compresses low-entropy strings effectively. Informally, local optimality implies that an algorithm is able to effectively compress an arbitrary partition of the input string. We show that in their original formulation neither Move-to-Front, nor Distance Coding, nor Inversion Frequencies is locally optimal. Then, we describe simple variants of the above algorithms which are locally optimal. To achieve local optimality with Move-to-Front it suffices to combine it with Run Length Encoding. To achieve local optimality with Distance Coding and Inversion Frequencies we use a novel “escape and re-enter” strategy. Since we build on previous results, our analyses are simple and shed new light on the inner workings of the three techniques considered in this paper.
Partially supported by Italian MIUR Italy-Israel FIRB Project “Pattern Discovery Algorithms in Discrete Structures, with Applications to Bioinformatics”.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
Arnavut, Z., Magliveras, S.: Block sorting and compression. In: Procs of IEEE Data Compression Conference (DCC), pp. 181–190 (1997)
Bentley, J., Sleator, D., Tarjan, R., Wei, V.: A locally adaptive data compression scheme. Communications of the ACM 29(4), 320–330 (1986)
Binder, E.: Distance coder, Usenet group (2000) comp.compression
Burrows, M., Wheeler, D.: A block-sorting lossless data compression algorithm. Technical Report 124, Digital Equipment Corporation (1994)
Deorowicz, S.: Second step algorithms in the Burrows-Wheeler compression algorithm. Software: Practice and Experience 32(2), 99–111 (2002)
Fenwick, P.: Burrows-Wheeler compression with variable length integer codes. Software: Practice and Experience 32, 1307–1316 (2002)
Ferragina, P., Giancarlo, R., Manzini, G.: The engineering of a compression boosting library: Theory vs practice in bwt compression. In: Azar, Y., Erlebach, T. (eds.) ESA 2006. LNCS, vol. 4168, pp. 756–767. Springer, Heidelberg (2006)
Ferragina, P., Giancarlo, R., Manzini, G.: The myriad virtues of wavelet trees. In: Bugliesi, M., Preneel, B., Sassone, V., Wegener, I. (eds.) ICALP 2006. LNCS, vol. 4051, pp. 561–572. Springer, Heidelberg (2006)
Ferragina, P., Giancarlo, R., Manzini, G., Sciortino, M.: Boosting textual compression in optimal linear time. Journal of the ACM 52, 688–713 (2005)
Foschini, L., Grossi, R., Gupta, A., Vitter, J.: Fast compression with a static model in high-order entropy. In: Procs of IEEE Data Compression Conference (DCC), pp. 62–71 (2004)
Giancarlo, R., Sciortino, M.: Optimal partitions of strings: A new class of Burrows-Wheeler compression algorithms. In: Baeza-Yates, R.A., Chávez, E., Crochemore, M. (eds.) CPM 2003. LNCS, vol. 2676, pp. 129–143. Springer, Heidelberg (2003)
Kaplan, H., Landau, S., Verbin, E.: A simpler analysis of Burrows-Wheeler based compression. In: Lewenstein, M., Valiente, G. (eds.) CPM 2006. LNCS, vol. 4009, Springer, Heidelberg (2006)
Mäkinen, V., Navarro, G.: Succinct suffix arrays based on run-length encoding. Nordic Journal of Computing 12(1), 40–66 (2005)
Manzini, G.: An analysis of the Burrows-Wheeler transform. Journal of the ACM 48(3), 407–430 (2001)
Navarro, G., Mäkinen, V.: Compressed full-text indexes. ACM Computing Surveys (to Appear)
Ryabko, B.Y.: Data compression by means of a ’book stack’. Prob.Inf.Transm, 16(4) (1980)
Author information
Authors and Affiliations
Editor information
Rights and permissions
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gagie, T., Manzini, G. (2007). Move-to-Front, Distance Coding, and Inversion Frequencies Revisited. In: Ma, B., Zhang, K. (eds) Combinatorial Pattern Matching. CPM 2007. Lecture Notes in Computer Science, vol 4580. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73437-6_10
Download citation
DOI: https://doi.org/10.1007/978-3-540-73437-6_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73436-9
Online ISBN: 978-3-540-73437-6
eBook Packages: Computer ScienceComputer Science (R0)