Abstract
The Burrows-Wheeler Transform (BWT) is a tool of fundamental importance in Data Compression and has recently found many applications well beyond its original purpose. The main goal of this paper is to highlight the mathematical and combinatorial properties on which the remarkable versatility of the BWT rests, namely its reversibility and the clustering effect it produces on its output. These properties have attracted considerable interest in the scientific community, both for their theoretical aspects and for their practical consequences. In particular, we survey the theoretical research questions that, taking their cue from Data Compression, have been developed in the context of Combinatorics on Words, and we focus on the combinatorial results that are useful for exploring the application potential of the Burrows-Wheeler Transform.
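As a minimal illustrative sketch (not code from the paper), the following Python shows the two properties singled out above. It assumes the classical definition: sort all conjugates (rotations) of a word w, output the last column L plus the index I of the row holding w, with no end-of-string sentinel. Reversibility is shown by rebuilding w from (L, I) through the LF mapping. Clustering is measured, as an assumed simple proxy, by counting runs of equal letters. The names `bwt`, `inverse_bwt` and `runs` are hypothetical, and the naive sort of rotations is meant for clarity only; practical tools compute the BWT via suffix arrays or similar structures.

```python
from collections import Counter


def bwt(w: str) -> tuple[str, int]:
    """Classical BWT: sort all conjugates (rotations) of w and return the
    last column L together with the row index I at which w itself appears."""
    if not w:
        return "", 0
    rotations = sorted(w[i:] + w[:i] for i in range(len(w)))
    last = "".join(r[-1] for r in rotations)
    return last, rotations.index(w)


def inverse_bwt(last: str, index: int) -> str:
    """Recover w from (L, I) using the LF mapping (reversibility)."""
    # C[c] = number of letters in L strictly smaller than c, i.e. the
    # first row of the sorted matrix whose rotation starts with c.
    counts = Counter(last)
    C, total = {}, 0
    for c in sorted(counts):
        C[c] = total
        total += counts[c]
    # LF[i] = row holding the rotation of row i shifted right by one:
    # the k-th occurrence of a letter in L is the k-th occurrence in F.
    seen = Counter()
    lf = []
    for ch in last:
        lf.append(C[ch] + seen[ch])
        seen[ch] += 1
    # Row `index` is w itself; L[index] is its last letter, and following
    # LF spells w from right to left.
    out, p = [], index
    for _ in range(len(last)):
        out.append(last[p])
        p = lf[p]
    return "".join(reversed(out))


def runs(s: str) -> int:
    """Number of maximal blocks of equal letters (a simple clustering proxy)."""
    return sum(1 for i, ch in enumerate(s) if i == 0 or ch != s[i - 1])


if __name__ == "__main__":
    w = "banana"
    last, i = bwt(w)
    print(last, i)                    # nnbaaa 3
    print(inverse_bwt(last, i) == w)  # True
    print(runs(w), "->", runs(last))  # 6 -> 3
```

On "banana" the output "nnbaaa" groups equal letters together (6 runs become 3), and the pair ("nnbaaa", 3) is enough to recover the input exactly.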
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Rosone, G., Sciortino, M. (2013). The Burrows-Wheeler Transform between Data Compression and Combinatorics on Words. In: Bonizzoni, P., Brattka, V., Löwe, B. (eds) The Nature of Computation. Logic, Algorithms, Applications. CiE 2013. Lecture Notes in Computer Science, vol 7921. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39053-1_42
DOI: https://doi.org/10.1007/978-3-642-39053-1_42
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-39052-4
Online ISBN: 978-3-642-39053-1
eBook Packages: Computer Science, Computer Science (R0)