Skip to main content

Efficient LZ78 Factorization of Grammar Compressed Text

  • Conference paper
String Processing and Information Retrieval (SPIRE 2012)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 7608))

Included in the following conference series:

Abstract

We present an efficient algorithm for computing the LZ78 factorization of a text, where the text is represented as a straight line program (SLP), which is a context free grammar in the Chomsky normal form that generates a single string. Given an SLP of size n representing a text S of length N, our algorithm computes the LZ78 factorization of T in \(O(n\sqrt{N}+m\log N)\) time and \(O(n\sqrt{N}+m)\) space, where m is the number of resulting LZ78 factors. We also show how to improve the algorithm so that the \(n\sqrt{N}\) term in the time and space complexities becomes either nL, where L is the length of the longest LZ78 factor, or (N − α) where α ≥ 0 is a quantity which depends on the amount of redundancy that the SLP captures with respect to substrings of S of a certain length. Since m = O(N/log σ N) where σ is the alphabet size, the latter is asymptotically at least as fast as a linear time algorithm which runs on the uncompressed string when σ is constant, and can be more efficient when the text is compressible, i.e. when m and n are small.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Amir, A., Farach, M., Idury, R.M., Poutré, J.A.L., Schäffer, A.A.: Improved dynamic dictionary matching. Information and Computation 119(2), 258–282 (1995)

    Article  MathSciNet  MATH  Google Scholar 

  2. Bender, M.A., Farach-Colton, M.: The level ancestor problem simplified. Theor. Comput. Sci. 321(1), 5–12 (2004)

    Article  MathSciNet  MATH  Google Scholar 

  3. Berkman, O., Vishkin, U.: Finding level-ancestors in trees. J. Comput. System Sci. 48(2), 214–230 (1994)

    Article  MathSciNet  MATH  Google Scholar 

  4. Bille, P., Fagerberg, R., Gørtz, I.L.: Improved approximate string matching and regular expression matching on Ziv-Lempel compressed texts. ACM Transactions on Algorithms 6(1) (2009)

    Google Scholar 

  5. Bille, P., Landau, G.M., Raman, R., Sadakane, K., Satti, S.R., Weimann, O.: Random access to grammar-compressed strings. In: Proc. SODA 2011, pp. 373–389 (2011)

    Google Scholar 

  6. Cilibrasi, R., Vitányi, P.M.: Clustering by compression. IEEE Transactions on Information Theory 51(4), 1523–1545 (2005)

    Article  Google Scholar 

  7. Crochemore, M., Landau, G.M., Ziv-Ukelson, M.: A subquadratic sequence alignment algorithm for unrestricted scoring matrices. SIAM J. Comput. 32(6), 1654–1673 (2003)

    Article  MathSciNet  MATH  Google Scholar 

  8. Farach, M.: Optimal suffix tree construction with large alphabets. In: Proc. FOCS 1997, pp. 137–143 (1997)

    Google Scholar 

  9. Freschi, V., Bogliolo, A.: A faster algorithm for the computation of string convolutions using LZ78 parsing. Information Processing Letters 110(14-15), 609–613 (2010)

    Article  MathSciNet  MATH  Google Scholar 

  10. Gawrychowski, P.: Optimal pattern matching in LZW compressed strings. In: Proc. SODA 2011, pp. 362–372 (2011)

    Google Scholar 

  11. Gawrychowski, P.: Tying up the loose ends in fully LZW-compressed pattern matching. In: Proc. STACS 2012, pp. 624–635 (2012)

    Google Scholar 

  12. Goto, K., Bannai, H., Inenaga, S., Takeda, M.: Fast q-gram Mining on SLP Compressed Strings. In: Grossi, R., Sebastiani, F., Silvestri, F. (eds.) SPIRE 2011. LNCS, vol. 7024, pp. 278–289. Springer, Heidelberg (2011)

    Chapter  Google Scholar 

  13. Goto, K., Bannai, H., Inenaga, S., Takeda, M.: Speeding Up q-Gram Mining on Grammar-Based Compressed Texts. In: Kärkkäinen, J., Stoye, J. (eds.) CPM 2012. LNCS, vol. 7354, pp. 220–231. Springer, Heidelberg (2012)

    Chapter  Google Scholar 

  14. Jansson, J., Sadakane, K., Sung, W.-K.: Compressed Dynamic Tries with Applications to LZ-Compression in Sublinear Time and Space. In: Arvind, V., Prasad, S. (eds.) FSTTCS 2007. LNCS, vol. 4855, pp. 424–435. Springer, Heidelberg (2007)

    Chapter  Google Scholar 

  15. Jeż, A.: Faster Fully Compressed Pattern Matching by Recompression. In: Czumaj, A., Mehlhorn, K., Pitts, A., Wattenhofer, R. (eds.) ICALP 2012, Part I. LNCS, vol. 7391, pp. 533–544. Springer, Heidelberg (2012)

    Chapter  Google Scholar 

  16. Kida, T., Shibata, Y., Takeda, M., Shinohara, A., Arikawa, S.: Collage system: A unifying framework for compressed pattern matching. Theor. Comput. Sci. 298(1), 253–272 (2003)

    Article  MathSciNet  MATH  Google Scholar 

  17. Larsson, N.J., Moffat, A.: Offline dictionary-based compression. In: Proc. DCC 1999, pp. 296–305. IEEE Computer Society (1999)

    Google Scholar 

  18. Li, M., Sleep, R.: Genre classification via an LZ78-based string kernel. In: Proc. ISMIR 2005, pp. 252–259 (2005)

    Google Scholar 

  19. Li, M., Sleep, R.: An LZ78 Based String Kernel. In: Li, X., Wang, S., Dong, Z.Y. (eds.) ADMA 2005. LNCS (LNAI), vol. 3584, pp. 678–689. Springer, Heidelberg (2005)

    Chapter  Google Scholar 

  20. Li, M., Zhu, Y.: Image Classification Via LZ78 Based String Kernel: A Comparative Study. In: Ng, W.-K., Kitsuregawa, M., Li, J., Chang, K. (eds.) PAKDD 2006. LNCS (LNAI), vol. 3918, pp. 704–712. Springer, Heidelberg (2006)

    Chapter  Google Scholar 

  21. McCreight, E.M.: A space-economical suffix tree construction algorithm. Journal of ACM 23(2), 262–272 (1976)

    Article  MathSciNet  MATH  Google Scholar 

  22. Nevill-Manning, C.G., Witten, I.H., Maulsby, D.L.: Compression by induction of hierarchical grammars. In: Proc. DCC 1994, pp. 244–253 (1994)

    Google Scholar 

  23. Rytter, W.: Application of Lempel-Ziv factorization to the approximation of grammar-based compression. Theor. Comput. Sci. 302(1-3), 211–222 (2003)

    Article  MathSciNet  MATH  Google Scholar 

  24. Shibuya, T.: Constructing the suffix tree of a tree with a large alphabet. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences E86-A(5), 1061–1066 (2003)

    Google Scholar 

  25. Ukkonen, E.: On-line construction of suffix trees. Algorithmica 14(3), 249–260 (1995)

    Article  MathSciNet  MATH  Google Scholar 

  26. Weiner, P.: Linear pattern-matching algorithms. In: Proc. of 14th IEEE Ann. Symp. on Switching and Automata Theory, pp. 1–11. Institute of Electrical Electronics Engineers, New York (1973)

    Chapter  Google Scholar 

  27. Westbrook, J.: Fast Incremental Planarity Testing. In: Kuich, W. (ed.) ICALP 1992. LNCS, vol. 623, pp. 342–353. Springer, Heidelberg (1992)

    Chapter  Google Scholar 

  28. Ziv, J., Lempel, A.: A universal algorithm for sequential data compression. IEEE Transactions on Information Theory IT-23(3), 337–349 (1977)

    Article  MathSciNet  Google Scholar 

  29. Ziv, J., Lempel, A.: Compression of individual sequences via variable-length coding. IEEE Transactions on Information Theory 24(5), 530–536 (1978)

    Article  MathSciNet  MATH  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bannai, H., Inenaga, S., Takeda, M. (2012). Efficient LZ78 Factorization of Grammar Compressed Text. In: Calderón-Benavides, L., González-Caro, C., Chávez, E., Ziviani, N. (eds) String Processing and Information Retrieval. SPIRE 2012. Lecture Notes in Computer Science, vol 7608. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34109-0_10

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-34109-0_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-34108-3

  • Online ISBN: 978-3-642-34109-0

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics