research-article

Improved approximate string matching and regular expression matching on Ziv-Lempel compressed texts

Published: 28 December 2009

Abstract

We study the approximate string matching and regular expression matching problems for the case where the text to be searched is compressed with the Ziv-Lempel adaptive dictionary compression schemes. We present a time-space trade-off that leads to algorithms improving the previously known complexities for both problems. In particular, we significantly improve the space bounds, which are likely to be a bottleneck in practical applications.
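
To make the problem setting concrete, here is a minimal sketch of the naive baseline: decompress an LZ78-style phrase sequence, then run the classic dynamic-programming recurrence for approximate matching over the decompressed text. This is not the algorithm of the article, which works on the compressed representation to save time and space. The function names (`lz78_compress`, `lz78_decompress`, `approximate_matches`) and the `(reference, character)` phrase encoding are assumptions made for illustration only.

```python
def lz78_compress(text):
    """Parse text into LZ78-style phrases (ref, char); ref 0 is the empty phrase."""
    dictionary = {"": 0}
    phrases = []
    current = ""
    for ch in text:
        if current + ch in dictionary:
            current += ch  # keep extending the longest known phrase
        else:
            phrases.append((dictionary[current], ch))
            dictionary[current + ch] = len(dictionary)
            current = ""
    if current:
        # Leftover text is already a known phrase: emit it as prefix + last char.
        phrases.append((dictionary[current[:-1]], current[-1]))
    return phrases


def lz78_decompress(phrases):
    """Rebuild the text by expanding each phrase from the phrase it references."""
    decoded = [""]
    for ref, ch in phrases:
        decoded.append(decoded[ref] + ch)
    return "".join(decoded[1:])


def approximate_matches(pattern, text, k):
    """Return 1-based end positions j such that some substring of text
    ending at j is within edit distance k of pattern."""
    m = len(pattern)
    column = list(range(m + 1))  # column for the empty text prefix
    ends = []
    for j, ch in enumerate(text, start=1):
        prev_diag = column[0]
        column[0] = 0  # a match may start at any text position
        for i in range(1, m + 1):
            old = column[i]
            cost = 0 if pattern[i - 1] == ch else 1
            column[i] = min(prev_diag + cost,   # match or substitution
                            old + 1,            # extra text character
                            column[i - 1] + 1)  # missing pattern character
            prev_diag = old
        if column[m] <= k:
            ends.append(j)
    return ends


if __name__ == "__main__":
    phrases = lz78_compress("abababab")
    text = lz78_decompress(phrases)
    print(phrases)
    print(approximate_matches("abc", text, 1))
```

The baseline uses space proportional to the decompressed text plus the pattern, which is the cost that searching directly on the compressed form aims to avoid.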

      • Published in

        ACM Transactions on Algorithms  Volume 6, Issue 1
        December 2009
        374 pages
        ISSN: 1549-6325
        EISSN: 1549-6333
        DOI: 10.1145/1644015

        Copyright © 2009 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 28 December 2009
        • Accepted: 1 April 2008
        • Received: 1 May 2007

        Qualifiers

        • research-article
        • Research
        • Refereed
