Abstract
We study the approximate string matching and regular expression matching problems in the case where the text to be searched is compressed with the Ziv-Lempel adaptive dictionary compression schemes. We present a time-space trade-off that leads to algorithms improving the previously known complexities for both problems. In particular, we significantly improve the space bounds, which are likely to be the bottleneck in practical applications.
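To make the problem setting concrete, the following is a minimal sketch of the naive baseline that compressed matching aims to beat: parse the text into LZ78-style phrases, decompress it in full, and then run the classical dynamic-programming column scan for approximate matching with at most k errors. It is not the algorithm of this paper. The function names (`lz78_compress`, `lz78_decompress`, `approx_match_ends`) and the choice of the LZ78 variant are assumptions made for illustration only.

```python
def lz78_compress(text):
    """Parse text into LZ78 phrases: (index of an earlier phrase, next char)."""
    dictionary = {"": 0}          # phrase string -> phrase index; 0 is the empty phrase
    phrases = []
    current = ""
    for ch in text:
        if current + ch in dictionary:
            current += ch         # keep extending the longest known phrase
        else:
            phrases.append((dictionary[current], ch))
            dictionary[current + ch] = len(dictionary)
            current = ""
    if current:
        # Leftover text equals an existing phrase; encode it as prefix + last char.
        phrases.append((dictionary[current[:-1]], current[-1]))
    return phrases


def lz78_decompress(phrases):
    """Rebuild the original text from the phrase list."""
    strings = [""]                # strings[i] is the text of phrase i
    out = []
    for ref, ch in phrases:
        s = strings[ref] + ch
        strings.append(s)
        out.append(s)
    return "".join(out)


def approx_match_ends(pattern, text, k):
    """Return 1-based end positions in text of substrings within edit
    distance k of pattern, using one DP column (O(m) working space)."""
    m = len(pattern)
    col = list(range(m + 1))      # column for the empty text prefix
    ends = []
    for j, c in enumerate(text, start=1):
        prev_diag = col[0]
        col[0] = 0                # a match may start at any text position
        for i in range(1, m + 1):
            old = col[i]
            col[i] = min(prev_diag + (pattern[i - 1] != c),  # match or substitute
                         col[i] + 1,                         # skip text character
                         col[i - 1] + 1)                     # skip pattern character
            prev_diag = old
        if col[m] <= k:
            ends.append(j)
    return ends


if __name__ == "__main__":
    text = "abracadabra"
    phrases = lz78_compress(text)
    restored = lz78_decompress(phrases)
    print(len(phrases), "phrases for", len(text), "characters")
    print(approx_match_ends("abra", restored, 0))
```

The baseline needs space proportional to the full uncompressed text, which is the cost that algorithms working directly on the phrase list try to avoid.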
Index Terms
- Improved approximate string matching and regular expression matching on Ziv-Lempel compressed texts