Abstract
This paper considers the Shift-And approach to the problem of pattern matching in LZW compressed text and gives a new algorithm that solves it. The algorithm is fast when the pattern length is at most 32, that is, the machine word length. After O(m + |Σ|) time and O(|Σ|) space preprocessing of the pattern, it scans an LZW compressed text in O(n + r) time and reports all occurrences of the pattern, where n is the compressed text length, m is the pattern length, r is the number of pattern occurrences, and Σ is the alphabet. Experimental results show that it runs approximately 1.5 times faster than decompression followed by a simple search using the Shift-And algorithm. Moreover, like the Shift-And algorithm, it can be extended to generalized pattern matching, to pattern matching with k mismatches, and to multiple pattern matching.
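For orientation, the classic Shift-And search on uncompressed text, which the abstract uses as its baseline after decompression, can be sketched as follows. This is a minimal illustration and not the compressed-domain algorithm of the paper. The function names `build_masks` and `shift_and` are hypothetical, and a Python dictionary stands in for the O(|Σ|) mask table. Python integers are unbounded, so the 32-bit word limit mentioned above does not show up here.

```python
def build_masks(pattern):
    """Preprocessing: map each pattern character c to a bitmask
    whose bit i is set exactly when pattern[i] == c."""
    masks = {}
    for i, c in enumerate(pattern):
        masks[c] = masks.get(c, 0) | (1 << i)
    return masks


def shift_and(text, pattern):
    """Return the start positions of all occurrences of pattern in text.

    Bit i of `state` is set when pattern[0..i] matches the text
    ending at the current position.
    """
    m = len(pattern)
    if m == 0:
        return []
    masks = build_masks(pattern)
    accept = 1 << (m - 1)  # bit that signals a full match
    state = 0
    hits = []
    for j, c in enumerate(text):
        # Extend every partial match by one character, start a new one
        # at bit 0, then keep only those consistent with character c.
        state = ((state << 1) | 1) & masks.get(c, 0)
        if state & accept:
            hits.append(j - m + 1)
    return hits
```

Each text character costs one shift, one OR, and one AND, which is why the method is attractive when the pattern fits in a single machine word.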
Copyright information
© 1999 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kida, T., Takeda, M., Shinohara, A., Arikawa, S. (1999). Shift-And Approach to Pattern Matching in LZW Compressed Text. In: Crochemore, M., Paterson, M. (eds) Combinatorial Pattern Matching. CPM 1999. Lecture Notes in Computer Science, vol 1645. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48452-3_1
DOI: https://doi.org/10.1007/3-540-48452-3_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-66278-5
Online ISBN: 978-3-540-48452-3
eBook Packages: Springer Book Archive