Abstract
We propose a simple but efficient algorithm for finding all occurrences of a pattern or a class of patterns (of length m) in a text (of length n) with at most k mismatches.
This algorithm relies on the Shift-Add algorithm of Baeza-Yates and Gonnet [6], which represents the current state of the search as a bit-packed number and exploits the ability of programming languages to operate on machine words. The state representation must therefore not exceed the word size ω, that is, m(⌈log2(k+1)⌉+1) ≤ ω. The algorithm consists of a preprocessing step and a searching step. It is linear and performs 3n operations during the searching step.
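As an illustration of the bit-packed state described above, the following is a minimal Python sketch of a Shift-Add style search with at most k mismatches. It is not the authors' implementation: the names `shift_add_search` and `fits_in_word` are hypothetical, the overflow handling follows the usual textbook description of Shift-Add, and Python integers are unbounded, so the word-size condition is only checked by a helper rather than enforced.

```python
def fits_in_word(m, k, word_size=64):
    """Check the constraint m * (ceil(log2(k+1)) + 1) <= word_size.

    For k >= 0, ceil(log2(k+1)) equals k.bit_length().
    """
    return m * (k.bit_length() + 1) <= word_size


def shift_add_search(pattern, text, k):
    """Return start positions where pattern matches text with <= k mismatches.

    Hypothetical sketch: one b-bit field per pattern position counts the
    mismatches of the corresponding pattern prefix; the top bit of each
    field is reserved to catch counts that exceed k.
    """
    m = len(pattern)
    b = k.bit_length() + 1                 # bits per field
    full = (1 << (m * b)) - 1              # mask for the whole state
    high = sum(1 << (i * b + b - 1) for i in range(m))  # top bit of each field
    all_mismatch = sum(1 << (i * b) for i in range(m))  # char not in pattern

    # Preprocessing: for each pattern character c, put a 1 in field i
    # wherever pattern[i] != c.
    table = {
        c: sum(1 << (i * b) for i in range(m) if pattern[i] != c)
        for c in set(pattern)
    }

    last = (m - 1) * b                     # bit offset of the last field
    last_top = 1 << (last + b - 1)         # overflow flag of the last field
    count_mask = (1 << (b - 1)) - 1        # counter bits of one field

    state = 0
    overflow = 0
    hits = []
    for j, c in enumerate(text):
        # Shift every counter to the next pattern position, then add mismatches.
        state = ((state << b) + table.get(c, all_mismatch)) & full
        # Remember which fields overflowed, then clear their top bits so
        # that the next addition cannot carry into a neighbouring field.
        overflow = ((overflow << b) | (state & high)) & full
        state &= ~high
        if j >= m - 1 and not (overflow & last_top):
            if ((state >> last) & count_mask) <= k:
                hits.append(j - m + 1)
    return hits
```

For example, `shift_add_search("abc", "abcabdxbc", 1)` scans the text once and reports every window whose Hamming distance to "abc" is at most 1. In a fixed-width language, `fits_in_word(m, k)` would have to hold for the state to fit in a single machine word.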
The notions of shift and character skip found in the Boyer-Moore (BM) approach [9] are then introduced into this algorithm. Provided that the alphabet is large enough compared to the pattern length, the average number of operations performed by our algorithm during the searching step becomes n(2 + (k+4)/(m−k)).
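To get a feel for the two operation counts quoted in the abstract, the short Python sketch below evaluates the average cost per text character. It assumes the average-cost expression groups as n(2 + (k+4)/(m−k)), which is a reading of the flattened fraction rather than something the abstract spells out; the function name `avg_ops_per_char` is hypothetical.

```python
def avg_ops_per_char(m, k):
    """Average operations per text character under the assumed reading
    n * (2 + (k + 4) / (m - k)) of the abstract's formula.

    Requires m > k, since a pattern of length m cannot meaningfully be
    searched with k >= m mismatches.
    """
    if m <= k:
        raise ValueError("pattern length m must exceed k")
    return 2 + (k + 4) / (m - k)


# Compare with the 3 operations per character of the plain searching step.
for m, k in [(12, 4), (20, 2), (30, 1)]:
    print(m, k, avg_ops_per_char(m, k), avg_ops_per_char(m, k) < 3)
```

Under this reading, the average drops below 3 operations per character exactly when (k+4)/(m−k) < 1, that is, when m > 2k + 4.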
References
[1] K. Abrahamson. Generalized string matching. SIAM J. Comput., 16(6):1039–1051, December 1987.
[2] A. V. Aho and M. J. Corasick. Efficient string matching: an aid to bibliographic search. Commun. ACM, 18:333–340, 1975.
[3] T. Akutsu. Approximate string matching with don't care characters. In M. Crochemore and D. Gusfield, editors, Lecture Notes in Computer Science, volume 807 of Combinatorial Pattern Matching (5th Annual Symposium, CPM94), pages 229–242. Springer-Verlag, 1994.
[4] R. Baeza-Yates and G. H. Gonnet. Fast string matching with k mismatches. Technical Report CS-88-36, Data Structuring Group, September 1988.
[5] R. Baeza-Yates and G. H. Gonnet. Efficient text searching of regular expressions. In 16th International Colloquium on Automata, Languages and Programming, Stresa, Italy, July 1989.
[6] R. Baeza-Yates and G. H. Gonnet. A new approach to text searching. Commun. ACM, 35(10):74–82, October 1992.
[7] R. Baeza-Yates and C. H. Perleberg. Fast and practical approximate string matching. In Lecture Notes in Computer Science, volume 644 of Combinatorial Pattern Matching (3rd Annual Symposium, CPM92), pages 185–191. Springer-Verlag, 1992.
[8] A. A. Bertossi and F. Logi. Parallel string matching with variable length don't cares. Journal of Parallel and Distributed Computing, 22:229–234, 1994.
[9] R. S. Boyer and J. S. Moore. A fast string searching algorithm. Commun. ACM, 20(10):762–772, October 1977.
[10] M. J. Fischer and M. S. Paterson. String-matching and other products. In R. Karp, editor, Complexity of Computation (SIAM-AMS Proceedings 7), volume 7, pages 113–125. American Mathematical Society, Providence, R.I., 1974.
[11] Z. Galil and R. Giancarlo. Improved string matching with k mismatches. SIGACT News, 17:52–54, 1986.
[12] R. Grossi and F. Luccio. Simple and efficient string matching with k mismatches. Inf. Proc. Letters, 33(3):113–120, November 1989.
[13] D. E. Knuth, J. H. Morris, and V. Pratt. Fast pattern matching in strings. SIAM J. Comput., 6:323–350, June 1977.
[14] G. Kucherov and M. Rusinowitch. Matching a set of strings with variable length don't cares. In Z. Galil and E. Ukkonen, editors, Lecture Notes in Computer Science, volume 937 of 6th Annual Symposium, CPM95, pages 230–247. Espoo, Finland, Springer, July 1995.
[15] G. M. Landau and U. Vishkin. Efficient string matching with k mismatches. Theoret. Comput. Sci., 43:239–249, 1986.
[16] U. Manber and R. Baeza-Yates. An algorithm for string matching with a sequence of don't cares. Information Processing Letters, 37:133–136, 1991.
[17] R. Y. Pinter. Efficient string matching with don't-care patterns. In A. Apostolico and Z. Galil, editors, Combinatorial Algorithms on Words, volume F12, pages 11–29. Springer-Verlag, 1985.
[18] J. Tarhio and E. Ukkonen. Boyer-Moore approach to approximate string matching. In J. R. Gilbert and R. G. Karlsson, editors, Lecture Notes in Computer Science, volume 447 of 2nd Scandinavian Workshop on Algorithm Theory, SWAT'90, pages 348–359. Bergen, Norway, Springer-Verlag, July 1990.
[19] E. Ukkonen. Approximate string-matching over suffix trees. In A. Apostolico, M. Crochemore, Z. Galil, and U. Manber, editors, Lecture Notes in Computer Science, volume 684 of Combinatorial Pattern Matching (4th Annual Symposium, CPM93), pages 240–249. Springer-Verlag, 1993.
[20] S. Wu and U. Manber. Fast text searching allowing errors. Commun. ACM, 35(10):83–91, October 1992.
Copyright information
© 1996 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
El-Mabrouk, N., Crochemore, M. (1996). Boyer-Moore strategy to efficient approximate string matching. In: Hirschberg, D., Myers, G. (eds) Combinatorial Pattern Matching. CPM 1996. Lecture Notes in Computer Science, vol 1075. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-61258-0_2
DOI: https://doi.org/10.1007/3-540-61258-0_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-61258-2
Online ISBN: 978-3-540-68390-2
eBook Packages: Springer Book Archive