Boyer-Moore strategy to efficient approximate string matching

  • Conference paper
Combinatorial Pattern Matching (CPM 1996)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1075)

Abstract

We propose a simple but efficient algorithm for finding all occurrences of a pattern or a class of patterns (of length m) in a text (of length n) with at most k mismatches.

This algorithm relies on the Shift-Add algorithm of Baeza-Yates and Gonnet [6], which represents the current state of the search as a bit vector and exploits the ability of programming languages to manipulate machine words. The state representation must therefore not exceed the word size ω, that is, m(⌈log₂(k+1)⌉+1) ≤ ω. The algorithm consists of a preprocessing step and a searching step. It is linear and performs 3n operations during the searching step.
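The Shift-Add idea described above can be sketched as follows. This is a minimal illustration of a Shift-Add style search for k mismatches, not the algorithm proposed in the paper: it contains no Boyer-Moore shifts and makes no attempt to reach the 3n operation count. The function name `shift_add_search`, the `word_size` parameter and its default of 64 are assumptions made for the example. Python integers are unbounded, so the word-size limit is only emulated by the explicit check and a mask.

```python
def shift_add_search(pattern, text, k, word_size=64):
    """Return start positions where pattern matches text with at most k mismatches."""
    m = len(pattern)
    if m == 0 or k < 0:
        raise ValueError("pattern must be non-empty and k non-negative")

    # Bits per pattern position: ceil(log2(k+1)) counter bits plus one overflow bit.
    # For k >= 0, k.bit_length() equals ceil(log2(k+1)).
    b = k.bit_length() + 1
    if m * b > word_size:
        raise ValueError("state does not fit in one word: m*(ceil(log2(k+1))+1) > word size")

    full_mask = (1 << (m * b)) - 1
    ones = sum(1 << (i * b) for i in range(m))   # value 1 in every field
    overflow_bits = ones << (b - 1)              # top bit of every field

    # Preprocessing: field i of table[c] is 0 if pattern[i] == c, else 1.
    table = {}
    for i, ch in enumerate(pattern):
        table[ch] = table.get(ch, ones) & ~(1 << (i * b))

    state = 0      # field i: mismatches of pattern[0..i] against the text suffix
    overflow = 0   # remembers fields whose counter has already wrapped
    last = (m - 1) * b
    matches = []

    # Searching: one shift and one add per text character, plus overflow bookkeeping.
    for j, ch in enumerate(text):
        state = ((state << b) + table.get(ch, ones)) & full_mask
        overflow = ((overflow << b) | (state & overflow_bits)) & full_mask
        state &= ~overflow_bits
        if j >= m - 1:
            count = state >> last
            if (overflow >> last) == 0 and count <= k:
                matches.append(j - m + 1)
    return matches


print(shift_add_search("abc", "abcxbcabd", 1))  # [0, 3, 6]
```

The separate `overflow` word is what keeps a counter that has wrapped past its field from being mistaken for a small mismatch count; the extra bit per field in the bound m(⌈log₂(k+1)⌉+1) ≤ ω pays for it.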

The notions of shift and character skip found in the Boyer-Moore (BM) approach [9] are introduced into this algorithm. Provided that the alphabet is large enough compared to the pattern length, the average number of operations performed by our algorithm during the searching step becomes n(2 + (k+4)/(m−k)).
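To get a feel for the two operation counts quoted in the abstract, the short sketch below evaluates them for sample values. It assumes the average-case expression is read as n(2 + (k+4)/(m−k)), with k+4 over m−k as a fraction; the function names and the sample values of n, m and k are illustrative only and do not come from the paper.

```python
def linear_ops(n):
    """Operation count quoted for the plain searching step: 3n."""
    return 3 * n


def average_ops(n, m, k):
    """Average count quoted for the Boyer-Moore variant, read as n(2 + (k+4)/(m-k))."""
    if not 0 <= k < m:
        raise ValueError("expected 0 <= k < m")
    return n * (2 + (k + 4) / (m - k))


n, m, k = 1_000_000, 20, 2
print(linear_ops(n))                  # 3000000
print(round(average_ops(n, m, k)))    # 2333333
```

Under this reading, the average count falls below 3n exactly when (k+4)/(m−k) < 1, that is, when m > 2k + 4, so the gain grows as the pattern gets longer relative to the number of allowed mismatches.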

References

  1. K. Abrahamson. Generalized string matching. SIAM J. Comput., 16(6):1039–1051, December 1987.

  2. A. V. Aho and M. J. Corasick. Efficient string matching: an aid to bibliographic search. Commun. ACM, 18:333–340, 1975.

  3. T. Akutsu. Approximate string matching with don't care characters. In M. Crochemore and D. Gusfield, editors, Lecture Notes in Computer Science, volume 807 of Combinatorial Pattern Matching (5th Annual Symposium, CPM94), pages 229–242. Springer-Verlag, 1994.

  4. R. Baeza-Yates and G. H. Gonnet. Fast string matching with k mismatches. Technical Report CS-88-36, Data Structuring Group, September 1988.

  5. R. Baeza-Yates and G. H. Gonnet. Efficient text searching of regular expressions. 16th International Colloquium on Automata, Languages and Programming, Stresa, Italy, July 1989.

  6. R. Baeza-Yates and G. H. Gonnet. A new approach to text searching. Commun. ACM, 35(10):74–82, October 1992.

  7. R. Baeza-Yates and C. H. Perleberg. Fast and practical approximate string matching. In Lecture Notes in Computer Science, volume 644 of Combinatorial Pattern Matching (3rd Annual Symposium, CPM92), pages 185–191. Springer-Verlag, 1992.

  8. A. A. Bertossi and F. Logi. Parallel string matching with variable length don't cares. Journal of Parallel and Distributed Computing, 22:229–234, 1994.

  9. R. S. Boyer and J. S. Moore. A fast string searching algorithm. Commun. ACM, 20(10):762–772, October 1977.

  10. M. J. Fischer and M. S. Paterson. String-matching and other products. In R. Karp, editor, Complexity of Computation (SIAM-AMS Proceedings 7), volume 7, pages 113–125. American Mathematical Society, Providence, R.I., 1974.

  11. Z. Galil and R. Giancarlo. Improved string matching with k mismatches. SIGACT News, 17:52–54, 1986.

  12. R. Grossi and F. Luccio. Simple and efficient string matching with k mismatches. Inf. Proc. Letters, 33(3):113–120, November 1989.

  13. D. E. Knuth, J. H. Morris, and V. Pratt. Fast pattern matching in strings. SIAM J. Comput., 6:323–350, June 1977.

  14. G. Kucherov and M. Rusinowitch. Matching a set of strings with variable length don't cares. In Z. Galil and E. Ukkonen, editors, Lecture Notes in Computer Science, volume 937 of 6th Annual Symposium, CPM95, pages 230–247. Espoo, Finland, Springer, July 1995.

  15. G. M. Landau and U. Vishkin. Efficient string matching with k mismatches. Theoret. Comput. Sci., 43:239–249, 1986.

  16. U. Manber and R. Baeza-Yates. An algorithm for string matching with a sequence of don't cares. Information Processing Letters, 37:133–136, 1991.

  17. R. Y. Pinter. Efficient string matching with don't-care patterns. In A. Apostolico and Z. Galil, editors, Combinatorial Algorithms on Words, volume F12, pages 11–29. Springer-Verlag, 1985.

  18. J. Tarhio and E. Ukkonen. Boyer-Moore approach to approximate string matching. In J. R. Gilbert and R. G. Karlsson, editors, Lecture Notes in Computer Science, volume 447 of 2nd Scandinavian Workshop on Algorithm Theory, SWAT'90, pages 348–359. Bergen, Norway, Springer-Verlag, July 1990.

  19. E. Ukkonen. Approximate string-matching over suffix trees. In A. Apostolico, M. Crochemore, Z. Galil, and U. Manber, editors, Lecture Notes in Computer Science, volume 684 of Combinatorial Pattern Matching (4th Annual Symposium, CPM93), pages 240–249. Springer-Verlag, 1993.

  20. S. Wu and U. Manber. Fast text searching allowing errors. Commun. ACM, 35(10):83–91, October 1992.

Editor information

Dan Hirschberg, Gene Myers

Copyright information

© 1996 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

El-Mabrouk, N., Crochemore, M. (1996). Boyer-Moore strategy to efficient approximate string matching. In: Hirschberg, D., Myers, G. (eds) Combinatorial Pattern Matching. CPM 1996. Lecture Notes in Computer Science, vol 1075. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-61258-0_2

  • DOI: https://doi.org/10.1007/3-540-61258-0_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-61258-2

  • Online ISBN: 978-3-540-68390-2

  • eBook Packages: Springer Book Archive
