Abstract
We address the problem of approximate string matching in two dimensions, that is, finding a pattern of size $m \times m$ in a text of size $n \times n$ with at most $k$ errors (substitutions, insertions and deletions). Although the problem can be solved using dynamic programming in time $O(m^2 n^2)$, this is in general too expensive for small $k$. We therefore design a filtering algorithm which avoids verifying most of the text with dynamic programming. This filter is based on a one-dimensional multi-pattern approximate search algorithm. The average complexity of the resulting algorithm is $O(n^2 k \log_\sigma m / m^2)$ for $k < m(m+1)/(5 \log_\sigma m)$, which is optimal and matches the best previous result, which allows only substitutions. For higher error levels, we present an algorithm with time complexity $O(n^2 k/(w\sqrt{\sigma}))$, where $w$ is the size in bits of the computer word and $\sigma$ is the alphabet size. This algorithm works for $k < m(m+1)(1 - e/\sqrt{\sigma})$, where $e = 2.718\ldots$, a limit that cannot be improved. These are the first good expected-case algorithms for the problem. Our algorithms also work for rectangular patterns and rectangular text, and can even be extended to the case where each row in the pattern and the text has a different length.
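To make the two error limits in the abstract concrete, the following minimal sketch evaluates them for given values of m and σ. It only does the arithmetic of the stated bounds and is not an implementation of either algorithm. The function names and the regime labels ("filter", "high-error", "none") are hypothetical and chosen here for illustration.

```python
import math


def filter_error_limit(m: int, sigma: int) -> float:
    """Bound m(m+1) / (5 log_sigma m) stated for the filtering algorithm.

    Assumes m > 1 and sigma > 1 so that the logarithm is defined and positive.
    """
    return m * (m + 1) / (5 * math.log(m, sigma))


def high_error_limit(m: int, sigma: int) -> float:
    """Bound m(m+1)(1 - e/sqrt(sigma)) stated for the second algorithm.

    The value is non-positive when sigma <= e^2 (about 7.39), in which
    case no positive k satisfies the stated condition.
    """
    return m * (m + 1) * (1 - math.e / math.sqrt(sigma))


def pick_regime(m: int, sigma: int, k: int) -> str:
    """Report which stated bound, if any, a given k falls under.

    The labels are hypothetical; the abstract does not name the algorithms.
    """
    if k < filter_error_limit(m, sigma):
        return "filter"
    if k < high_error_limit(m, sigma):
        return "high-error"
    return "none"
```

For example, with m = 16 and σ = 256 we have log_σ m = 1/2, so the first limit is 16·17/2.5 = 108.8, while the second is 272·(1 − e/16), roughly 225.8. Under these assumptions, k = 50 falls under the first bound and k = 150 only under the second.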
Support from Fondecyt grants 1-95-0622 and 1-96-0881 is gratefully acknowledged.
Copyright information
© 1998 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Baeza-Yates, R., Navarro, G. (1998). Fast two-dimensional approximate pattern matching. In: Lucchesi, C.L., Moura, A.V. (eds) LATIN'98: Theoretical Informatics. LATIN 1998. Lecture Notes in Computer Science, vol 1380. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0054334
DOI: https://doi.org/10.1007/BFb0054334
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-64275-6
Online ISBN: 978-3-540-69715-2
eBook Packages: Springer Book Archive