
Fast-Match: Fast Affine Template Matching

Published in: International Journal of Computer Vision

Abstract

Fast-Match is a fast algorithm for approximate template matching under 2D affine transformations that minimizes the Sum-of-Absolute-Differences (SAD) error measure. There is a huge number of transformations to consider, but we prove that they can be sampled at a density that depends on the smoothness of the image. For each potential transformation, we approximate the SAD error using a sublinear algorithm that randomly examines only a small number of pixels. We further accelerate the algorithm using a branch-and-bound-like scheme. As images are known to be piecewise smooth, the result is a practical affine template matching algorithm with approximation guarantees that takes just a few seconds to run on a standard machine. We perform several experiments on three different datasets and report very good results.
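The sublinear SAD approximation described in the abstract can be sketched as follows. This is a minimal illustration of the random-sampling idea, not the authors' implementation; the function name and parameters are ours.

```python
import numpy as np

def approx_sad(template, target, transform, n_samples=200, rng=None):
    """Estimate the normalized SAD error of an affine `transform` (a 2x3
    matrix) by sampling a small random subset of template pixels, so the
    cost is sublinear in the template size."""
    rng = np.random.default_rng(rng)
    h, w = template.shape
    xs = rng.integers(0, w, n_samples)
    ys = rng.integers(0, h, n_samples)
    # Map the sampled template points into the target image.
    pts = np.stack([xs, ys, np.ones(n_samples)])
    tx, ty = (transform @ pts).astype(int)
    # Keep only samples that land inside the target.
    H, W = target.shape
    ok = (tx >= 0) & (tx < W) & (ty >= 0) & (ty < H)
    if not ok.any():
        return 1.0  # worst possible error for intensities in [0, 1]
    return float(np.abs(template[ys[ok], xs[ok]] - target[ty[ok], tx[ok]]).mean())
```

In the full algorithm this estimate is evaluated for every transformation in the net, with the sample size chosen to meet the approximation guarantee.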


Notes

  1. The algorithm is not restricted to square images but we discuss these for simplicity throughout the article.

  2. The symbol \(\tilde{\varTheta }\) hides (low order) logarithmic factors.

  3. Arguments are similar for orientation-reversing transformations (which include reflection).

  4. Source-code and extended results are available at www.eng.tau.ac.il/~simonk/FastMatch.

  5. Unlike our method, such feature-based methods do not directly produce a geometric mapping. One can be recovered from a good-quality set of matching points using robust methods such as RANSAC (Fischler and Bolles 1981), by assuming a known geometric model (e.g. affine) that relates the images.
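As the footnote notes, feature-based pipelines recover the mapping from putative point matches with RANSAC under an assumed model. The sketch below is a minimal, illustrative RANSAC for an affine model, in the spirit of Fischler and Bolles (1981); the function names and parameters are ours, not part of the paper.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares 2x3 affine matrix mapping src points to dst points."""
    A = np.hstack([src, np.ones((len(src), 1))])
    M, *_ = np.linalg.lstsq(A, dst, rcond=None)
    return M.T  # 2x3 matrix: [x y 1] -> [x' y']

def ransac_affine(src, dst, iters=200, tol=2.0, rng=None):
    """Robustly estimate an affine map from noisy correspondences
    (illustrative helper, not the paper's method)."""
    rng = np.random.default_rng(rng)
    best, best_inliers = None, 0
    ones = np.ones((len(src), 1))
    for _ in range(iters):
        idx = rng.choice(len(src), 3, replace=False)  # 3 points determine an affinity
        M = fit_affine(src[idx], dst[idx])
        pred = np.hstack([src, ones]) @ M.T
        inliers = np.linalg.norm(pred - dst, axis=1) < tol
        if inliers.sum() > best_inliers:
            # Refit on the full inlier set for a more stable estimate.
            best, best_inliers = fit_affine(src[inliers], dst[inliers]), inliers.sum()
    return best
```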

  6. Note that the 20% threshold has a slightly different meaning under the different criteria.

  7. Note that for each of the three distortion types, the lowest degradation level is equivalent to no degradation at all.

  8. ASIFT is based on SIFT, which was shown in Mikolajczyk and Schmid (2005) to be notable for its resilience to image blur, compared to other descriptors.

  9. Note that because we are approximating a projective transformation using an affine one (which means matching a general quadrilateral using a parallelogram), the optimal overlap error may be far greater than zero.

  10. This issue has been extensively discussed in Mikolajczyk et al. (2005).

References

  • Alexe, B., Petrescu, V., & Ferrari, V. (2011). Exploiting spatial overlap to efficiently compute appearance distances between image windows. In Advances in neural information processing systems (NIPS) (pp. 2735–2743).

  • Arya, S., Mount, D., Netanyahu, N., Silverman, R., & Wu, A. (1998). An optimal algorithm for approximate nearest neighbor searching. Journal of the ACM, 45(6), 891–923.


  • Baker, S., & Matthews, I. (2004). Lucas-kanade 20 years on: A unifying framework. International Journal of Computer Vision (IJCV), 56(3), 221–255.


  • Everingham, M., Van Gool, L., Williams, C. K., Winn, J., & Zisserman, A. (2010). The pascal visual object classes (VOC) challenge. International Journal of Computer Vision (IJCV), 88(2), 303–338.


  • Fischler, M. A., & Bolles, R. C. (1981). Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6), 381–395.


  • Fredriksson, K. (2001). Rotation invariant template matching. Ph.D. thesis, University of Helsinki.

  • Fuh, C. S., & Maragos, P. (1991). Motion displacement estimation using an affine model for image matching. Optical Engineering, 30(7), 881–887.


  • Hartley, R., & Zisserman, A. (2003). Multiple view geometry in computer vision. Cambridge: Cambridge University Press.


  • Kim, H. Y., & De Araújo, S. A. (2007). Grayscale template-matching invariant to rotation, scale, translation, brightness and contrast. In Advances in image and video technology (AIVT) (pp. 100–113). Springer.

  • Kleiner, I., Keren, D., Newman, I., & Ben-Zwi, O. (2011). Applying property testing to an image partitioning problem. Pattern Analysis and Machine Intelligence (PAMI), 33(2), 256–265.


  • Korman, S., Reichman, D., & Tsur, G. (2011). Tight approximation of image matching. arXiv preprint arXiv:1111.1713.

  • Korman, S., Reichman, D., Tsur, G., & Avidan, S. Fast-Match webpage. www.eng.tau.ac.il/~simonk/FastMatch.

  • Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision (IJCV), 60(2), 91–110.


  • Lucas, B. D., Kanade, T., et al. (1981). An iterative image registration technique with an application to stereo vision. In IJCAI (Vol. 81, pp. 674–679).

  • Mikolajczyk, K., & Schmid, C. (2005). A performance evaluation of local descriptors. Pattern Analysis and Machine Intelligence (PAMI), 27(10), 1615–1630.


  • Mikolajczyk, K., Tuytelaars, T., Schmid, C., Zisserman, A., Matas, J., Schaffalitzky, F., et al. (2005). A comparison of affine region detectors. International Journal of Computer Vision (IJCV), 65(1–2), 43–72.


  • Morel, J. M., & Yu, G. (2009). ASIFT: A new framework for fully affine invariant image comparison. SIAM Journal on Imaging Sciences, 2(2), 438–469.


  • Muja, M., & Lowe, D. G. (2014). Scalable nearest neighbor algorithms for high dimensional data. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(11), 2227–2240.


  • Ouyang, W., Tombari, F., Mattoccia, S., Di Stefano, L., & Cham, W. K. (2012). Performance evaluation of full search equivalent pattern matching algorithms. Pattern Analysis and Machine Intelligence (PAMI), 34(1), 127–143.


  • Pele, O., & Werman, M. (2007). Accelerating pattern matching or how much can you slide?. In Asian conference on computer vision (ACCV) (pp. 435–446). Springer.

  • Raskhodnikova, S. (2003). Approximate testing of visual properties. In Workshop on randomization and approximation techniques in computer science, (RANDOM) (pp. 370–381).

  • Rucklidge, W. (1997). Efficient guaranteed search for gray-level patterns. In Computer vision and pattern recognition (CVPR) (pp. 717–723). IEEE.

  • Seitz, S.M., & Baker, S. (2009) Filter flow. In International conference on computer vision (ICCV) (pp. 143–150). IEEE.

  • Shao, H., Svoboda, T., & Van Gool, L. (2003). Zubud-zurich buildings database for image based recognition. Swiss Federal Institute of Technology, Switzerland, Technical Report (Vol. 260).

  • Tian, Y., & Narasimhan, S. G. (2012). Globally optimal estimation of nonrigid image distortion. International Journal of Computer Vision (IJCV), 98(3), 279–302.


  • Tsai, D. M., & Chiang, C. H. (2002). Rotation-invariant pattern matching using wavelet decomposition. Pattern Recognition Letters, 23(1), 191–201.

  • Tsur, G., & Ron, D. (2010). Testing properties of sparse images. In Symposium on Foundations of Computer Science (FOCS) (pp. 468–477). IEEE.

  • van der Schaaf, A., & van Hateren, J. (1996). Modelling the power spectra of natural images: Statistics and information. Vision Research, 36(17), 2759–2770.


  • Wang, Q., & You, S. (2007). Real-time image matching based on multiple view kernel projection. In IEEE conference on computer vision and pattern recognition, 2007. CVPR’07 (pp. 1–8). IEEE.

  • Yao, C. H., & Chen, S. Y. (2003). Retrieval of translated, rotated and scaled color textures. Pattern Recognition, 36(4), 913–929.



Author information

Correspondence to Simon Korman.

Additional information

Communicated by Xiaoou Tang.

This work was supported by the Israel Science Foundation (Grant No. 873/08, in part) and the Ministry of Science and Technology.

Appendix: Proof of Theorem 1

We first restate Theorem 1 for completeness:

Let \(I_1,I_2\) be images with dimensions \(n_1\) and \(n_2\) and let \(\delta \) be a constant in (0, 1]. For a transformation \(T'\), let T be the closest transformation to \(T'\) in the net \({\mathcal {N}}_{\delta }\) (which is a \(\frac{\delta \cdot n_1^2}{{\mathcal {V}}}\) -cover). It holds that: \(\;|\varDelta _{T'}(I_1,I_2)-\varDelta _T(I_1,I_2)| \le O(\delta )\).

To understand why the claim holds we refer the reader to Fig. 12. Two close transformations \(T, T'\) map the template to two close parallelograms in the target image. Most of the error of the mapping \(T'\) is with respect to the area in the intersection of these parallelograms (the yellow region in Fig. 12). This error cannot be greater than the total variation multiplied by the distance between the transformations T and \(T'\), as shown below. The rest of the error originates in the area mapped to by \(T'\) that is not in the intersection (the green region). The size of this area is also bounded by the distance between the transformations. Thus, the distance between the transformations, and the total variation, bound the difference in error between T and \(T'\). This is formalized in the remainder of the section.

For convenience, throughout the discussion of the algorithm’s guarantees we consider points in a continuous image plane instead of discrete pixels. Analyzing the problem in the continuous domain makes the theorem simpler to prove, avoiding several complications that arise from the discrete sampling, most notably that several pixels might be mapped to a single pixel. We refer the reader to a (slightly more involved) proof in the discrete domain, which we made available in a previous manuscript (Korman et al. 2011).

Fig. 12

A template mapped to an image by two close transformations. The close transformations map the template to close parallelograms. The error of \(T'\) cannot be very different from that of T. Most of the change in error is from different points being mapped to the intersection area (in yellow). This difference depends on the total variation of the template. The remaining error depends on the green area which is small because the transformations are close (Color figure online)

In order to switch to the continuous domain, we give some definitions and state some claims for points in the image plane. We begin by relating the intensity of points to that of pixels.

Definition 1

The intensity of a point \(p = (x,y)\) in the image plane (denoted \(I_1(p)\)) is defined as that of the pixel \(q = ([x],[y])\), where \([\cdot ]\) refers to the ‘floor’ operation. The point p is said to land in q.
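Definition 1 amounts to a nearest-pixel (floor) lookup; a minimal sketch, with our own helper name:

```python
import numpy as np

def point_intensity(image, p):
    """Intensity of a continuous point p = (x, y): that of the pixel
    (floor(x), floor(y)) it lands in (Definition 1)."""
    x, y = p
    return image[int(np.floor(y)), int(np.floor(x))]
```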

We now define the variation of a point and relate it to the variation of a pixel.

Definition 2

The variation of a point p, which we denote v(p), is \(\max _{q~:~d(p,q) \le 1} |I_1(p)-I_1(q)|\). Note that this is upper-bounded by the variation of the pixel that p lands in. For convenience of computation (this does not change the asymptotic results), for points p within distance 1 of the boundary of the image, we define \(v(p) = 1\).

Finally, we define the total variation of an image in terms of the total variation of points in the image plane.

Definition 3

The total variation of an image (or template) \(I_1\) is \(\int _{I_1}v(p)\). We denote this value \({\mathcal {V}}\). Note that this is upper bounded by the total variation computed over the pixels.
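Definitions 2 and 3 suggest a simple pixel-level upper bound on \({\mathcal {V}}\): sum, over all pixels, the largest intensity difference to any neighbour, with the boundary convention v(p) = 1. The helper below is our own illustrative sketch, not the paper's code.

```python
import numpy as np

def total_variation_upper_bound(image):
    """Pixel-level upper bound on the total variation V (Definition 3):
    each interior pixel contributes its largest absolute difference to an
    8-neighbour; boundary pixels contribute 1, as in Definition 2."""
    h, w = image.shape
    v = np.ones((h, w))  # boundary convention: v = 1
    inner = image[1:-1, 1:-1]
    diffs = []
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            shifted = image[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
            diffs.append(np.abs(inner - shifted))
    v[1:-1, 1:-1] = np.max(diffs, axis=0)
    return float(v.sum())
```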

Our strategy towards proving Theorem 1 involves two ideas. First, instead of working with the pair of transformations T and \(T'\), we will more conveniently (and we show the equivalence) work with the identity transformation I and the concatenated transformation \(T'^{-1}T\). Second, note that in Theorem 1 we bound the difference in error between transformations T and \(T'\), which are \(\frac{\delta \cdot n_1^2}{{\mathcal {V}}}\) apart. A simplifying approach is to ‘relate’ the transformations T and \(T'\) through a series of transformations \(\{T_i\}_{i=0}^m\) (where \(T_0 = T\) and \(T_m = T'\)), which are each at most a unit distance apart, with \(m = O(\frac{\delta \cdot n_1^2}{{\mathcal {V}}})\). Thus, in Claim 6 we handle the case of transformations that are a unit distance apart.
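The bridging series \(\{T_i\}\) can be sketched as follows. For illustration we interpolate all matrix entries linearly in unit steps, a simplification of the per-component construction used in the proof of Theorem 1; the function is ours.

```python
import numpy as np

def transformation_chain(T, T_prime, u=1.0):
    """A chain T = T_0, ..., T_m = T' of affine matrices in which consecutive
    elements differ by at most u in every parameter (simplified: straight-line
    interpolation in matrix entries, not the paper's per-component steps)."""
    T, T_prime = np.asarray(T, float), np.asarray(T_prime, float)
    dist = np.abs(T_prime - T).max()  # ell_infinity distance on the parameters
    m = max(1, int(np.ceil(dist / u)))
    return [T + (T_prime - T) * (i / m) for i in range(m + 1)]
```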

In the following claims we introduce a constant u such that if \(\ell _\infty (T,T') \le u\), it holds that \(\ell _\infty (T^{-1},T'^{-1}) \le 1\).

Fig. 13

Illustration for Claim 4. The distance between the points p and \(p'\) can be no more than a constant size greater than the distance between the points q and r, which is itself bounded by \(\delta n_1\)

Claim 4

Given affine transformations \(T, T'\) with scaling factors in the range \([1/c, c]\) such that \(\ell _\infty (T,T')\le \frac{\delta \cdot n_1^2}{{\mathcal {V}}}\), it holds that \(\ell _\infty (T^{-1},T'^{-1}) = O(\frac{\delta \cdot n_1^2}{{\mathcal {V}}})\).

Proof

To see that the claim holds, consider a point q; we will show that \(||T'^{-1}(q) - T^{-1}(q)|| \le c \frac{\delta \cdot n_1^2}{{\mathcal {V}}}\) (see Fig. 13). Let \(p' = T'^{-1}(q)\) and \(p = T^{-1}(q)\); we wish to bound \(||p'-p||\). Let \(r = T(p')\). We get \(||p' - p|| = ||T^{-1}r - T^{-1}q|| = ||T^{-1}(r-q)|| \le c||r-q|| \le c\frac{\delta \cdot n_1^2}{{\mathcal {V}}}\). \(\square \)
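The chain of inequalities in this proof can be checked numerically. The sketch below, which is our own illustration and not part of the paper, draws random affine maps whose linear parts have singular values in \([1/c, c]\) (so the inverse has operator norm at most c) and verifies \(||p' - p|| \le c||r - q||\).

```python
import numpy as np

def check_inverse_bound(c=2.0, trials=100, rng=0):
    """Numerically verify the inequality from Claim 4's proof: with
    p = T^{-1}(q), p' = T'^{-1}(q) and r = T(p'), we have
    ||p' - p|| <= c * ||r - q|| when T's scaling factors lie in [1/c, c]."""
    rng = np.random.default_rng(rng)
    for _ in range(trials):
        def rand_T():
            # Random linear part with singular values drawn from [1/c, c].
            U, _ = np.linalg.qr(rng.normal(size=(2, 2)))
            V, _ = np.linalg.qr(rng.normal(size=(2, 2)))
            return U @ np.diag(rng.uniform(1 / c, c, 2)) @ V
        A, A2 = rand_T(), rand_T()          # linear parts of T, T'
        b, b2 = rng.normal(size=2), rng.normal(size=2)  # translations
        q = rng.normal(size=2)
        p = np.linalg.solve(A, q - b)       # T^{-1}(q)
        p2 = np.linalg.solve(A2, q - b2)    # T'^{-1}(q)
        r = A @ p2 + b                      # T(p')
        assert np.linalg.norm(p2 - p) <= c * np.linalg.norm(r - q) + 1e-9
    return True
```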

Claim 5

There exists a value \(u\in (0,1)\) such that for any affine transformations \(T, T'\) where \(\ell _\infty (T,T')\le u\) and for any point \(p\in I_1\), it holds that \(||p - T'^{-1}(T(p))|| \le 1\).

The correctness of Claim 5 follows directly from Claim 4 by noting that \(p = T^{-1}(T(p))\).

Claim 6

Let \(I_1,I_2\) be images with dimensions \(n_1\) and \(n_2\). There exists a constant \(u\in (0,1)\) for which the following holds. For any two affine transformations T and \(T'\) such that \(\ell _\infty (T,T')\le u\):

$$\begin{aligned} |\varDelta _{T'}(I_1,I_2)-\varDelta _T(I_1,I_2)| \le O\Big (\frac{{{\mathcal {V}}}}{n_1{}^2}\Big ){.} \end{aligned}$$

Note that the value \(O\Big (\frac{{{\mathcal {V}}}}{n_1{}^2}\Big )\) bounds the difference in error for two transformations that are a unit distance apart. This scales to the value \(O(\delta )\) that appears in Theorem 1 for transformations that are at distance \(\frac{\delta \cdot n_1^2}{{\mathcal {V}}}\).

Proof

Using the triangle inequality we can write:

$$\begin{aligned}&\Big |\varDelta _{T'}(I_1,I_2) - \varDelta _T(I_1,I_2)\Big | \\&\quad =\Big | \int _{I_1} |I_1(p) - I_2(T'(p))| - \int _{I_1} |I_1(p) - I_2(T(p))| \Big | \\&\quad \le \int _{I_1} | |I_1(p) - I_2(T'(p))| - |I_1(p) - I_2(T(p))| | \end{aligned}$$

where integrals go over points p in the template \(I_1\).

We now bound this sum. Since \(\ell _\infty (T, T') \le u\), only points within distance 1 (as \(u \le 1\)) of the boundary of \(I_1\) can be mapped to ‘new’ areas of \(I_2\), that is, areas to which no point from \(I_1\) was mapped before. Each of these points has an error of at most 1 (the greatest possible difference between intensities in [0, 1]). The total area of such points is \(O(n_1)\), and thus they contribute \(O(n_1)\) to the difference between \(\varDelta _{T'}(I_1,I_2)\) and \(\varDelta _T(I_1,I_2)\), before normalization. This is equal to their contribution to the total variation.

For the remaining points (those at distance greater than 1 from the boundary of \(I_1\)), under T each such point p is mapped to a point T(p), and the pre-image \(T'^{-1}(T(p))\) of that point is in the area of \(I_1\). Instead of considering the value \(E_{T, I_1, I_2}(p)\) for each such point p in \(I_1\), consider the error over each point \(q = T(p)\) in \(I_2\) that has points mapped to it both by T and by \(T'\). The distance between p and \(T'^{-1}(T(p))\) is at most 1 (by Claim 5), so the intensities at p and at \(T'^{-1}(T(p))\) differ by at most v(p), and thus \(|I_2(q)-I_1(p)| - |I_2(q)-I_1(T'^{-1}(q))| \le v(p)\) (by the triangle inequality). Thus, for points at distance greater than 1 from the boundary of \(I_1\), the effect on the difference \(|\varDelta _{T'}(I_1,I_2)-\varDelta _T(I_1,I_2)|\) of each point p is at most v(p), and the total contribution is bounded by \({{\mathcal {V}}}\).

Summing both contributions and normalizing by \(n_1{}^2\) we obtain \({|\varDelta _{T'}(I_1,I_2) - \varDelta _T(I_1,I_2)| = O ({{\mathcal {V}}} / n_1{}^2)}\) as required. \(\square \)
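Claim 6 can be illustrated empirically with integer translations, the simplest affine maps, so that no interpolation is needed; the setup and numbers below are ours, not the paper's.

```python
import numpy as np

def sad_error(I1, I2, shift):
    """Normalized SAD of template I1 placed in I2 at integer offset (dy, dx)."""
    n = I1.shape[0]
    dy, dx = shift
    return float(np.abs(I1 - I2[dy:dy + n, dx:dx + n]).mean())

# A smooth template embedded in a larger target image.
n = 32
yy, xx = np.mgrid[0:n, 0:n]
I1 = (np.sin(xx / 6.0) + np.cos(yy / 6.0) + 2) / 4   # intensities in [0, 1]
I2 = np.zeros((2 * n, 2 * n))
I2[8:8 + n, 8:8 + n] = I1

d_T = sad_error(I1, I2, (8, 8))    # the true placement: zero error
d_Tp = sad_error(I1, I2, (8, 9))   # a unit-distance translation
# For a smooth template, a unit step changes the normalized SAD by roughly
# the average local variation, i.e. on the order of V / n^2.
```

For this template the unit-step change in normalized SAD is a small constant, consistent with the \(O({\mathcal {V}}/n_1{}^2)\) bound of Claim 6.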

However, not every transformation is within distance u of the net. We now turn to the goal of this section: proving Theorem 1.

Proof

As T is the closest transformation to \(T'\) in the net, it holds that \(\ell _\infty (T,T') \le \frac{\delta \cdot n_1^2}{{\mathcal {V}}}\). Furthermore, from the construction that is summarized in Claim 2 we have that \(T \in {\mathcal {N}}_\delta \), where \(T = Tr R_2 S R_1\) and \(T' = Tr' R_2' S' R_1'\), such that \(d(Tr, Tr') \le \frac{\delta \cdot n_1^2}{{\mathcal {V}}}, \ldots , d(R_1, R_1') \le \frac{\delta \cdot n_1^2}{{\mathcal {V}}}\). Now consider a series of transformations \(\{T_i\}_{i=0}^m\) where \(T_0 = T\) and \(T_m = T'\), such that for each i it holds that \(\ell _\infty (T_i, T_{i +1}) \le u\) (for the constant u from Claim 6). For such a series, repeated use of Claim 6 (and of the triangle inequality) gives us that

$$\begin{aligned} |\varDelta (T) - \varDelta (T')| = |\varDelta (T_0) - \varDelta (T_m)| \le O\Big (\frac{m{{\mathcal {V}}}}{n_1{}^2}\Big ){.} \end{aligned}$$

To construct such a series of transformations, we first add (or subtract) u from the translation matrix until it changes from Tr to \(Tr'\); this takes \(O(\frac{\delta \cdot n_1^2}{{\mathcal {V}}})\) steps. We then change the rotation matrix, beginning with \(R_2\), by \(u/n_1\) per step for \(O(\frac{\delta \cdot n_1^2}{{\mathcal {V}}})\) steps until we reach \(R'_2\). Proceeding in this manner, after \(m = O(\frac{\delta \cdot n_1^2}{{\mathcal {V}}})\) steps we transition from T to \(T'\), giving us the required bound of \(O(\delta )\). \(\square \)


Cite this article

Korman, S., Reichman, D., Tsur, G. et al. Fast-Match: Fast Affine Template Matching. Int J Comput Vis 121, 111–125 (2017). https://doi.org/10.1007/s11263-016-0926-1
