Accelerated parametric chamfer alignment using a parallel, pipelined GPU realization

  • Original Research Paper
Journal of Real-Time Image Processing

Abstract

Parametric chamfer alignment (PChA) is commonly employed for aligning an observed set of points with a corresponding set of reference points. PChA estimates optimal geometric transformation parameters that minimize an objective function formulated as the sum of the squared distances from each transformed observed point to its closest reference point. A distance transform enables efficient computation of the (squared) distances, and the objective function minimization is commonly performed via the Levenberg–Marquardt (LM) nonlinear least squares iterative optimization algorithm. The point-wise computations of the objective function, gradient, and Hessian approximation required for the LM iterations make PChA computationally demanding for large-scale datasets. We propose an acceleration of the PChA via a parallelized and pipelined realization that is particularly well suited for large-scale datasets and for modern GPU architectures. Specifically, we partition the observed points among the GPU blocks and decompose the expensive LM calculations in correspondence with the GPU’s single-instruction multiple-thread architecture to significantly speed up this bottleneck step for PChA on large-scale datasets. Additionally, by reordering computations, we propose a novel pipelining of the LM algorithm that offers further speedup by exploiting the low arithmetic latency of the GPU compared with its high global memory access latency. Results obtained on two different platforms for both 2D and 3D large-scale point datasets from our ongoing research demonstrate that the proposed PChA GPU implementation provides a significant speedup over its single CPU counterpart.
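
The objective described above can be illustrated with a minimal sketch. It is not the implementation evaluated in the paper: it assumes a small 2D grid, a brute-force squared distance transform, nearest-cell lookup, and a pure translation standing in for the general geometric transformation. All function and variable names are hypothetical.

```python
# Illustrative sketch only: brute-force distance transform and a translation-only
# chamfer objective. Names and simplifications are assumptions, not the paper's code.

def squared_distance_transform(ref_points, width, height):
    """Squared distance from every grid cell to its closest reference point."""
    return [[min((x - rx) ** 2 + (y - ry) ** 2 for rx, ry in ref_points)
             for x in range(width)]
            for y in range(height)]

def chamfer_objective(observed, params, sq_dt):
    """Sum of squared distances for observed points shifted by params = (tx, ty)."""
    tx, ty = params
    height, width = len(sq_dt), len(sq_dt[0])
    total = 0
    for px, py in observed:
        # Nearest-cell lookup, clamped to the grid (an assumption of this sketch).
        x = min(max(int(round(px + tx)), 0), width - 1)
        y = min(max(int(round(py + ty)), 0), height - 1)
        total += sq_dt[y][x]
    return total

reference = [(2, 2), (5, 2), (5, 6)]
observed = [(1, 1), (4, 1), (4, 5)]  # the reference set shifted by (-1, -1)
sq_dt = squared_distance_transform(reference, width=8, height=8)
misaligned = chamfer_objective(observed, (0, 0), sq_dt)
aligned = chamfer_objective(observed, (1, 1), sq_dt)
```

Once the distance transform is available, each objective evaluation costs one table lookup per observed point, which is why the per-point work can be distributed across many threads.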

Notes

  1. Later in this paper, in Fig. 11, we provide results from profiling a single CPU implementation of PChA, which empirically demonstrate that the components of PChA that we select for GPU implementation account for a substantial part of the execution time of the complete single CPU implementation.

  2. We augment the points in the OS to ensure \(N_p \,=\, N_bN_t\).

  3. For clarity, we omit the final per-grid reduction summation operations from Algorithm 3 and assume that the kernel in Algorithm 2 will return \(\{\mathbf {g},\; {\mathbf {H}},\; f\}\).

  4. The CPU(1) implementation offers performance close to but not identical to that of a CPU implementation that is obtained by completely eliminating the OpenMP compiler directives from the code.

  5. The DT is calculated using the method in [25].

References

  1. Liu, M.Y., Tuzel, O., Veeraraghavan, A., Chellappa, R.: Fast directional chamfer matching. In: IEEE International Conference on Computer Vision and Pattern Recognition, pp. 1696–1703, June 2010

  2. Jiang, H., Holton, K.S., Robb, R.A.: Image registration of multimodality 3-D medical images by chamfer matching. In: SPIE/IS&T 1992 Symposium on Electronic Imaging: Science and Technology, pp. 356–366. International Society for Optics and Photonics, 1992

  3. Chi, Y.T., Shahed, S.M.N., Ho, J., Yang, M.H.: Higher dimensional affine registration and vision applications. In: Proceedings European Conference on Computer Vision, pp. 256–269

  4. Boughorbel, F., Mercimek, M., Koschan, A., Abidi, M.: A new method for the registration of three-dimensional point-sets: the Gaussian fields framework. Comput. Vis. Image Underst. 28(1), 124–137 (2010)

  5. Gressin, A., Mallet, C., Demantké, J., David, N.: Towards 3D lidar point cloud registration improvement using optimal neighborhood knowledge. J. Photogramm. Remote Sens. 79, 240–251 (2013)

  6. Danelljan, M., Meneghetti, G., Shahbaz Khan, F., Felsberg, M.: A probabilistic framework for color-based point set registration. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1818–1826, June 2016

  7. Ding, L., Elliethy, A., Freedenberg, E., Wolf-Johnson, S.A., Romphf, J., Christensen, P., Sharma, G.: Comparative analysis of homologous buildings using range imaging. In: IEEE International Conference on Image Processing, pp. 4378–4382, Sep 2016

  8. Elliethy, A., Sharma, G.: Vector road map registration to oblique wide area motion imagery by exploiting vehicles movements. In: IS&T Electronic Imaging: Video Surveillance and Transportation Imaging Applications, pp. VSTIA–520.1–8, San Francisco, California, 2016a. URL http://ist.publisher.ingentaconnect.com/contentone/ist/ei/2016/00002016/00000003/art00008

  9. Elliethy, A., Sharma, G.: Automatic registration of vector road maps with wide area motion imagery by exploiting vehicle detections. IEEE Trans. Image Process. 25(11), 5304–5315 (2016). doi:10.1109/TIP.2016.2601265

  10. Besl, P.J., McKay, H.D.: A method for registration of 3-D shapes. IEEE Trans. Pattern Anal. Mach. Intell. 14(2), 239–256 (1992)

  11. Zhang, Z.: Iterative point matching for registration of free-form curves and surfaces. Int. J. Comput. Vis. 13(2), 119–152 (1994)

  12. Myronenko, A., Song, X.: Point set registration: coherent point drift. IEEE Trans. Pattern Anal. Mach. Intell. 32(12), 2262–2275 (2010)

  13. Sofka, M., Yang, G., Stewart, C.V.: Simultaneous covariance driven correspondence (CDC) and transformation estimation in the expectation maximization framework. In: IEEE International Conference on Computer Vision and Pattern Recognition, pp. 1–8, June 2007

  14. Fitzgibbon, A.W.: Robust registration of 2D and 3D point sets. Image Vis. Comput. 21(13–14), 1145–1153 (2003)

  15. Rouhani, M., Sappa, A.D.: Correspondence free registration through a point-to-model distance minimization. In: IEEE International Conference Computer Vision, pp. 2150–2157, Nov 2011

  16. Borgefors, G.: Distance transformations in digital images. Comput. Vis. Graph. Image Proc. 34(3), 344–371 (1986)

  17. Nocedal, J., Wright, S.J.: Numerical Optimization, 2nd edn. Springer, New York (2006)

  18. Barrow, H.G., Tenenbaum, J.M., Bolles, R.C., Wolf, H.C.: Parametric correspondence and chamfer matching: two new techniques for image matching. In: Proceeding International Joint Conference on Artificial Intelligence, pp. 659–663, 1977

  19. Sigg, C., Peikert, R., Gross, M.: Signed distance transform using graphics hardware. In: IEEE Visualization, pp. 83–90, Oct 2003

  20. Cao, T.T., Tang, K., Mohamed, A., Tan, T.S.: Parallel banding algorithm to compute exact distance transform with the GPU. In: Proceedings of the 2010 ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games, pp. 83–90, ACM, New York, 2010

  21. Zhu, X., Zhang, D.: Efficient parallel Levenberg–Marquardt model fitting towards real-time automated parametric imaging microscopy. PLoS ONE 8(10), e76665 (2013)

  22. Li, B., Young, A.A., Cowan, B.R.: GPU accelerated non-rigid registration for the evaluation of cardiac function. In: Medical Image Computing and Computer-Assisted Intervention, pp. 880–887. Springer, Berlin, 2008

  23. Amorim, R., Haase, G., Liebmann, M., Weber dos Santos, R.: Comparing CUDA and OpenGL implementations for a Jacobi iteration. In: IEEE International Conference High Performance Computing Simulation (2009)

  24. Architectural Biometrics Project. https://architecturalbiometrics.com/

  25. Felzenszwalb, P., Huttenlocher, D.: Distance transforms of sampled functions. Technical Report TR2004-1963, Cornell University (2004). URL https://ecommons.cornell.edu/handle/1813/5663

  26. Kirk, D.B., Hwu, W.W.: Programming Massively Parallel Processors: A Hands-on Approach, 2nd edn. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA

  27. Harris, M.: Optimizing parallel reduction in CUDA. NVIDIA Corporation (2007). http://developer.download.nvidia.com/assets/cuda/files/reduction.pdf

  28. The OpenMP API specification for parallel programming. http://www.openmp.org/

  29. University of Rochester, BlueHive Cluster. https://info.circ.rochester.edu/BlueHive/System_Overview.html

  30. CorvusEye\(^{\text{TM}}\)1500 Data Sheet. http://www.exelisinc.com/solutions/corvuseye1500/Documents/CorvusEye500DataSheetAUG14

  31. Szeliski, R., Shum, H.Y.: Creating full view panoramic image mosaics and environment maps. In: Proceedings of the 24th Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH ’97, pp. 251–258, 1997

Acknowledgements

We thank Bernard Brower of Harris Corporation for making available the CorvusEye [30] WAMI datasets used for demonstrating PChA on real-world 2D datasets. We also thank our colleagues from the Architectural Biometrics project for providing the 3D datasets of building models and lidar scans that are used in our evaluation. We also thank the Center for Integrated Research Computing (CIRC), University of Rochester, for providing access to computational resources for this research.

Author information

Corresponding author

Correspondence to Ahmed Elliethy.

Appendix: Compositional approach for projective transformation estimation

The Jacobian matrix elements \(\{J^c_{j,l}\}\) associated with the projective transformation in Table 1 require a division operation per-element, which is computationally expensive. To simplify the Jacobian calculations, we adopt a compositional approach [31] that eliminates the division operations and also enables further simplifications. The compositional approach for our 2D point set alignment by projective transformation proceeds as follows:

  • The projective transformation defined by the current estimate \({\varvec{\alpha }}_t\) of the parameters is applied to each OS point \({\mathbf {p}}_j\) to obtain a corresponding warped point \({\mathbf {p'}}_j \,=\, {\mathcal {T}}_{{\varvec{\alpha }}_t}\left( {\mathbf {p}}_j\right)\).

  • Each LM iteration then estimates the incremental parameter update \({\varvec{\delta }}\) that minimizes

    $$\begin{aligned} f({\varvec{\delta }}) \,=\, \sum \limits _{j=1}^{N_p} \left\| {\mathbf {r}}\left( {\mathcal {T}}_{\left( {\varvec{\alpha }}^I+{\varvec{\delta }}\right) }\left( {\mathbf {p'}}_j\right) \right) \right\| ^2 , \end{aligned}$$
    (12)

    where \({\varvec{\alpha }}^I\,=\,\left[ 1,0,0,0,1,0,0,0 \right]\) is the parameter vector that corresponds to the identity transformation, i.e., \({\mathcal {T}}_{{\varvec{\alpha }}^I} \left( {\mathbf {p'}}_j \right) =\,{\mathbf {p'}}_j\).

  • The updated projective transformation is obtained as

    $$\begin{aligned} {\mathcal {T}}_{{\varvec{\alpha }}_{t+1}} \,=\, {\mathcal {T}}_{{\varvec{\alpha }}_t} \circ {\mathcal {T}}_{{\varvec{\delta }}}, \end{aligned}$$
    (13)

    where \(\circ\) denotes composition, or equivalently multiplication of the corresponding matrix representations.
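
The update step can be sketched with 3×3 homography matrices. This is a hedged illustration rather than the paper's code: it assumes the eight parameters fill the matrix row by row with the last entry fixed to 1 (consistent with \({\varvec{\alpha }}^I\) above), that the incremental warp is \({\mathcal {T}}_{{\varvec{\alpha }}^I+{\varvec{\delta }}}\), and that, as Eq. (12) implies, the increment acts on the already warped points, which for column vectors means left-multiplying the current matrix. The helper names are hypothetical.

```python
# Illustrative sketch of the compositional update; parameter ordering, composition
# order, and renormalization are assumptions stated in the accompanying text.

ALPHA_IDENTITY = [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]

def params_to_matrix(alpha):
    """3x3 homography for alpha = [a1, ..., a8]; the (3,3) entry is fixed to 1."""
    a1, a2, a3, a4, a5, a6, a7, a8 = alpha
    return [[a1, a2, a3], [a4, a5, a6], [a7, a8, 1.0]]

def matmul3(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def apply_homography(m, point):
    """Apply the projective transformation m to a 2D point."""
    x, y = point
    w = m[2][0] * x + m[2][1] * y + m[2][2]
    return ((m[0][0] * x + m[0][1] * y + m[0][2]) / w,
            (m[1][0] * x + m[1][1] * y + m[1][2]) / w)

def compose_update(m_current, delta):
    """Matrix of the updated warp: the increment is applied after the current warp."""
    m_increment = params_to_matrix([a + d for a, d in zip(ALPHA_IDENTITY, delta)])
    m_new = matmul3(m_increment, m_current)
    scale = m_new[2][2]  # renormalize so the (3,3) entry stays 1 (assumed nonzero)
    return [[v / scale for v in row] for row in m_new]

m_t = params_to_matrix([1.1, 0.02, 3.0, -0.01, 0.95, -2.0, 1e-3, 2e-3])
delta = [0.01, 0.0, 0.5, 0.0, -0.02, 0.25, 1e-4, -2e-4]
m_next = compose_update(m_t, delta)
```

Because a homography is defined only up to scale, the renormalization does not change the warp; it only keeps the matrix in the eight-parameter form used above.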

Considerable simplification of the Jacobian matrix calculation is obtained because the calculation is performed at \({\varvec{\alpha }}^I\), where the term \(w_j\) in Table 1 becomes unity, eliminating the need for division operations. Specifically, the Jacobian matrix \({\mathbf {J}}_j\) at the transformed point \({\mathbf {p'}}_j\equiv {\mathcal {T}}_{{\varvec{\alpha }}^I}({\mathbf {p'}}_j)\) is computed, with \((x', y')\) denoting the coordinates of \({\mathbf {p'}}_j\), as

$$\begin{aligned} {\mathbf {J}}_j \,=\, \frac{\partial {\mathcal {T}}_{{\varvec{\alpha }}}({\mathbf {p'}}_j)}{\partial {\varvec{\alpha }}}\bigg |_{{\varvec{\alpha }}\,=\,{\varvec{\alpha }}^I} \,=\, \begin{pmatrix} x'&{}y'&{}1&{}0&{}0&{}0&{}-x'^2&{}-x'y'\\ 0&{}0&{}0&{}x'&{}y'&{}1&{}-x'y'&{}-y'^2 \end{pmatrix}. \end{aligned}$$
(14)
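
Eq. (14) can be checked numerically. The sketch below assumes the usual eight-parameter projective model, with parameters ordered row by row and denominator \(w = \alpha_7 x + \alpha_8 y + 1\); this ordering is an assumption inferred from \({\varvec{\alpha }}^I\) and the column layout of Eq. (14). It compares the closed-form Jacobian with central finite differences taken around the identity parameters. The function names are hypothetical.

```python
# Illustrative check of the division-free Jacobian at the identity parameters.
# The parameter ordering is an assumption, as noted in the accompanying text.

IDENTITY = [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]

def warp(alpha, point):
    """Eight-parameter projective warp of a 2D point."""
    a1, a2, a3, a4, a5, a6, a7, a8 = alpha
    x, y = point
    w = a7 * x + a8 * y + 1.0
    return ((a1 * x + a2 * y + a3) / w, (a4 * x + a5 * y + a6) / w)

def jacobian_at_identity(point):
    """Closed-form 2x8 Jacobian of Eq. (14); no divisions are needed."""
    x, y = point
    return [[x, y, 1.0, 0.0, 0.0, 0.0, -x * x, -x * y],
            [0.0, 0.0, 0.0, x, y, 1.0, -x * y, -y * y]]

def numerical_jacobian(point, eps=1e-6):
    """Central-difference approximation of the same 2x8 Jacobian."""
    rows = [[0.0] * 8, [0.0] * 8]
    for l in range(8):
        plus, minus = list(IDENTITY), list(IDENTITY)
        plus[l] += eps
        minus[l] -= eps
        (xp, yp), (xm, ym) = warp(plus, point), warp(minus, point)
        rows[0][l] = (xp - xm) / (2 * eps)
        rows[1][l] = (yp - ym) / (2 * eps)
    return rows

point = (0.8, -1.5)
analytic = jacobian_at_identity(point)
numeric = numerical_jacobian(point)
max_error = max(abs(analytic[i][l] - numeric[i][l])
                for i in range(2) for l in range(8))
```

The closed form uses only multiplications of the warped coordinates, which is the simplification the compositional approach is meant to provide.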

The Hessian matrix approximation elements are shown in Table 2, where additional simplifications are also incorporated.

Cite this article

Elliethy, A., Sharma, G. Accelerated parametric chamfer alignment using a parallel, pipelined GPU realization. J Real-Time Image Proc 16, 1661–1680 (2019). https://doi.org/10.1007/s11554-017-0668-5

  • DOI: https://doi.org/10.1007/s11554-017-0668-5