Abstract
Parametric chamfer alignment (PChA) is commonly employed for aligning an observed set of points with a corresponding set of reference points. PChA estimates optimal geometric transformation parameters that minimize an objective function formulated as the sum of the squared distances from each transformed observed point to its closest reference point. A distance transform enables efficient computation of the (squared) distances, and the objective function minimization is commonly performed via the Levenberg–Marquardt (LM) nonlinear least squares iterative optimization algorithm. The point-wise computations of the objective function, gradient, and Hessian approximation required for the LM iterations make PChA computationally demanding for large-scale datasets. We propose an acceleration of PChA via a parallelized and pipelined realization that is particularly well suited for large-scale datasets and for modern GPU architectures. Specifically, we partition the observed points among the GPU blocks and decompose the expensive LM calculations in correspondence with the GPU’s single-instruction multiple-thread architecture to significantly speed up this bottleneck step for PChA on large-scale datasets. Additionally, by reordering computations, we propose a novel pipelining of the LM algorithm that offers further speedup by exploiting the low arithmetic latency of the GPU compared with its high global memory access latency. Results obtained on two different platforms for both 2D and 3D large-scale point datasets from our ongoing research demonstrate that the proposed PChA GPU implementation provides a significant speedup over its single-CPU counterpart.
Notes
Later in this paper, in Fig. 11, we provide profiling results for a single-CPU implementation of PChA, which empirically demonstrate that the components of PChA that we select for GPU implementation account for a substantial part of the execution time of the complete single-CPU implementation.
We augment the points in the OS to ensure \(N_p \,=\, N_bN_t\).
For clarity, we omit the final per-grid reduction summation operations from Algorithm 3 and assume that the kernel in Algorithm 2 will return \(\{\mathbf {g},\; {\mathbf {H}},\; f\}\).
The CPU(1) implementation offers performance close to but not identical to that of a CPU implementation that is obtained by completely eliminating the OpenMP compiler directives from the code.
The DT is calculated using the method in [25].
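For readers unfamiliar with the method in [25], the following is a minimal Python sketch of a squared Euclidean distance transform computed by the lower-envelope-of-parabolas technique, applied first along columns and then along rows. The function names (`dt1d`, `squared_dt_2d`, `chamfer_cost`), the boolean-mask input, and the nearest-pixel lookup in `chamfer_cost` are assumptions made for illustration; this is not the implementation used in the paper.

```python
import math

INF = 1e20  # large finite stand-in for "no reference point at this sample"


def dt1d(f):
    """1D squared distance transform of the sampled function f."""
    n = len(f)
    d = [0.0] * n
    v = [0] * n            # positions of the parabolas in the lower envelope
    z = [0.0] * (n + 1)    # boundaries between consecutive parabolas
    k = 0
    z[0], z[1] = -math.inf, math.inf
    for q in range(1, n):
        while True:
            p = v[k]
            # Intersection of the parabola rooted at q with the one rooted at p.
            s = ((f[q] + q * q) - (f[p] + p * p)) / (2.0 * q - 2.0 * p)
            if s <= z[k]:
                k -= 1     # parabola p is hidden; discard it and retry
            else:
                break
        k += 1
        v[k] = q
        z[k] = s
        z[k + 1] = math.inf
    k = 0
    for q in range(n):
        while z[k + 1] < q:
            k += 1
        d[q] = (q - v[k]) ** 2 + f[v[k]]
    return d


def squared_dt_2d(mask):
    """Squared distance from every grid cell to the nearest True cell of mask."""
    rows, cols = len(mask), len(mask[0])
    g = [[0.0 if mask[r][c] else INF for c in range(cols)] for r in range(rows)]
    for c in range(cols):                      # pass 1: along each column
        col = dt1d([g[r][c] for r in range(rows)])
        for r in range(rows):
            g[r][c] = col[r]
    for r in range(rows):                      # pass 2: along each row
        g[r] = dt1d(g[r])
    return g


def chamfer_cost(points, dt):
    """Sum of squared distances, looked up at the nearest pixel (assumption)."""
    return sum(dt[int(round(y))][int(round(x))] for x, y in points)


mask = [[False] * 4 for _ in range(3)]
mask[1][1] = True                              # one reference point at row 1, column 1
dt = squared_dt_2d(mask)
cost = chamfer_cost([(1, 1), (3, 2)], dt)      # points given as (x, y) = (column, row)
```

Once the transform has been computed, evaluating the objective for a set of transformed points reduces to table lookups, which is what makes the per-point work independent and therefore easy to distribute.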
References
Liu, M.Y., Tuzel, O., Veeraraghavan, A., Chellappa, R.: Fast directional chamfer matching. In: IEEE International Conference on Computer Vision and Pattern Recognition, pp. 1696–1703, June 2010
Jiang, H., Holton, K.S., Robb, R.A.: Image registration of multimodality 3-D medical images by chamfer matching. In: SPIE/IS&T 1992 Symposium on Electronic Imaging: Science and Technology, pp. 356–366. International Society for Optics and Photonics, 1992
Chi, Y.T., Shahed, S.M.N., Ho, J., Yang, M.H.: Higher dimensional affine registration and vision applications. In: Proceedings European Conference on Computer Vision, pp. 256–269
Boughorbel, F., Mercimek, M., Koschan, A., Abidi, M.: A new method for the registration of three-dimensional point-sets: the Gaussian fields framework. Comput. Vis. Image Underst. 28(1), 124–137 (2010)
Gressin, A., Mallet, C., Demantké, J., David, N.: Towards 3D lidar point cloud registration improvement using optimal neighborhood knowledge. J. Photogramm. Remote Sens. 79, 240–251 (2013)
Danelljan, M., Meneghetti, G., Shahbaz Khan, F., Felsberg, M.: A probabilistic framework for color-based point set registration. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1818–1826, June 2016
Ding, L., Elliethy, A., Freedenberg, E., Wolf-Johnson, S.A., Romphf, J., Christensen, P., Sharma, G.: Comparative analysis of homologous buildings using range imaging. In: IEEE International Conference on Image Processing, pp. 4378–4382, Sep 2016
Elliethy, A., Sharma, G.: Vector road map registration to oblique wide area motion imagery by exploiting vehicles movements. In: IS&T Electronic Imaging: Video Surveillance and Transportation Imaging Applications, pp. VSTIA–520.1–8, San Francisco, California, 2016. URL http://ist.publisher.ingentaconnect.com/contentone/ist/ei/2016/00002016/00000003/art00008
Elliethy, A., Sharma, G.: Automatic registration of vector road maps with wide area motion imagery by exploiting vehicle detections. IEEE Trans. Image Process. 25(11), 5304–5315 (2016). doi:10.1109/TIP.2016.2601265
Besl, P.J., McKay, H.D.: A method for registration of 3-D shapes. IEEE Trans. Pattern Anal. Mach. Intell. 14(2), 239–256 (1992)
Zhang, Z.: Iterative point matching for registration of free-form curves and surfaces. Int. J. Comput. Vis. 13(2), 119–152 (1994)
Myronenko, A., Song, X.: Point set registration: coherent point drift. IEEE Trans. Pattern Anal. Mach. Intell. 32(12), 2262–2275 (2010)
Sofka, M., Yang, G., Stewart, C.V.: Simultaneous covariance driven correspondence (CDC) and transformation estimation in the expectation maximization framework. In: IEEE International Conference on Computer Vision and Pattern Recognition, pp. 1–8, June 2007
Fitzgibbon, A.W.: Robust registration of 2D and 3D point sets. Image Vis. Comput. 21(13–14), 1145–1153 (2003)
Rouhani, M., Sappa, A.D.: Correspondence free registration through a point-to-model distance minimization. In: IEEE International Conference on Computer Vision, pp. 2150–2157, Nov 2011
Borgefors, G.: Distance transformations in digital images. Comput. Vis. Graph. Image Proc. 34(3), 344–371 (1986)
Nocedal, J., Wright, S.J.: Numerical Optimization, 2nd edn. Springer, New York (2006)
Barrow, H.G., Tenenbaum, J.M., Bolles, R.C., Wolf, H.C.: Parametric correspondence and chamfer matching: two new techniques for image matching. In: Proceedings of the International Joint Conference on Artificial Intelligence, pp. 659–663, 1977
Sigg, C., Peikert, R., Gross, M.: Signed distance transform using graphics hardware. In: IEEE Visualization, pp. 83–90, Oct 2003
Cao, T.T., Tang, K., Mohamed, A., Tan, T.S.: Parallel banding algorithm to compute exact distance transform with the GPU. In: Proceedings of the 2010 ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games, pp. 83–90, ACM, New York, 2010
Zhu, X., Zhang, D.: Efficient parallel Levenberg–Marquardt model fitting towards real-time automated parametric imaging microscopy. PLoS ONE 8(10), e76665 (2013)
Li, B., Young, A.A., Cowan, B.R.: GPU accelerated non-rigid registration for the evaluation of cardiac function. In: Medical Image Computing and Computer-Assisted Intervention, pp. 880–887. Springer, Berlin, 2008
Amorim, R., Haase, G., Liebmann, M., Weber dos Santos, R.: Comparing CUDA and OpenGL implementations for a Jacobi iteration. In: IEEE International Conference High Performance Computing Simulation (2009)
Architectural Biometrics Project. https://architecturalbiometrics.com/
Felzenszwalb, P., Huttenlocher, D.: Distance transforms of sampled functions. Technical Report TR2004-1963, Cornell University (2004). URL https://ecommons.cornell.edu/handle/1813/5663
Kirk, D.B., Hwu, W.W.: Programming Massively Parallel Processors: A Hands-on Approach, 2nd edn. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA
Harris, M.: Optimizing parallel reduction in CUDA. NVIDIA Corporation (2007). http://developer.download.nvidia.com/assets/cuda/files/reduction.pdf
The OpenMP API specification for parallel programming. http://www.openmp.org/
University of Rochester, BlueHive Cluster. https://info.circ.rochester.edu/BlueHive/System_Overview.html
CorvusEye\(^{\text{TM}}\)1500 Data Sheet. http://www.exelisinc.com/solutions/corvuseye1500/Documents/CorvusEye500DataSheetAUG14
Szeliski, R., Shum, H.Y.: Creating full view panoramic image mosaics and environment maps. In: Proceedings of the 24th Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH ’97, pp. 251–258, 1997
Acknowledgements
We thank Bernard Brower of Harris Corporation for making available the CorvusEye [30] WAMI datasets used for demonstrating PChA on real-world 2D datasets. We also thank our colleagues from the Architectural Biometrics project for providing the 3D datasets of building models and lidar scans that are used in our evaluation. We also thank the Center for Integrated Research Computing (CIRC), University of Rochester, for providing access to computational resources for this research.
Appendix: Compositional approach for projective transformation estimation
The Jacobian matrix elements \(\{J^c_{j,l}\}\) associated with the projective transformation in Table 1 require a division operation per-element, which is computationally expensive. To simplify the Jacobian calculations, we adopt a compositional approach [31] that eliminates the division operations and also enables further simplifications. The compositional approach for our 2D point set alignment by projective transformation proceeds as follows:
- The projective transformation defined by the current estimate \({\varvec{\alpha }}_t\) of the parameters is applied to each OS point \({\mathbf {p}}_j\) to obtain a corresponding warped point \({\mathbf {p'}}_j \,=\, {\mathcal {T}}_{{\varvec{\alpha }}_t}\left( {\mathbf {p}}_j\right)\).
- Each LM iteration then estimates the incremental parameter update \({\varvec{\delta }}\) that minimizes
$$\begin{aligned} f({\varvec{\delta }}) \,=\, \sum \limits _{j=1}^{N_p} \left\| {\mathbf {r}}\left( {\mathcal {T}}_{\left( {\varvec{\alpha }}^I+{\varvec{\delta }}\right) }\left( {\mathbf {p'}}_j\right) \right) \right\| ^2 , \end{aligned}$$ (12)
where \({\varvec{\alpha }}^I\,=\,\left[ 1,0,0,0,1,0,0,0 \right]\) is the parameter vector that corresponds to the identity transformation, i.e., \({\mathcal {T}}_{{\varvec{\alpha }}^I} \left( {\mathbf {p'}}_j \right) =\,{\mathbf {p'}}_j\).
- The updated projective transformation is obtained as
$$\begin{aligned} {\mathcal {T}}_{{\varvec{\alpha }}_{t+1}} \,=\, {\mathcal {T}}_{{\varvec{\alpha }}_t} \circ {\mathcal {T}}_{{\varvec{\delta }}}, \end{aligned}$$ (13)
where \(\circ\) denotes composition, or equivalently multiplication of the corresponding matrix representations.
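The three steps above can be sketched in a few lines of Python. The sketch assumes a row-major parameter layout \([a_{11}, a_{12}, a_{13}, a_{21}, a_{22}, a_{23}, a_{31}, a_{32}]\) with \(a_{33}\) fixed to 1, which is suggested by the form of \({\varvec{\alpha }}^I\) but is not confirmed by Table 1. It also assumes that the composition in (13) is realized as the matrix product in which the current transformation acts on a point first and the incremental one second, which matches the way (12) applies the increment to the already warped points. All function names are hypothetical.

```python
ALPHA_I = [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]  # identity parameters


def params_to_matrix(a):
    """Assumed layout: [a11, a12, a13, a21, a22, a23, a31, a32], a33 = 1."""
    return [[a[0], a[1], a[2]],
            [a[3], a[4], a[5]],
            [a[6], a[7], 1.0]]


def matmul3(A, B):
    """Plain 3x3 matrix product A @ B."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def apply_h(H, p):
    """Apply the projective transformation H to the 2D point p."""
    x, y = p
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)


def compose_update(H_t, delta):
    """Fold the increment estimated around the identity into H_t."""
    inc = params_to_matrix([a + d for a, d in zip(ALPHA_I, delta)])
    H = matmul3(inc, H_t)          # current transform first, increment second
    s = H[2][2]                    # assumed nonzero; rescale so that a33 = 1
    return [[v / s for v in row] for row in H]


H_t = params_to_matrix([1.1, 0.02, 3.0, -0.01, 0.95, -2.0, 1e-4, 2e-4])
delta = [0.01, 0.0, 0.5, 0.0, -0.02, 0.25, 1e-5, 0.0]
H_next = compose_update(H_t, delta)
```

Because a projective transformation is defined only up to scale, rescaling the product so that its last entry is 1 keeps the result in the same eight-parameter form without changing the mapping.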
Considerable simplification of the Jacobian matrix calculation is obtained because the calculation is performed at \({\varvec{\alpha }}^I\), where the term \(w_j\) in Table 1 becomes unity, eliminating the need for division operations. Specifically, the Jacobian matrix \({\mathbf {J}}_j\) at the transformed point \({\mathbf {p'}}_j\equiv {\mathcal {T}}_{{\varvec{\alpha }}^I}({\mathbf {p'}}_j)\) is computed as
The Hessian matrix approximation elements are shown in Table 2, where additional simplifications are also incorporated.
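To illustrate why evaluating at \({\varvec{\alpha }}^I\) removes the divisions, the following Python sketch differentiates a projective warp with respect to its eight parameters at the identity and checks the result against central finite differences. The parameter ordering (row-major, with the last matrix entry fixed to 1) is an assumption inferred from the form of \({\varvec{\alpha }}^I\); the expressions are the standard derivatives of such a warp and are not copied from Table 1 or Table 2. The function names are hypothetical.

```python
ALPHA_I = [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]  # identity parameters


def warp(alpha, p):
    """Projective warp under the assumed eight-parameter layout."""
    a1, a2, a3, a4, a5, a6, a7, a8 = alpha
    x, y = p
    w = a7 * x + a8 * y + 1.0      # the denominator; equals 1 at ALPHA_I
    return ((a1 * x + a2 * y + a3) / w, (a4 * x + a5 * y + a6) / w)


def jacobian_at_identity(p):
    """Derivatives of the warped coordinates w.r.t. the parameters at ALPHA_I.

    With w = 1 every entry is a polynomial in (x, y), so no division is needed.
    """
    x, y = p
    row_x = [x, y, 1.0, 0.0, 0.0, 0.0, -x * x, -x * y]
    row_y = [0.0, 0.0, 0.0, x, y, 1.0, -x * y, -y * y]
    return row_x, row_y


def numeric_jacobian(p, eps=1e-6):
    """Central finite differences around ALPHA_I, used only as a check."""
    row_x, row_y = [], []
    for l in range(8):
        hi, lo = list(ALPHA_I), list(ALPHA_I)
        hi[l] += eps
        lo[l] -= eps
        (xh, yh), (xl, yl) = warp(hi, p), warp(lo, p)
        row_x.append((xh - xl) / (2.0 * eps))
        row_y.append((yh - yl) / (2.0 * eps))
    return row_x, row_y


jx, jy = jacobian_at_identity((2.0, -3.0))
nx, ny = numeric_jacobian((2.0, -3.0))
```

In an LM iteration these two rows would still have to be combined with the gradient of the residual at the warped point; that step depends on the residual definition in the main text and is not sketched here.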
Cite this article
Elliethy, A., Sharma, G. Accelerated parametric chamfer alignment using a parallel, pipelined GPU realization. J Real-Time Image Proc 16, 1661–1680 (2019). https://doi.org/10.1007/s11554-017-0668-5
DOI: https://doi.org/10.1007/s11554-017-0668-5