Keywords

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

1 Introduction

Registration of three-dimensional surfaces is a central problem in computer vision, computer graphics, and robotics. The problem is particularly challenging when the surfaces only partially overlap and no initial alignment is given. This difficult form of the problem is encountered in scene reconstruction [7, 39], 3D object retrieval [15, 29], camera relocalization [13], and other applications.

In order to deal with noisy data and partial overlap, practical registration pipelines employ iterative model fitting frameworks such as RANSAC [31]. Each iteration samples a set of candidate correspondences, produces an alignment based on these correspondences, and evaluates this alignment. If a satisfactory alignment is found, it is refined by a local registration algorithm such as ICP [30]. The combination of sampling-based coarse alignment and iterative local refinement is common in practice and is designed to produce a tight registration even with challenging input [7, 19, 39].

While such registration pipelines are common, they have significant drawbacks. Both the model fitting and the local refinement stages are iterative and perform computationally expensive nearest-neighbor queries in their inner loops. Much of the computational effort is expended on testing candidate alignments that are subsequently discarded. And the inelegant decomposition into a global alignment stage and a local refinement stage is itself a consequence of the low precision of global alignment frameworks.

In this paper, we present a fast global registration algorithm that does not involve iterative sampling, model fitting, or local refinement. The algorithm does not require initialization and can align noisy partially overlapping surfaces. It optimizes a robust objective defined densely over the surfaces. Due to this dense coverage, the algorithm directly produces an alignment that is as precise as that computed by well-initialized local refinement algorithms.

This direct approach has substantial benefits. It accomplishes in a single stage what is commonly done in two. This single stage optimizes a clear global objective. The optimization does not require closest-point queries in the inner loop. As a result, the presented algorithm is more than an order of magnitude faster than existing global registration pipelines, while matching or exceeding their accuracy.

Furthermore, we show that the presented algorithm can be extended to direct global alignment of multiple partially overlapping surfaces. Such joint alignment is often necessary in applications such as scene reconstruction [7, 39]. Existing approaches to this problem exhaustively produce candidate alignments between pairs of surfaces and then compute a globally consistent set of poses based on these intermediate pairwise alignments. In contrast, we show that a joint alignment can be produced directly by a single optimization of a global objective.

We evaluate the presented global registration algorithm on multiple datasets. Extensive experiments demonsrate that the presented approach matches or exceeds the accuracy of state-of-the-art global registration pipelines, while being at least an order of magnitude faster. Remarkably, the presented approach is also faster than local refinement algorithms such as ICP, since it does not need to recompute correspondences. It provides the accuracy achieved by well-initialized local refinement algorithms, without requiring an initialization and at lower computational cost.

2 Related Work

Geometric registration has been extensively studied [15, 28, 36, 38]. The typical workflow consists of two stages: global alignment, which computes an initial estimate of the rigid motion between two surfaces, followed by local refinement, which refines this initial estimate to obtain a tight registration [7, 12, 19, 25, 27, 37, 39]. We review each of these stages in turn.

Most global alignment methods operate on candidate correspondences. Some pipelines use point-to-point matches based on local geometric descriptors [16, 40], others define correspondences on pairs or tuples of points [1, 8, 26, 29]. Once candidate correspondences are collected, alignment is estimated iteratively from sparse subsets of correspondences and then validated on the entire surface. This iterative process is typically based on variants of RANSAC [1, 19, 26, 29, 34] or pose clustering [8, 26, 35]. When the data is noisy and the surfaces only partially overlap, existing pipelines often require many iterations to sample a good correspondence set and find a reasonable alignment.

Another approach to global registration is based on the branch-and-bound framework [10, 12, 18, 23, 42]. These algorithms systematically explore the pose space in search of the optimal solution. The branch-and-bound framework is appealing due to its theoretical optimality guarantees. However, the systematic search can be extremely time-consuming. In practice, the sampling-based frameworks described earlier outperform the branch-and-bound approaches when large datasets are involved.

Local refinement algorithms begin with a rough initial alignment and produce a tight registration based on dense correspondences. Most such methods are based on the iterative closest point (ICP) algorithm and its variants [30, 33]. In its basic form, ICP begins with an initial alignment and alternates between establishing correspondences via closest-point lookups and recomputing the alignment based on the current set of correspondences. ICP can produce an accurate result when initialized near the optimal pose, but is unreliable without such initialization. A long line of work has explored various approaches to increasing the robustness of ICP. Fitzgibbon [11] introduced nonlinear least-squares optimization to develop a robust error function that increases the radius of convergence. Bouaziz et al. [5] introduced sparsity inducing norms to deal with outliers and incomplete data. Other works explored the utility of relaxed assignments [14, 24, 32], distance field representations [6], and mixture models [21, 41] for increasing the robustness of local registration. Nevertheless, these approaches still rely on a satisfactory initialization. Our work demonstrates that the accuracy achieved by well-initialized local refinement algorithms can be achieved reliably without an initialization, at a computational cost that is more than an order of magnitude lower than the coarse global alignment algorithms described earlier.

Joint global registration of multiple partially overlapping surfaces has also been considered [7, 20, 39]. However, existing approaches to joint global registration first align many pairs of surfaces and then optimize the joint global alignment based on these intermediate pairwise results. This indirect approach incurs significant computational overhead. In contrast, we show that joint global alignment of many partially overlapping surfaces can be optimized for directly.

3 Pairwise Global Registration

3.1 Objective

Consider two point sets \(\mathbf {P}\) and \(\mathbf {Q}\). Our task is to find a rigid transformation \(\mathbf {T}\) that aligns \(\mathbf {Q}\) to \(\mathbf {P}\). Our approach optimizes a robust objective on correspondences between \(\mathbf {P}\) and \(\mathbf {Q}\). These correspondences are established by rapid feature matching that is performed before the objective is optimized. The correspondences are not recomputed during the optimization. For this reason, it is critical that the optimization be able to deal with very noisy correspondence sets. This is illustrated in Fig. 1.

Let \({\mathcal K}=\{(\mathbf {p},\mathbf {q})\}\) be the set of correspondences collected by matching points from \(\mathbf {P}\) and \(\mathbf {Q}\) as described in Sect. 3.3. Our objective is to optimize the pose \(\mathbf {T}\) such that distances between corresponding points are minimized, while spurious correspondences from \({\mathcal K}\) are seamlessly disabled. The objective has the following form:

$$\begin{aligned} E(\mathbf {T})=\sum _{(\mathbf {p},\mathbf {q})\in {\mathcal K}}\rho \left( \Vert \mathbf {p}-\mathbf {T}\mathbf {q}\Vert \right) . \end{aligned}$$
(1)
Fig. 1.
figure 1

An illustration with 2D point sets. (a) A latent shape. (b) Two partially overlapping surfaces and a set of point-to-point correspondences. The blue correspondences are genuine, the red correspondences are erroneous. For fast and accurate registration, the erroneous correspondences must be disabled without sampling, validation, pruning, or correspondence recomputation. (Color figure online)

Fig. 2.
figure 2

Illustration of graduated non-convexity. As \(\mu \) decreases, the objective function for the matching problem in Fig. 1 becomes sharper and the registration more precise.

Here \(\rho (\cdot )\) is a robust penalty. The use of an appropriate robust penalty function is critical, because many of the terms in Objective 1 are contributed by spurious constraints. To achieve high computational efficiency, we do not want to sample, validate, prune, or recompute correspondences during the optimization. A well-chosen estimator \(\rho \) will perform the validation and pruning automatically without imposing additional computational costs. We use a scaled Geman-McClure estimator:

$$\begin{aligned} \rho (x)=\frac{\mu x^{2}}{\mu +x^{2}}. \end{aligned}$$
(2)

Figure 2(a) shows the Geman-McClure estimator for different values of \(\mu \). As can be seen in the figure, small residuals are penalized in the least-squares sense, while the sublinear growth and rapid flattening out of the estimator neutralize outliers. The parameter \(\mu \) controls the range within which residuals have a significant effect on the objective; its setting will be discussed in Sect. 3.2.

Objective 1 is difficult to optimize directly. We use the Black-Rangarajan duality between robust estimation and line processes [3]. Specifically, let \({\mathbb L}=\{l_{\mathbf {p},\mathbf {q}}\}\) be a line process over the correspondences. We optimize the following joint objective over \(\mathbf {T}\) and \({\mathbb L}\):

$$\begin{aligned} E(\mathbf {T},{\mathbb L})=\sum _{(\mathbf {p},\mathbf {q})\in {\mathcal K}}l_{\mathbf {p},\mathbf {q}}\Vert \mathbf {p}-\mathbf {T}\mathbf {q}\Vert ^{2} +\sum _{(\mathbf {p},\mathbf {q})\in {\mathcal K}} {\mathrm \Psi }(l_{\mathbf {p},\mathbf {q}}). \end{aligned}$$
(3)

Here \({\mathrm \Psi }(l_{\mathbf {p},\mathbf {q}})\) is a prior, set to

$$\begin{aligned} {\mathrm \Psi }(l_{\mathbf {p},\mathbf {q}})=\mu \left( \sqrt{l_{\mathbf {p}, \mathbf {q}}} - 1\right) ^{2}. \end{aligned}$$
(4)

For \(E(\mathbf {T},{\mathbb L})\) to be minimized, the partial derivative with respect to each \(l_{\mathbf {p},\mathbf {q}}\) must vanish:

$$\begin{aligned} \frac{\partial E}{\partial l_{\mathbf {p}, \mathbf {q}}} = \Vert \mathbf {p}-\mathbf {T}\mathbf {q}\Vert ^{2} + \mu \frac{\sqrt{l_{\mathbf {p},\mathbf {q}}}-1}{\sqrt{l_{\mathbf {p},\mathbf {q}}}} = 0. \end{aligned}$$
(5)

Solving for \(l_{\mathbf {p},\mathbf {q}}\) yields

$$\begin{aligned} l_{\mathbf {p},\mathbf {q}}=\left( \frac{\mu }{\mu +\Vert \mathbf {p}-\mathbf {T}\mathbf {q}\Vert ^{2}}\right) ^{2}. \end{aligned}$$
(6)

Substituting \(l_{\mathbf {p},\mathbf {q}}\) into \(E(\mathbf {T},{\mathbb L})\), Objective 3 becomes Objective 1. Thus optimizing Objective 3 yields a solution \(\mathbf {T}\) that is also optimal for the original Objective 1.

3.2 Optimization

The main benefit of the optimization objective defined in Eq. 3 is that the optimization can be performed extremely efficiently by alternating between optimizing \(\mathbf {T}\) and \({\mathbb L}\). The optimization performs block coordinate descent by fixing \({\mathbb L}\) when optimizing \(\mathbf {T}\) and vice versa. Both types of steps optimize the same global objective (Eq. 3). Thus the alternating algorithm is guaranteed to converge.

When \({\mathbb L}\) is fixed, Objective 3 turns into a weighted sum of \(L^2\) penalties on distances between point-to-point correspondences. This objective over \(\mathbf {T}\) can be solved efficiently in closed form [9]. However, such closed-form solution does not extend to joint registration of multiple surfaces, which we are interested in and will extend the presented approach to in Sect. 4. We therefore present a more flexible approach. We linearize \(\mathbf {T}\) locally as a 6-vector \(\xi = (\omega , \mathbf {t}) = (\alpha ,\beta ,\gamma ,a,b,c)\) that collates a rotational component \(\omega \) and a translation \(\mathbf {t}\). \(\mathbf {T}\) is approximated by a linear function of \(\xi \):

$$\begin{aligned} \mathbf {T}\approx \left( \begin{array}{c c c c} 1 &{} -\gamma &{} \beta &{} a\\ \gamma &{} 1 &{} -\alpha &{} b\\ -\beta &{} \alpha &{} 1 &{} c\\ 0 &{} 0 &{} 0 &{} 1 \end{array}\right) \mathbf {T}^{k}. \end{aligned}$$
(7)

Here \(\mathbf {T}^{k}\) is the transformation estimated in the last iteration. Equation 3 becomes a least-squares objective on \(\xi \). Using the Gauss-Newton method, \(\xi \) is computed by solving a linear system:

$$\begin{aligned} \mathbf {J}_{\mathbf {r}}^{\top }\mathbf {J}_{\mathbf {r}}\xi = -\mathbf {J}_{\mathbf {r}}^{\top }\mathbf {r}, \end{aligned}$$
(8)

where \(\mathbf {r}\) is the residual vector and \(\mathbf {J}_{\mathbf {r}}\) is its Jacobian. \(\mathbf {T}\) is updated by applying \(\xi \) to \(\mathbf {T}^{k}\) using Eq. 7, then mapped back into the SE(3) group.

When \(\mathbf {T}\) is fixed, the objective in Eq. 3 has a closed-form solution. It is minimized when \(l_{\mathbf {p},\mathbf {q}}\) satisfies Eq. 6.

Graduated Non-convexity. Objective 3 is non-convex and its shape is controlled by the parameter \(\mu \) of the penalty function (Eq. 2). To set \(\mu \) and alleviate the effect of local minima we employ graduated non-convexity [2, 4]. From the standpoint of Eq. 3, \(\mu \) balances the strength of the prior term and the alignment term. Large \(\mu \) makes the objective function smoother and allows many correspondences to participate in the optimization even when they are not fit tightly by the transformation \(\mathbf {T}\). The effect of varying \(\mu \) is illustrated in Fig. 2. Our optimization begins with a very large value \(\mu =D^2\), where D is the diameter of the largest surface. The parameter \(\mu \) is decreased during the optimization until it reaches the value \(\mu =\delta ^{2}\), where \(\delta \) is a distance threshold for genuine correspondences.

3.3 Correspondences

To generate the initial correspondence set \({\mathcal K}\), we use the Fast Point Feature Histogram (FPFH) feature [34]. We have chosen this feature because it can be computed in a fraction of a millisecond and provides good matching accuracy across a broad range of datasets [16]. Let \(\mathbf {F}(\mathbf {P}) = \{\mathbf {F}(\mathbf {p}):\mathbf {p}\in \mathbf {P}\}\), where \(\mathbf {F}(\mathbf {p})\) is the FPFH feature computed for point \(\mathbf {p}\). Define \(\mathbf {F}(\mathbf {Q}) = \{\mathbf {F}(\mathbf {q}):\mathbf {q}\in \mathbf {Q}\}\) analogously.

For each \(\mathbf {p}\!\in \!\mathbf {P}\), we find the nearest neighbor of \(\mathbf {F}(\mathbf {p})\) among \(\mathbf {F}(\mathbf {Q})\), and for each \(\mathbf {q}\!\in \!\mathbf {Q}\) we find the nearest neighbor of \(\mathbf {F}(\mathbf {q})\) among \(\mathbf {F}(\mathbf {P})\). Let \({\mathcal K}_{I}\) be the set that collects all these correspondences. This set could be used directly as the input to our approach. However, in practice \({\mathcal K}_{I}\) has a very high fraction of outliers. We use two tests to improve the inlier ratio of the correspondence set used by the algorithm.

  • Reciprocity test. A correspondence pair \((\mathbf {p},\mathbf {q})\) is selected from \({\mathcal K}_{I}\) if and only if \(\mathbf {F}(\mathbf {p})\) is the nearest neighbor of \(\mathbf {F}(\mathbf {q})\) among \(\mathbf {F}(\mathbf {P})\) and \(\mathbf {F}(\mathbf {q})\) is the nearest neighbor of \(\mathbf {F}(\mathbf {p})\) among \(\mathbf {F}(\mathbf {Q})\). The resulting correspondence set is denoted by \({\mathcal K}_{II}\).

  • Tuple test. We randomly pick 3 correspondence pairs \((\mathbf {p}_{1},\mathbf {q}_{1}),(\mathbf {p}_{2},\mathbf {q}_{2}),(\mathbf {p}_{3},\mathbf {q}_{3})\) from \({\mathcal K}_{II}\) and check if the tuples \((\mathbf {p}_{1},\mathbf {p}_{2},\mathbf {p}_{3})\) and \((\mathbf {q}_{1},\mathbf {q}_{2},\mathbf {q}_{3})\) are compatible. Specifically, we test if the following condition is met:

    $$\begin{aligned} \forall i\not =j,\quad \tau< \frac{\Vert \mathbf {p}_{i}-\mathbf {p}_{j}\Vert }{\Vert \mathbf {q}_{i}-\mathbf {q}_{j}\Vert } < 1/\tau , \end{aligned}$$
    (9)

    where \(\tau =0.9\). Intuitively, this test verifies that the correspondences are compatible. Correspondences from tuples that pass the test are collected in a set \({\mathcal K}_{III}\). This is the set used by the algorithm: \({\mathcal K}= {\mathcal K}_{III}\).

Algorithm 1 summarizes the pairwise registration algorithm used in all subsequent experiments.

figure a

4 Multi-way Registration

Many applications require aligning multiple surfaces to obtain a model of a large scene or object. To solve this multi-way registration problem, existing approaches first compute pairwise alignments between pairs of surfaces and then attempt to synchronize these alignments to obtain a global registration [7, 20, 39]. This has two significant disadvantages. First, the pairwise alignment stage is computationally wasteful because it is not apparent in advance which pairs will be useful. Second, pairwise registration can yield a suboptimal alignment due to local minima that could be disambiguated by a global approach that considers all surfaces jointly.

We develop an alternative approach: to directly align all surfaces based on raw dense point correspondences. Instead of optimizing separate pairwise alignments and then synchronizing the results, we can directly optimize a global registration objective over all surfaces.

4.1 Objective

Given a set of surfaces \(\{\mathbf {P}_{i}\}\), our task is to estimate a set of poses \({\mathbb T}=\{\mathbf {T}_{i}\}\) that aligns the surfaces in a global coordinate frame. We begin by constructing a set of candidate correspondences \({\mathcal K}_{ij}\) for each pair of surfaces \((\mathbf {P}_{i},\mathbf {Q}_{j}), i<j\). Objective 1 is extended to the multi-way setting as follows:

$$\begin{aligned} E({\mathbb T})=\lambda \sum _{i}\sum _{(\mathbf {p},\mathbf {q})\in {\mathcal K}_{i}}\Vert \mathbf {T}_{i}\mathbf {p}-\mathbf {T}_{i+1}\mathbf {q}\Vert ^2+\sum _{i<j}\sum _{(\mathbf {p},\mathbf {q})\in {\mathcal K}_{ij}}\rho (\Vert \mathbf {T}_{i}\mathbf {p}-\mathbf {T}_{j}\mathbf {q}\Vert ). \end{aligned}$$
(10)

This formulation incorporates initial odometry transformations \(\{\mathbf {T}_{i}\}\) between consecutive surfaces, which are commonly available in surface reconstruction. The set \({\mathcal K}_{i}\) collects correspondences between surfaces \(\mathbf {P}_{i}\) and \(\mathbf {P}_{i+1}\) under the odometry alignment. When available, the odometry terms are penalized directly with the \(L^2\) norm and serve as a backbone that stabilizes the optimization.

Define a line process \({\mathbb L}=\{{l}_{\mathbf {p},\mathbf {q}}\}\). The objective can now be reformulated as follows:

$$\begin{aligned} E({\mathbb T},{\mathbb L})= & {} \lambda \sum _{i}\sum _{(\mathbf {p},\mathbf {q})\in {\mathcal K}_{i}}\Vert \mathbf {T}_{i}\mathbf {p}-\mathbf {T}_{i+1}\mathbf {q}\Vert ^2\nonumber \\+ & {} \sum _{i<j}\Big (\sum _{(\mathbf {p},\mathbf {q})\in {\mathcal K}_{ij}}l_{\mathbf {p},\mathbf {q}}\Vert \mathbf {T}_{i}\mathbf {p}-\mathbf {T}_{j}\mathbf {q}\Vert ^{2}+\sum _{(\mathbf {p},\mathbf {q})\in {\mathcal K}_{ij}} {\mathrm \Psi }(l_{\mathbf {p},\mathbf {q}})\Big ). \end{aligned}$$
(11)

The prior term \({\mathrm \Psi }(l_{\mathbf {p},\mathbf {q}})\) is defined as in Eq. 4.

4.2 Optimization

We again use alternating optimization to solve the minimization problem. In each iteration, \(E({\mathbb T},{\mathbb L})\) is first minimized with respect to the line process variables \({\mathbb L}\). This has a closed-form solution:

$$\begin{aligned} l_{\mathbf {p},\mathbf {q}}=\left( \frac{\mu }{\mu +\Vert \mathbf {T}_{i}\mathbf {p}-\mathbf {T}_{j}\mathbf {q}\Vert ^{2}}\right) ^{2}. \end{aligned}$$
(12)

Next, \(E({\mathbb T},{\mathbb L})\) is minimized with respect to all poses \({\mathbb T}\). Let \(\mathbf {T}_{i}^{k}\) denote the i-th transformation estimated in the previous iteration. \(\mathbf {T}_{i}\) can be locally linearized with a 6-vector \(\xi _{i} = (\omega _{i}, \mathbf {t}_{i}) = (\alpha _{i},\beta _{i},\gamma _{i},a_{i},b_{i},c_{i})\):

$$\begin{aligned} \mathbf {T}_{i}\approx \left( \begin{array}{c c c c} 1 &{} -\gamma _{i} &{} \beta _{i} &{} a_{i}\\ \gamma _{i} &{} 1 &{} -\alpha _{i} &{} b_{i}\\ -\beta _{i} &{} \alpha _{i} &{} 1 &{} c_{i}\\ 0 &{} 0 &{} 0 &{} 1 \end{array}\right) \mathbf {T}_{i}^{k}. \end{aligned}$$
(13)

Let \({\mathrm \Xi }\) be a \(6|{\mathbb T}|\)-vector that collates \(\{\xi _{i}\}\). \(E({\mathbb T},{\mathbb L})\) becomes a least-squares objective on \({\mathrm \Xi }\). It is minimized by solving the linear system

$$\begin{aligned} \mathbf {J}_{\mathbf {r}}^{\top }\mathbf {J}_{\mathbf {r}}{\mathrm \Xi }= -\mathbf {J}_{\mathbf {r}}^{\top }\mathbf {r} \end{aligned}$$
(14)

and updating \(\mathbf {T}_{i}\) accordingly. Here \(\mathbf {J}_{\mathbf {r}}\) and \(\mathbf {r}\) are the Jacobian matrix and the residual vector, respectively.

Note that the correspondences are never updated. Each iteration performs only two steps: evaluate a line process variable for each point correspondence, then build and solve a linear system with \(6|{\mathbb T}|\) variables. Both steps are very efficient.

5 Results

5.1 Pairwise Registration

We evaluate the presented pairwise registration algorithm on synthetic range data, the UWA benchmark [27], and the global registration benchmark of Choi et al. [7]. We compare our algorithm with a number of prior global registration methods. GoICP is the algorithm of Yang et al. [42]. GoICP-Trimming is its trimming variant that supports partial overlap. We use a \(10\,\%\) trimming percentage and use only 1,000 data points, as suggested by Yang et al. [42]. Without downsampling, GoICP and GoICP-Trimming take hours to run on our point clouds. Super4PCS is the algorithm of Mellado et al. [26]. OpenCV is a recent OpenCV implementation of the surface registration algorithm of Drost et al. [8]. PCL is a Point Cloud Library implementation of the algorithm of Rusu et al. [19, 34]. CZK is the variant of Rusu’s algorithm used by Choi et al. [7].

We also conduct controlled comparisons with local registration algorithms. PCL ICP is a Point Cloud Library implementation of the ICP algorithm [19]. Sparse ICP is the algorithm of Bouaziz et al. [5]. We tested these algorithms with both point-to-point and point-to-plane distance measures [33].

All execution times are measured using a single thread on an Intel Core i7-5960X CPU clocked at 3.00 GHz.

Synthetic Range Data. We begin by performing a series of controlled experiments on synthetic data. The availability of precise ground truth enables a detailed evaluation. To conduct controlled experiments, we used three well-known models from the AIM@SHAPE repository (Bimba, Dancing Children, and Chinese Dragon), the Berkeley Angel dataset [22], and the Stanford Bunny. For each model, we synthesized five pairs of partially overlapping range images and then corrupted these range images with 3D Gaussian noise. We used three noise levels, defined by setting the standard deviation of the Gaussian distribution to \(\sigma =0\) (no noise), \(\sigma =0.0025\), and \(\sigma =0.005\). The unit of \(\sigma \) is the diameter of the surface. For each noise level, there are 25 partially overlapping global registration tests in total. The number of points in each range image varies between 8,868 and 19,749. The overlap ratio varies between \(47\,\%\) and \(90\,\%\).

Figure 3 shows the accuracy achieved by the different global registration algorithms on the 25 tests at each noise level. For each algorithm and each RMSE level \(\alpha \), the figure plots the \(\alpha \)-recall, defined as the fraction of tests for which the method achieved an RMSE \(< \alpha \). (Higher is better.) The RMSE is computed on the distances between ground-truth correspondences after alignment. Table 1 summarizes the average and maximal RMSE for each method. (Lower is better.) For synthetic data with no noise, our method, PCL, and CZK produce tight alignment in \(100\,\%\) of the tests. (RMSE \(\le 0.005\).) The accuracy of OpenCV and Super4PCS is worse by a multiplicative factor of at least 3, presumably due to their reliance on matching tuples of points rather than optimizing for fully dense surface registration. GoICP-Trimming produces accurate alignment in many cases but suffers from poor accuracy on others, presumably because its computational costs necessitate operation on downsampled point clouds.

Fig. 3.
figure 3

Controlled experiments on synthetic data. \(\alpha \)-recall is the fraction of tests for which a given method achieves an RMSE \(< \alpha \). Higher is better. The RMSE unit is the diameter of the surface. Our algorithm is more robust to noise and is more accurate than prior approaches, while being more than an order of magnitude faster.

Table 1. Average and maximal RMSE achieved by global registration algorithms on synthetic range images with noise level \(\sigma \). Maximal RMSE is the maxumum among the 25 RMSE values obtained for individual pairwise registration tests. Our approach outperforms other methods by a large margin when noise is present. Specifically, for \(\sigma =0.005\), the average RMSE of our approach is more than 2 times lower than the lowest average RMSE of any prior approach, and the maximal RMSE of our approach is 5.6 times lower.

On noisy data, our method is much more robust and accurate than others. For \(\sigma =0.005\), the average RMSE of our approach is more than 2 times smaller than the lowest average RMSE of any prior approach, and the maximal RMSE of our approach is 5.6 times smaller than the lowest maximal RMSE among prior approaches. This is presumably because our approach optimizes over dense correspondences rather than matching point tuples. A qualitative comparison with GoICP-Trimming and PCL is provided in Fig. 4.

Fig. 4.
figure 4

Visual comparison with GoICP-Trimming and PCL. Our method operates on dense point clouds and produces a tight alignment with RMSE 0.004 on clean data (top row) and RMSE 0.007 on noisy data (bottom row, \(\sigma =0.005\)). In contrast, the prior approaches break down in the presence of noise: RMSE 0.129 for GoICP-Trimming and 0.326 for PCL in the bottom row. Error magnitude is coded by color, with black indicating error above 0.05. (Color figure online)

Table 2. Running times of global registration methods, measured in seconds. GoICP and its trimming variant operate on point clouds downsampled to 1,000. All other methods operate on full-resolution point clouds. Our algorithm is 50 times faster than the fastest prior global registration method.

The major benefit of our approach is that it is faster by more than an order of magnitude than prior approaches. Table 2 shows the average computation time of each global registration method on each object. Our method improves registration speed by a factor of 50 relative to the fastest prior global registration algorithm (CZK). While previous methods require tens of seconds, our method takes 0.2 s on average. This is because our method avoids expensive nearest-neighbor lookups in the optimization loop.

figure b

We further analyze the computational requirements of our approach by varying the size (i.e., the point count) of the synthesized range images and measuring the execution time of individual components of our algorithm. The results are shown on the right. The majority of the time is spent on computing the FPFH features and building the input correspondences. These operations are performed only once, before the optimization, and the correspondences are never updated. The optimization itself is extremely fast. Its execution time is below 30 milliseconds even for point clouds with more than 20,000 points. In addition, our method performs validation only once, after optimization has converged. This one-time validation consumes on average \(3.3\,\%\) of our computation time. In contrast, sampling-based methods such as PCL perform validation thousands of times in the RANSAC loop.

We also compare our global registration algorithm with local refinement methods such as ICP and its variants. To perform a controlled evaluation, we varied the accuracy of the initial transformation provided to the local methods. The results are shown in Fig. 5. We performed two sets of experiments. In one, the local algorithms were initialized with the ground-truth translation and varying degrees of rotation. In the other, the local algorithms were initialized with the ground-truth rotation and varying degrees of translation. As shown in Fig. 5, our algorithm matches the accuracy of the local refinment methods in the idealized case when these methods are initialized with the ground-truth transformation. However, the accuracy of the local methods degrades when the initialization deviates sufficiently from the ground-truth pose: 5 degrees in rotation or 5–10 % of the point cloud diameter in translation. In contrast, our algorithm does not use an initialization and yields the same accuracy in all conditions.

Fig. 5.
figure 5

Controlled comparison with local methods. Local registration algorithms are initialized with a transformation generated by adding a perturbation in rotation (left) or translation (right) to the ground-truth alignment. The plots show the mean (bold curve) and standard deviation (shaded region) of the RMSE of each method. Lower is better. Our algorithm matches the accuracy achieved by the local algorithms when they are initialized near the ground-truth pose, but does not require an initialization.

Table 3. Timing comparison with local algorithms, measured in seconds. Our global algorithm is 2.8 times faster than a state-of-the-art implementation of ICP.

We further compare the computational costs of our algorithm and the local refinement methods. The results are shown in Table 3. Remarkably, our global registration algorithm is 2.8 times faster than a state-of-the-art implementation of ICP. The key reason is that our algorithm does not need to recompute correspondences.

UWA Benchmark. Next, we evaluate our method on the UWA dataset [27]. This dataset has 50 scenes. Each scene has multiple objects that can be aligned to it. In total, the dataset contains 188 pairwise registration tests. Figure 6(a) shows a scene with objects aligned to it by our approach. This dataset is challenging due to clutter, occlusion, and low overlap. The lowest overlap ratio in the dataset is only \(21\,\%\). As shown in Fig. 6(b), many prior global registration algorithms perform poorly on this dataset. Our algorithm achieves a 0.05-recall of 84 %, comparable with PCL and CZK (82 % and 78 %, respectively). OpenCV achieves 52 % and the other algorithms are all below 7 %.

Fig. 6.
figure 6

Global registration results on the UWA benchmark [27]. (a) Our result on one of the 188 tests. The scene is colored white and objects aligned to the scene have distinct colors. (b) \(\alpha \)-recall plot comparing our method and prior global registration algorithms. (Higher is better.) (Color figure online)

Table 4 compares the speed of our approach with the other global registration methods on the UWA benchmark. Our approach is an order of magnitude faster than the fastest prior methods.

Table 4. Average running times of global registration methods on the 188 tests from the UWA dataset, measured in seconds.

Scene Benchmark. We now evaluate on the scene benchmark provided by Choi et al. [7]. This benchmark has 4 datasets. Each dataset consists of 47 to 57 fragments of a scene. These fragments contain high-frequency noise and low-frequency distortion that simulate scans created by consumer depth cameras. Global pairwise registration is performed on every pair of fragments from a given scene. Table 5 compares recall and precision (as defined by Choi et al.), and average running times.

Table 5. Evaluation on the scene benchmark of Choi et al. [7]. Our approach has the highest precision and the second highest recall, while being at least an order of magnitude faster.

5.2 Multi-way Registration

We evaluate the multi-way extension of our algorithm on the Augmented ICL-NUIM dataset [7, 17]. The dataset contains four sequences, each of which consists of 47 to 57 scene fragments. We apply the multi-way registration algorithm presented in Sect. 4 to these fragments. (The same parameter \(\lambda =2\) was used for all experiments.) Our method produces a global alignment of all fragments. We integrate these aligned fragments and report the mean distance of the reconstructed surface to the ground-truth model. Table 6 reports the resulting accuracy and running time. The reconstruction accuracy yielded by our direct multi-way registration algorithm matches the accuracy of the registration approach of Choi et al. [7]. However, our algorithm solves for the joint global alignment directly, without exhaustive intermediate pairwise alignments. It is therefore 60 times faster than the approach of Choi et al.

Table 6. Evaluation of multi-way registration on the Augmented ICL-NUIM dataset [7, 17]. Our multi-way registration algorithm matches the accuracy of the state-of-the-art multi-way registration pipeline of Choi et al., but is 60 times faster.

6 Conclusion

We have presented a fast algorithm for global registration of partially overlapping 3D surfaces. Our algorithm is more than an order of magnitude faster than prior global registration algorithms and is much more robust to noise. It matches the accuracy of well-initialized local refinement algorithms such as ICP, without requiring an initialization and at lower computational cost. The algorithm may be broadly applicable in computer vision, computer graphics, and robotics.