Smart particle filtering for high-dimensional tracking

doi:10.1016/j.cviu.2005.09.013

Computer Vision and Image Understanding

Volume 106, Issue 1, April 2007, Pages 116-129

https://doi.org/10.1016/j.cviu.2005.09.013

Abstract

Tracking articulated structures such as a hand or body within a reasonable time is challenging because of the high dimensionality of the state space. Recently, a new optimization method called ‘Stochastic Meta-Descent’ (SMD) has been introduced in computer vision. It is a gradient descent scheme with adaptive, parameter-specific step sizes that can operate in a constrained space. However, while the local optimization works very well, reaching the global optimum is not guaranteed. We therefore propose an enhanced algorithm that wraps a particle filter around multiple SMD-based trackers, which play the role of particles, i.e. act as ‘smart particles’. After the standard particle propagation based on a simple motion model, SMD is performed, and the resulting new particle set is incorporated in such a way that the original Bayesian distribution is not altered. The resulting ‘Smart Particle Filter’ (SPF) tracks high-dimensional articulated structures with far fewer samples than previous methods. Additionally, it can handle multiple hypotheses and clutter, where pure optimization approaches have problems. Good performance is demonstrated for hand tracking from 3D range data.

Introduction

Hand and body tracking have quite a few intriguing applications, ranging from human–computer interfacing to motion capture for animation. Yet, progress in this area is complicated by the high-dimensional state spaces in which such trackers must operate.

In this context, single-hypothesis methods can get stuck in a local minimum while estimating the state of the tracker given an observation. One of the best-known examples is the Kalman filter, which relies on a unimodal Gaussian assumption. In articulated structure tracking, the posterior distribution is often non-Gaussian, which can cause Kalman filtering to fail. To remedy this drawback, alternatives have been proposed for nonlinear estimation, such as the Extended Kalman Filter (EKF), widely used for articulated structure tracking [1], [2], [3], or the Unscented Kalman Filter [4]. However, the limitations of the EKF have been demonstrated in the presence of singularities (non-observable degrees of freedom) and discontinuities (anatomical constraints) [5].

Another approach to nonlinear estimation is the Condensation algorithm [6], which has shown better robustness to clutter and occlusion thanks to its multiple hypotheses. However, as the number of dimensions increases, the computational cost becomes unmanageable: to explore the search space accurately, the number of particles has to grow dramatically with its dimensionality. Two main alternatives have therefore been proposed:

•
To lower the dimensionality of the search space: Sidenbladh et al. [7] track a 3D articulated body using particle filtering within a Bayesian framework. An action-specific dynamic model allows them to reduce the number of state parameters, though their particle filter still requires many samples. Wu et al. [8] represent articulations in a lower-dimensional space by a set of linear manifolds constructed from base configurations. However, their algorithm is view-dependent and performs optimally only when the palm is orthogonal to the camera. In the same vein, Zhou and Huang [9] present a method called eigen dynamics analysis to compress the actual dimensionality of the manifold of feasible finger configurations.
•
To devise schemes that work well with fewer samples: MacCormick and Isard [10] explore the search space more efficiently through partitioned sampling, in which the state space is partitioned and explored hierarchically. In ICondensation, Isard and Blake [11] place samples according to a second information source, using importance sampling so that the original probability distribution is not changed. Li et al. [12] apply Kalman filtering within a Condensation framework to use the information of the current observation more directly. After propagation, a measurement is selected for each particle and used to update the particle location by Kalman filtering. A new particle is then drawn from a Gaussian distribution around this location, whose covariance is given by the Kalman covariance matrix. This strategy steers the particle distribution towards regions of high likelihood. Deutscher et al. [13] propose a modified particle filter based on simulated annealing that is able to track an articulated model in a high-dimensional space. Compared to a classical particle filter, they reduce the number of samples by a factor of 10 and use three cameras to track a full body model. Sminchisescu and Triggs [14] combine global sampling with local optimization by gradient descent; however, this approach is still too slow for our purposes. Cham and Rehg [15] and Heap and Hogg [16] also combine global sampling with local optimization.
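The importance-sampling principle behind ICondensation [11] can be illustrated with a minimal sketch. For illustration only, we assume the following setup:

- a 1-D state with Gaussian prior f;
- a Gaussian likelihood p(z | x);
- a Gaussian proposal g, standing in for the "second information source".

Samples drawn from g are reweighted by p(z | x) f(x) / g(x). The weighted set therefore still represents the posterior under the original prior f, not under g. The helper names `gaussian_pdf` and `importance_weights` are hypothetical.

```python
import numpy as np


def gaussian_pdf(x, mean, std):
    """Density of N(mean, std**2) evaluated at x (broadcasts)."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))


def importance_weights(samples, prior_pdf, likelihood, proposal_pdf):
    """Self-normalised weights proportional to p(z|x) f(x) / g(x)."""
    w = likelihood(samples) * prior_pdf(samples) / proposal_pdf(samples)
    return w / w.sum()


rng = np.random.default_rng(0)
z = 1.0                                               # hypothetical observation
samples = rng.normal(1.0, 1.0, size=200_000)          # drawn from proposal g = N(1, 1)
w = importance_weights(
    samples,
    prior_pdf=lambda x: gaussian_pdf(x, 0.0, 1.0),    # prior f = N(0, 1)
    likelihood=lambda x: gaussian_pdf(z, x, 1.0),     # p(z | x) = N(z; x, 1)
    proposal_pdf=lambda x: gaussian_pdf(x, 1.0, 1.0),
)
posterior_mean = np.sum(w * samples)  # analytic value for this toy case: 0.5
```

Because the proposal is centred on the observation, most samples land where the likelihood is high. Even so, the weighted mean still agrees with the posterior obtained from the original prior.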

In this paper, we propose a method of the second class (working with fewer samples). We developed our own local optimizer based on ‘Stochastic Meta-Descent’ [17] and wrapped it in a multiple-hypothesis framework to increase the chance of reaching the global optimum.

The ‘Stochastic Meta-Descent’ (SMD) tracker was recently introduced in [17]. This optimization technique is more efficient than previous gradient methods and can naturally incorporate constraints, which other optimization techniques often find difficult or costly to deal with. Stochasticity in the evaluation of the objective function increases the chance of escaping local minima. Although stochastic sampling reduces the risk of being trapped in a local minimum, reaching the global minimum is still not guaranteed. We therefore propose to combine several SMD trackers as ‘particles’ within a particle filter framework. This allows the tracker to handle multiple hypotheses and increases the chance of finding the global optimum. Furthermore, very few samples suffice, as they tend to represent the minimum well. Unlike Li et al. [12], we do not assume that the measurements depend locally linearly on the state, which is a precondition for Kalman filtering.
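To make "adaptive and parameter-specific step sizes" concrete, here is a minimal SMD sketch following Schraudolph’s general formulation. A gain vector η is adapted multiplicatively from the elementwise product of the current gradient g and a trace v ≈ ∂p/∂log η. The trace is itself updated with a Hessian-vector product.

The details in [17] may differ, including the stochastic subsampling of the cost, the constraint mechanism and the parameter values. In particular, enforcing box constraints by clipping and resetting v on clamped parameters is our assumption. The quadratic cost, `a`, `c` and the bounds are hypothetical test data.

```python
import numpy as np


def smd_minimize(grad, hess_vec, p0, eta0=0.05, mu=0.01, lam=0.99,
                 lower=None, upper=None, n_steps=500):
    """Sketch of Stochastic Meta-Descent with optional box constraints.

    grad(p)        -> gradient g at p (may be a stochastic estimate)
    hess_vec(p, v) -> Hessian-vector product H v at p
    """
    p = np.asarray(p0, dtype=float).copy()
    eta = np.full_like(p, eta0)   # one adaptive step size per parameter
    v = np.zeros_like(p)          # trace approximating dp / d(log eta)
    lo = np.full_like(p, -np.inf) if lower is None else np.asarray(lower, dtype=float)
    hi = np.full_like(p, np.inf) if upper is None else np.asarray(upper, dtype=float)
    for _ in range(n_steps):
        g = grad(p)
        hv = hess_vec(p, v)
        # Grow a step size while its gradient keeps pointing the same way,
        # shrink it (by at most a factor of 2 per step) when it oscillates.
        eta *= np.maximum(0.5, 1.0 - mu * g * v)
        # Parameter-specific gradient step, projected onto the box (assumption).
        p_step = p - eta * g
        p = np.clip(p_step, lo, hi)
        # Update the trace; reset it where a bound was hit (assumption).
        v = lam * v - eta * (g + lam * hv)
        v[p != p_step] = 0.0
    return p


# Hypothetical test problem: f(p) = 0.5 * sum(a * (p - c)**2).
a = np.array([1.0, 10.0])
c = np.array([1.0, -2.0])
grad = lambda p: a * (p - c)
hess_vec = lambda p, v: a * v

p_free = smd_minimize(grad, hess_vec, [0.0, 0.0])
p_box = smd_minimize(grad, hess_vec, [0.0, 0.0],
                     lower=[-10.0, -1.0], upper=[10.0, 10.0])
```

Each step is small and cheap, so projecting the state back into the feasible region after every update stays a minor correction. This matches the motivation for using SMD in a constrained space.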

The article is structured as follows: Section 2 describes our 3D hand model. Section 3 introduces the cost function which the tracker is to minimize. Section 4 briefly describes the SMD algorithm. Our ‘Smart Particle Filter’ (SPF) is introduced in Section 5. Finally, Sections 6 and 7 present the results and conclusions.

Section snippets

The hand model

In order to compare hypothesized states (3D hand poses are parameterized by the vector p) against observed 3D hand data (depth maps), the tracker uses a 3D hand model. This deformable model consists of a polygonal skin, driven by an underlying skeleton, and reproduces actual hand shapes quite well. A new pose is computed by linearly blending the motions that each skin vertex would undergo when rigidly coupled to a subset of the skeletal joints, the ‘influencing joints’. The position of a
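The linear blending step can be sketched as standard linear blend skinning. This sketch makes two assumptions that the snippet does not state:

- each vertex carries fixed weights over its influencing joints (zero elsewhere, rows summing to one);
- each joint’s motion is given as a 4×4 homogeneous rigid transform.

The function name and data layout are hypothetical.

```python
import numpy as np


def linear_blend_skinning(rest_vertices, weights, joint_transforms):
    """Deform skin vertices by linearly blending rigid joint motions.

    rest_vertices:    (V, 3) vertex positions in the rest pose
    weights:          (V, J) per-vertex influence weights, rows summing to 1
    joint_transforms: (J, 4, 4) homogeneous rigid transforms of the joints
    """
    n_vertices = rest_vertices.shape[0]
    homog = np.hstack([rest_vertices, np.ones((n_vertices, 1))])   # (V, 4)
    # Position each vertex would reach if rigidly attached to each joint: (J, V, 4)
    per_joint = np.einsum("jab,vb->jva", joint_transforms, homog)
    # Weighted (linear) blend over the influencing joints: (V, 4)
    blended = np.einsum("vj,jva->va", weights, per_joint)
    return blended[:, :3]


# Hypothetical example: joint 0 stays put, joint 1 translates by +2 along x.
rest = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
t_identity = np.eye(4)
t_shift = np.eye(4)
t_shift[0, 3] = 2.0
w = np.array([[0.5, 0.5],    # vertex 0 influenced equally by both joints
              [1.0, 0.0]])   # vertex 1 follows joint 0 only
deformed = linear_blend_skinning(rest, w, np.stack([t_identity, t_shift]))
```

Vertex 0 ends up halfway between its two rigidly transformed positions. Vertex 1 does not move.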

Observation model

Tracking proceeds by matching our 3D hand model against dense 3D measurements extracted at video rate. The speed with which our structured light sensor (ShapeSnatcher from Eyetronics¹) records the images is quite high, but the depth map cannot be calculated at the same speed. Hence, the 3D results are provided at a rate that is too low to support online tracking. That said, alternative structured light systems [22], [23] exist that can produce depth maps at video rate.

Problems with conventional methods

Nonlinear optimization problems are often solved by second-order gradient techniques such as the Levenberg–Marquardt algorithm [26], [27] or truncated quasi-Newton methods like the Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm [28]. These techniques update the model parameters in large steps, each of which is relatively expensive to compute. This makes it hard to enforce constraints on the parameters: the farther a single step takes the state outside of the feasible region, the more

Smart particle filtering

SMD has proven its efficiency in high-dimensional spaces [17]. However, spaces like ours are highly nonlinear, with many local optima, and SMD is not guaranteed to reach the global optimum. The tracker’s chances of doing so can be increased by exploring multiple hypotheses. A well-established framework for this is particle filtering.

Particle filters [6] offer a probabilistic framework for dynamic state estimation. They compute the posterior density p(s_t | z_{1:t}) of the current object state s_t conditioned
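Particle filters approximate this posterior recursively:

p(s_t | z_{1:t}) ∝ p(z_t | s_t) ∫ p(s_t | s_{t−1}) p(s_{t−1} | z_{1:t−1}) ds_{t−1}

The sketch below shows one way to insert a local optimizer between propagation and weighting without biasing this recursion. It is not the SPF weighting from the paper. It makes the following assumptions:

- a 1-D state and a random-walk motion model;
- a plain gradient-descent `local_optimize` as a stand-in for SMD;
- an importance correction in the spirit of ICondensation, where both the predicted prior and the proposal are evaluated as Gaussian mixtures.

All function names are hypothetical.

```python
import numpy as np


def gauss(x, mean, std):
    """Density of N(mean, std**2) at x (broadcasts)."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))


def smart_particle_step(particles, weights, likelihood, local_optimize,
                        motion_std, proposal_std, rng):
    """One resample / propagate / refine / reweight cycle (illustrative sketch)."""
    n = len(particles)
    # 1. Resample parents according to the previous weights.
    parents = particles[rng.choice(n, size=n, p=weights)]
    # 2. Standard propagation with a random-walk motion model.
    predicted = parents + rng.normal(0.0, motion_std, size=n)
    # 3. "Smart" move: run a local optimizer from every predicted particle, then
    #    jitter so that the proposal has a density that can be evaluated.
    refined = np.array([local_optimize(x) for x in predicted])
    new = refined + rng.normal(0.0, proposal_std, size=n)
    # 4. Importance correction: predicted prior (mixture over parents) divided by
    #    proposal (mixture over refined particles), times the likelihood.
    prior = gauss(new[:, None], parents[None, :], motion_std).mean(axis=1)
    proposal = gauss(new[:, None], refined[None, :], proposal_std).mean(axis=1)
    w = likelihood(new) * prior / proposal
    return new, w / w.sum()


# Hypothetical 1-D example: all particles start at 0, observation z = 3.
z, obs_std = 3.0, 0.5


def likelihood(x):
    return gauss(z, x, obs_std)


def local_optimize(x, step=0.05, iters=10):
    # Plain gradient descent on -log p(z | x), standing in for SMD.
    for _ in range(iters):
        x = x - step * (x - z) / obs_std**2
    return x


rng = np.random.default_rng(1)
particles = np.zeros(1000)
weights = np.full(1000, 1.0 / 1000)
particles, weights = smart_particle_step(particles, weights, likelihood,
                                         local_optimize, 1.0, 0.5, rng)
estimate = np.sum(weights * particles)  # exact posterior mean for this toy case: 2.4
```

The optimizer pulls the particles towards the observation. The weights undo that pull, so the weighted set still targets the posterior of the original motion model. The O(N²) mixture evaluation is affordable only because the particle sets are small.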

Comparison of SMD with conventional optimization approaches

In previous research [17], we reported extensive comparisons of SMD with conventional optimization approaches such as gradient descent, BFGS and Powell’s method [28], which showed the superiority of the SMD tracker. All methods optimized the same cost function on the same 3D data, with carefully chosen parameter settings. None of the approaches (including SMD) was optimized for speed. SMD performed best in several experiments, not only in terms of accuracy but also in terms of

Conclusion

We have presented a ‘Smart Particle Filter’ (SPF), which integrates SMD optimization into particle filtering. The combination of the two approaches merges the advantages of both methods. Accordingly, the SPF tracks high-dimensional articulated structures with far fewer samples than the original Condensation approach and handles multiple hypotheses, clutter and occlusion robustly, where pure optimization approaches often have problems. Furthermore, it is important to note that the SPF approach can

References (31)

S. Wachter et al., Tracking persons in monocular image sequences, Computer Vision and Image Understanding (1999).
P. Li et al., Visual contour tracking based on particle filters, Image and Vision Computing (2003).
L. Goncalves, E. di Bernardo, E. Ursella, P. Perona, Monocular tracking of the human arm in 3D, in: Int. Conf. on...
I.A. Kakadiaris, D. Metaxas, Model-based estimation of 3D human motion with occlusion based on active multi-viewpoint...
B. Stenger, P.R.S. Mendonca, R. Cipolla, Model based 3D tracking of an articulated hand, in: Int. Conf. on Computer...
J. Deutscher, B. North, B. Bascle, A. Blake, Tracking through singularities and discontinuities by random sampling, in:...
M. Isard et al., Condensation—conditional density propagation for visual tracking, International Journal of Computer Vision (1998).
H. Sidenbladh, M.J. Black, D.J. Fleet, Stochastic tracking of 3D human figures using 2D image motion, in: Eur. Conf. on...
Y. Wu, J. Lin, T.S. Huang, Capturing natural hand articulation, in: Int. Conf. on Computer Vision, 2001, pp....
H. Zhou, T.S. Huang, Tracking articulated hand motion with eigen dynamics analysis, in: Int. Conf. on Computer Vision,...
J.P. MacCormick, M. Isard, Partitioned sampling, articulated objects, and interface-quality hand tracking, in: Eur....
M. Isard, A. Blake, ICondensation: unifying low-level and high-level tracking in a stochastic framework, in: Eur. Conf....
J. Deutscher, A. Blake, I. Reid, Articulated body motion capture by annealed particle filtering, in: Int. Conf. on...
C. Sminchisescu, B. Triggs, Covariance scaled sampling for monocular 3D body tracking, in: Int. Conf. on Computer...
T.J. Cham, J. Rehg, A multiple hypotheses approach to figure tracking, in: Int. Conf. on Computer Vision and Pattern...
