Smart particle filtering for high-dimensional tracking
Introduction
Hand and body tracking have quite a few intriguing applications, ranging from human–computer interfacing to motion capture for animation. Yet, progress in this area is complicated by the high-dimensional state spaces in which such trackers must operate.
In this context, single-hypothesis methods can get stuck in a local minimum while estimating the state of the tracker given an observation. One of the best-known examples is the Kalman filter, which is based on a unimodal Gaussian assumption. In articulated structure tracking, the posterior distribution is often non-Gaussian, which can cause Kalman filtering to fail. To remedy this drawback, alternatives have been proposed to deal with nonlinear estimation, such as the Extended Kalman Filter (EKF), widely used for articulated structure tracking [1], [2], [3], or the Unscented Kalman Filter [4]. However, the limitations of the EKF have been demonstrated in the case of singularities (non-observable degrees of freedom) and discontinuities (anatomical constraints) [5].
Another approach to nonlinear estimation is the Condensation algorithm [6], which is more robust to clutter and occlusion thanks to its multiple hypotheses. However, as the number of dimensions increases, the computational cost becomes unmanageable: to explore the search space accurately, the number of particles has to grow dramatically with the dimensionality. Two alternatives have therefore been proposed:
- To lower the dimensionality of the search space: Sidenbladh et al. [7] track a 3D articulated body, using particle filtering within a Bayesian framework. An action-specific dynamic model allows them to reduce the number of state parameters, though their particle filter still requires many samples. Wu et al. [8] represent articulations in a lower-dimensional space by a set of linear manifolds constructed from base configurations. However, their algorithm is view dependent and performs optimally only when the palm is orthogonal to the camera. In the same vein, Zhou and Huang [9] present a method called Eigen dynamic analysis to compress the actual dimensionality of the manifold of feasible finger configurations.
- To devise schemes that work well with fewer samples: MacCormick and Isard [10] explore the search space more efficiently through partitioned sampling: the state space is partitioned and explored in a hierarchical manner. In ICondensation, introduced by Isard and Blake [11], samples are placed according to a second information source based on the importance sampling concept, without changing the original probability distribution. Li et al. [12] apply Kalman filtering within a Condensation framework to use the information of the current observation more directly. After the propagation, a measurement is selected for each particle and used to update the particle location using Kalman filtering. A new particle is then drawn from a Gaussian distribution around this location, where the Kalman covariance matrix is used to specify the distribution. This strategy steers the particle distribution towards regions with a high likelihood. Deutscher et al. [13] propose a modified particle filter based on a simulated annealing algorithm, able to track an articulated model in a high-dimensional space. Compared to a classical particle filter, they reduce the number of samples by a factor of 10, and use three cameras to track a full body model. Sminchisescu and Triggs [14] have combined global sampling with local optimization by gradient descent. However, this approach is still too slow for our purpose. Cham and Rehg [15] and Heap and Hogg [16] have also combined global sampling with local optimization.
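For orientation, the plain Condensation-style iteration that the schemes above modify can be sketched as follows. This is a minimal illustration, not code from any of the cited papers: the random-walk dynamic model, the Gaussian `toy_likelihood`, the two-dimensional state and all parameter values are assumptions chosen only to keep the example small.

```python
import numpy as np

def condensation_step(particles, weights, likelihood, process_noise, rng):
    """One resample / predict / measure iteration of a basic particle filter."""
    n = len(particles)
    # Resample: draw particle indices in proportion to the current weights.
    idx = rng.choice(n, size=n, p=weights)
    # Predict: diffuse the resampled states with a random-walk dynamic model.
    predicted = particles[idx] + rng.normal(0.0, process_noise, particles.shape)
    # Measure: re-weight each hypothesis by its observation likelihood.
    new_weights = np.array([likelihood(s) for s in predicted])
    new_weights = new_weights / new_weights.sum()
    return predicted, new_weights

def toy_likelihood(state, target=np.array([1.0, -2.0])):
    # Hypothetical observation model: Gaussian bump around a fixed target state.
    return np.exp(-0.5 * np.sum((state - target) ** 2))

rng = np.random.default_rng(0)
particles = rng.normal(0.0, 3.0, size=(500, 2))   # initial hypotheses
weights = np.full(500, 1.0 / 500)                 # uniform initial weights
for _ in range(10):
    particles, weights = condensation_step(
        particles, weights, toy_likelihood, 0.2, rng)
estimate = weights @ particles  # weighted mean of the posterior samples
```

Even in this two-dimensional toy problem, several hundred particles are used to cover the prior; the number needed to cover a search space at the same density grows quickly with its dimensionality, which is the cost that both classes of methods above try to avoid.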
In this paper, we propose a method of the second class (working with fewer samples). We developed our own local optimizer based on ‘Stochastic Meta-Descent’ [17] wrapped in a multiple hypotheses framework so as to increase the chance of reaching the global extremum.
The ‘Stochastic Meta-Descent’ (SMD) tracker was recently introduced in [17]. This new optimization technique is more efficient than previous gradient methods and can naturally incorporate constraints, which other optimization techniques often find difficult or costly to deal with. Stochasticity in the evaluation of the objective function increases the chance of getting out of local minima. Although stochastic sampling lowers the risk of being trapped in a local minimum, reaching the global minimum is not guaranteed. Therefore, we propose to combine several SMD trackers as ‘particles’ within a particle filter framework. This allows the tracker to deal with multiple hypotheses and increases the chance of finding the global optimum. Furthermore, very few samples suffice as they tend to represent the minimum well. In comparison to Li et al. [12], we do not assume that the measurements are locally linearly dependent on the state, which is a precondition for Kalman filtering.
The article is structured as follows: Section 2 describes our 3D hand model. Section 3 introduces the cost function which the tracker is to minimize. Section 4 briefly describes the SMD algorithm. Our ‘Smart Particle Filter’ (SPF) is introduced in Section 5. Finally, Sections 6 and 7 present the results and conclusions.
Section snippets
The hand model
In order to compare hypothesized states (3D hand poses are parameterized by the vector p) against observed 3D hand data (depth maps), the tracker uses a 3D hand model. This deformable model consists of a polygonal skin, driven by an underlying skeleton, and reproduces actual hand shapes quite well. A new pose is computed by linearly blending the motions that each skin vertex would undergo when rigidly coupled to a subset of the skeletal joints, the ‘influencing joints’. The position of a
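The linear blending of rigid motions described in this snippet corresponds to what is usually called linear blend skinning. A sketch in standard notation follows; the symbols are assumptions introduced here for illustration and are not taken from the paper: $\mathbf{v}_i$ is the rest position of skin vertex $i$, $J_i$ its set of influencing joints, $T_j(\mathbf{p})$ the rigid transform of joint $j$ under pose $\mathbf{p}$, and $w_{ij}$ the blend weights, which by the usual convention are non-negative and sum to one.

```latex
\mathbf{v}_i'(\mathbf{p}) = \sum_{j \in J_i} w_{ij}\, T_j(\mathbf{p})\, \mathbf{v}_i,
\qquad \sum_{j \in J_i} w_{ij} = 1, \quad w_{ij} \ge 0 .
```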
Observation model
Tracking proceeds by matching our 3D hand model against dense 3D measurements extracted at video rate. The speed with which our structured light sensor (ShapeSnatcher from Eyetronics) records the images is quite high, but the depth map cannot be calculated at the same speed. Hence, the 3D results are provided at a rate that is too low to support online tracking. This said, alternative structured light systems [22], [23] exist that can produce depth maps at video rate.
Problems with conventional methods
Nonlinear optimization problems are often solved by second-order gradient techniques such as the Levenberg–Marquardt algorithm [26], [27] or truncated quasi-Newton methods like the Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm [28]. These techniques update the model parameters in large steps, each of which is relatively expensive to compute. This makes it hard to enforce constraints on the parameters: the farther a single step takes the state outside of the feasible region, the more
Smart particle filtering
SMD has proven its efficiency in high-dimensional spaces [17]. However, spaces like ours are highly nonlinear with many local optima, and SMD is not guaranteed to reach the global optimum. The tracker’s chances of doing so can be increased by exploring multiple hypotheses. A well-established framework for this is particle filtering.
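The idea of letting a small set of locally optimized hypotheses compete inside a particle filter can be sketched as follows. This is a rough illustration under stated assumptions, not the authors' SPF: plain gradient descent stands in for SMD (which is not implemented here), the multimodal `cost` is a hypothetical toy function rather than the hand-model cost, and the exponential weighting, noise level and particle count are arbitrary choices.

```python
import numpy as np

def cost(s):
    # Hypothetical multimodal cost: per component, the global minimum is at 2
    # and a poorer local minimum lies near -1.8.
    return np.sum((s ** 2 - 4.0) ** 2 / 8.0 + 0.1 * (s - 2.0) ** 2, axis=-1)

def grad(s):
    # Analytic gradient of `cost` with respect to the state.
    return s * (s ** 2 - 4.0) / 2.0 + 0.2 * (s - 2.0)

def local_descent(s, steps=15, lr=0.1):
    # Plain gradient descent: a stand-in for SMD, NOT the SMD algorithm.
    for _ in range(steps):
        s = s - lr * grad(s)
    return s

def filter_step(particles, weights, rng, noise=0.1):
    n = len(particles)
    # Resample hypotheses in proportion to their weights.
    idx = rng.choice(n, size=n, p=weights)
    # Propagate with a random-walk model.
    moved = particles[idx] + rng.normal(0.0, noise, particles.shape)
    # Refine every hypothesis with the local optimizer.
    refined = local_descent(moved)
    # Weight by the cost reached after refinement (assumed exp(-cost) form).
    w = np.exp(-cost(refined))
    return refined, w / w.sum()

rng = np.random.default_rng(1)
particles = rng.uniform(-3.0, 3.0, size=(30, 2))  # few, widely spread hypotheses
weights = np.full(30, 1.0 / 30)
for _ in range(5):
    particles, weights = filter_step(particles, weights, rng)
best = particles[np.argmax(weights)]  # hypothesis with the lowest cost
```

In this toy setting a single descent started in the wrong basin stays at the local minimum, whereas the resampling step shifts the particle set towards the hypotheses that reached the lower cost.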
Particle filters [6] offer a probabilistic framework for dynamic state estimation. They compute the posterior density $p(s_t \mid z_{1:t})$ of the current object state $s_t$ conditioned
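For reference, the posterior named in this snippet is commonly propagated with the standard recursive Bayesian filtering equation. The form below is the textbook recursion, not a formula quoted from the paper; it assumes a first-order Markov dynamic model $p(s_t \mid s_{t-1})$ and observations $z_t$ that depend only on the current state.

```latex
p(s_t \mid z_{1:t}) \;\propto\; p(z_t \mid s_t)
\int p(s_t \mid s_{t-1})\, p(s_{t-1} \mid z_{1:t-1})\, \mathrm{d}s_{t-1}
```

A particle filter approximates this integral with a weighted set of samples instead of evaluating it in closed form.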
Comparison of SMD with conventional optimization approaches
In previous research [17], we reported extensive comparisons of SMD with conventional optimization approaches such as Gradient Descent, BFGS or Powell’s method [28] and showed the superiority of an SMD tracker. All methods optimized the same cost function on the same 3D data, with carefully chosen parameter settings. None of the approaches (including SMD) were optimized for speed. SMD performed best in several experiments, not only in terms of accuracy but also in terms of
Conclusion
We have presented a ‘Smart Particle Filter’ (SPF) which integrates SMD optimization into particle filtering. The combination of the two approaches merges the advantages of both methods. Accordingly, the SPF tracks high-dimensional articulated structures with far fewer samples than the original Condensation approach and handles multiple hypotheses, clutter and occlusion robustly where pure optimization approaches often have problems. Furthermore, it is important to note that the SPF approach can
References (31)
- et al., Tracking persons in monocular image sequences, Computer Vision and Image Understanding (1999)
- et al., Visual contour tracking based on particle filters, Image and Vision Computing (2003)
- L. Goncalves, E. di Bernardo, E. Ursella, P. Perona, Monocular tracking of the human arm in 3D, in: Int. Conf. on...
- I.A. Kakadiaris, D. Metaxas, Model-based estimation of 3D human motion with occlusion based on active multi-viewpoint...
- B. Stenger, P.R.S. Mendonca, R. Cipolla, Model based 3D tracking of an articulated hand, in: Int. Conf. on Computer...
- J. Deutscher, B. North, B. Bascle, A. Blake, Tracking through singularities and discontinuities by random sampling, in:...
- et al., Condensation—conditional density propagation for visual tracking, International Journal on Computer Vision (1998)
- H. Sidenbladh, M.J. Black, D.J. Fleet, Stochastic tracking of 3D human figures using 2D image motion, in: Eur. Conf. on...
- Y. Wu, J. Lin, T.S. Huang, Capturing natural hand articulation, in: Int. Conf. on Computer Vision, 2001, pp....
- H. Zhou, T.S. Huang, Tracking articulated hand motion with eigen dynamics analysis, in: Int. Conf. on Computer Vision,...