Abstract
A new, exemplar-based, probabilistic paradigm for visual tracking is presented. Probabilistic mechanisms are attractive because they handle fusion of information, especially temporal fusion, in a principled manner. Exemplars are selected representatives of raw training data, used here to represent probabilistic mixture distributions of object configurations. Their use avoids tedious hand-construction of object models, and problems with changes of topology.
Using exemplars in place of a parameterized model poses several challenges, addressed here with what we call the “Metric Mixture” (M²) approach, which has a number of attractions. Principally, it provides alternatives to standard learning algorithms by allowing the use of metrics that are not embedded in a vector space. Secondly, it uses a noise model that is learned from training data. Lastly, it eliminates any need for an assumption of probabilistic pixelwise independence.
Experiments demonstrate the effectiveness of the M² model in two domains: tracking walking people using “chamfer” distances on binary edge images, and tracking mouth movements by means of a shuffle distance.
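To make the idea of comparing an image against exemplars with a non-vector-space metric concrete, the following is a minimal sketch of a directed chamfer distance between two binary edge images, followed by a simple exemplar weighting. It is an illustration under stated assumptions rather than the authors' implementation: the function names, the brute-force nearest-pixel search, the exponential weighting `exp(-lam * d)`, and the parameter `lam` (standing in for a noise parameter that would be learned from training data) are all hypothetical choices made for this example.

```python
import math


def edge_points(image):
    """Return (row, col) coordinates of the nonzero pixels of a binary image."""
    return [(r, c) for r, row in enumerate(image) for c, v in enumerate(row) if v]


def chamfer_distance(exemplar, image):
    """Directed chamfer distance: the mean Euclidean distance from each
    exemplar edge pixel to its nearest edge pixel in the image.
    Brute force for clarity; a distance transform would be faster."""
    src = edge_points(exemplar)
    dst = edge_points(image)
    if not src or not dst:
        raise ValueError("both images need at least one edge pixel")
    total = 0.0
    for r, c in src:
        total += min(math.hypot(r - r2, c - c2) for r2, c2 in dst)
    return total / len(src)


def exemplar_weights(exemplars, image, lam=1.0):
    """Normalised weights proportional to exp(-lam * distance) per exemplar.
    The exponential form and `lam` are assumptions made for illustration."""
    weights = [math.exp(-lam * chamfer_distance(x, image)) for x in exemplars]
    z = sum(weights)
    return [w / z for w in weights]


# Two toy exemplars: a vertical edge in column 1 and one in column 2.
left = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
right = [[0, 0, 1], [0, 0, 1], [0, 0, 1]]

# The observed image matches the second exemplar exactly.
weights = exemplar_weights([left, right], right)
```

In this toy case the observed image coincides with the second exemplar, so that exemplar receives the larger weight. Only pairwise distances are used, so no vector-space embedding of the images is required.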
Cite this article
Toyama, K., Blake, A. Probabilistic Tracking with Exemplars in a Metric Space. International Journal of Computer Vision 48, 9–19 (2002). https://doi.org/10.1023/A:1014899027014
DOI: https://doi.org/10.1023/A:1014899027014