Robust Visual Tracking by Integrating Multiple Cues Based on Co-Inference Learning

Wu, Ying; Huang, Thomas S.

doi:10.1023/B:VISI.0000016147.97880.cd

Published: June 2004

International Journal of Computer Vision, Volume 58, pages 55–71 (2004)

Ying Wu¹ &
Thomas S. Huang²

Abstract

Visual tracking can be treated as a parameter estimation problem: inferring target states from image observations in video sequences. A richer target representation may improve the chances of successful tracking in cluttered and dynamic environments, and thus enhance robustness. Richer representations can be constructed either by specifying a detailed model of a single cue or by combining a set of rough models of multiple cues. Both approaches increase the dimensionality of the state space, which dramatically increases the computation required. To investigate the integration of rough models from multiple cues and to explore computationally efficient algorithms, this paper formulates multiple cue integration and tracking in a probabilistic framework based on a factorized graphical model. Structured variational analysis of this graphical model factorizes the different modalities and suggests a co-inference process among them. Based on importance sampling, a sequential Monte Carlo algorithm is proposed to efficiently simulate and approximate the co-inference of multiple cues. The algorithm runs in real time at around 30 Hz. Extensive experiments show that it performs robustly in a wide variety of tracking scenarios. The approach also has the potential to solve other problems, including sensor fusion.
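
The importance sampling idea named in the abstract can be illustrated with a toy example. The sketch below is not the authors' algorithm: it assumes a one-dimensional target position, a Gaussian belief supplied by one cue (labelled cue A here) and a Gaussian measurement from a second cue (cue B). Samples are drawn from cue A's belief and weighted by cue B's likelihood, so the weighted mean approximates the fused estimate. All names and numbers (`cue_guided_estimate`, `mean_a`, `obs_b`, and so on) are hypothetical and chosen only for illustration.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of a 1D Gaussian with the given mean and standard deviation."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def cue_guided_estimate(rng, mean_a, std_a, obs_b, std_b, n_samples=5000):
    """Toy importance sampling step (illustrative assumption, not the paper's method).

    Samples come from cue A's Gaussian belief (the importance function) and
    are weighted by cue B's Gaussian measurement likelihood.
    """
    # Draw candidate target positions from cue A's belief.
    samples = [rng.gauss(mean_a, std_a) for _ in range(n_samples)]
    # Weight each candidate by how well it explains cue B's measurement.
    weights = [gaussian_pdf(obs_b, x, std_b) for x in samples]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Weighted mean approximates the fused posterior mean.
    estimate = sum(w * x for w, x in zip(weights, samples))
    # Effective sample size: a common diagnostic for weight degeneracy.
    ess = 1.0 / sum(w * w for w in weights)
    return estimate, ess


rng = random.Random(0)
estimate, ess = cue_guided_estimate(rng, mean_a=2.0, std_a=1.0, obs_b=3.0, std_b=0.5)

# For two Gaussians the fused mean has a closed form (precision-weighted average).
closed_form = (2.0 / 1.0**2 + 3.0 / 0.5**2) / (1.0 / 1.0**2 + 1.0 / 0.5**2)
print(estimate, closed_form, ess)
```

In this Gaussian toy case the closed-form fused mean is 2.8, which gives a reference value for checking the sampled estimate. A sequential version would repeat such a step at every frame, with resampling and a motion model between frames; those parts are omitted here.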

Author information

Authors and Affiliations

Department of Electrical & Computer Engineering, Northwestern University, 2145 Sheridan Road, Evanston, IL, 60208, USA
Ying Wu
Beckman Institute, University of Illinois at Urbana-Champaign, 405 N. Mathews, Urbana, IL, 61801, USA
Thomas S. Huang

About this article

Cite this article

Wu, Y., Huang, T.S. Robust Visual Tracking by Integrating Multiple Cues Based on Co-Inference Learning. International Journal of Computer Vision 58, 55–71 (2004). https://doi.org/10.1023/B:VISI.0000016147.97880.cd

Issue Date: June 2004
DOI: https://doi.org/10.1023/B:VISI.0000016147.97880.cd
