
Robust Visual Tracking by Integrating Multiple Cues Based on Co-Inference Learning

International Journal of Computer Vision

Abstract

Visual tracking can be treated as a parameter estimation problem that infers target states from image observations in video sequences. A richer target representation can improve the chances of successful tracking in cluttered and dynamic environments, and thus enhance robustness. Richer representations can be constructed either by specifying a detailed model of a single cue or by combining a set of rough models of multiple cues. Both approaches increase the dimensionality of the state space, which dramatically increases the computational cost. To investigate the integration of rough models from multiple cues and to explore computationally efficient algorithms, this paper formulates the problem of multiple cue integration and tracking in a probabilistic framework based on a factorized graphical model. Structured variational analysis of such a graphical model factorizes the different modalities and suggests a co-inference process among them. Based on the importance sampling technique, a sequential Monte Carlo algorithm is proposed to provide an efficient simulation and approximation of the co-inference among multiple cues. This algorithm runs in real time at around 30 Hz. Our extensive experiments show that the proposed algorithm performs robustly in a large variety of tracking scenarios. The approach presented in this paper may also apply to other problems, such as sensor fusion.
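The abstract does not spell out the algorithm, so the following is only a toy illustration of the general idea of letting one cue guide importance sampling while another cue supplies the weights. It is not the authors' method. It assumes a one-dimensional target state, a Gaussian prior, and two cues with Gaussian measurement noise; the function names `co_inference_step` and `exact_posterior_mean` and all parameters are hypothetical.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian N(mean, std^2) evaluated at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def co_inference_step(prior_mean, prior_std, z_a, std_a, z_b, std_b,
                      n_particles=5000, rng=None):
    """One importance-sampling update that fuses two cues (toy model).

    Particles are proposed from cue A's measurement and weighted by
    cue B's likelihood times the prior (an assumption of this sketch).
    """
    rng = rng or random.Random()
    # Proposal g(x) = N(z_a, std_a^2): cue A steers where samples land.
    particles = [rng.gauss(z_a, std_a) for _ in range(n_particles)]
    # The target is p(z_a|x) p(z_b|x) p(x). In this toy model g(x)
    # equals p(z_a|x), so the importance weight is p(z_b|x) p(x).
    weights = [
        gaussian_pdf(z_b, x, std_b) * gaussian_pdf(x, prior_mean, prior_std)
        for x in particles
    ]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Weighted mean of the particles approximates the posterior mean.
    estimate = sum(w * x for w, x in zip(weights, particles))
    return estimate, particles, weights


def exact_posterior_mean(prior_mean, prior_std, z_a, std_a, z_b, std_b):
    """Closed-form posterior mean for the all-Gaussian toy model."""
    precisions = [1.0 / prior_std ** 2, 1.0 / std_a ** 2, 1.0 / std_b ** 2]
    means = [prior_mean, z_a, z_b]
    return sum(p * m for p, m in zip(precisions, means)) / sum(precisions)
```

Because everything in the toy model is Gaussian, the sampled estimate can be checked against the closed-form posterior mean. In a real tracker the likelihoods would come from image measurements and no closed form would be available, which is where sampling becomes useful.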




Cite this article

Wu, Y., Huang, T.S. Robust Visual Tracking by Integrating Multiple Cues Based on Co-Inference Learning. International Journal of Computer Vision 58, 55–71 (2004). https://doi.org/10.1023/B:VISI.0000016147.97880.cd
