Recursive Compositional Models for Vision: Description and Review of Recent Work

Zhu, Long (Leo); Chen, Yuanhao; Yuille, Alan

doi:10.1007/s10851-011-0282-2

Published: 29 April 2011

Volume 41, pages 122–146 (2011)

Journal of Mathematical Imaging and Vision

Long (Leo) Zhu¹,
Yuanhao Chen² &
Alan Yuille²,³

Abstract

This paper describes and reviews a class of hierarchical probabilistic models of images and objects. Visual structures are represented in a hierarchical form in which complex structures are composed of more elementary structures, following a design principle of recursive composition. Probabilities defined over these structures exploit properties of the hierarchy; for example, long-range spatial relationships can be represented by local potentials at the upper levels of the hierarchy. The compositional nature of this representation enables efficient learning and inference algorithms. In particular, parts can be shared between different object models. Overall, the architecture of Recursive Compositional Models (RCMs) provides a balance between statistical and computational complexity.

The goal of this paper is to describe the basic ideas and common themes of RCMs, to illustrate their success on a range of vision tasks, and to give pointers to the literature. In particular, we show that RCMs generally give state-of-the-art results when applied to a range of different vision tasks and evaluated on the leading benchmark datasets.
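
As a rough illustration of why a tree-structured hierarchy with local potentials admits efficient inference, the sketch below runs max-sum dynamic programming over a toy hierarchy. It is not the authors' implementation: the node names, the 1-D position grid, the distance-based `pair_score`, and the example scores are all assumptions chosen for brevity.

```python
from typing import Dict, List, Tuple

# Hypothetical toy hierarchy: an object made of two parts, each made of two leaves.
CHILDREN: Dict[str, List[str]] = {
    "object": ["left_part", "right_part"],
    "left_part": ["a", "b"],
    "right_part": ["c", "d"],
    "a": [], "b": [], "c": [], "d": [],
}
POSITIONS = [0, 1, 2, 3, 4]  # assumed 1-D candidate positions for every node

Scores = Dict[str, Dict[int, float]]


def pair_score(parent_pos: int, child_pos: int) -> float:
    """Assumed local potential: penalise a child placed far from its parent."""
    return -float(abs(parent_pos - child_pos))


def bottom_up(node: str, unary: Scores, table: Scores) -> None:
    """Fill table[node][p] with the best score of the subtree rooted at node, placed at p."""
    for child in CHILDREN[node]:
        bottom_up(child, unary, table)
    table[node] = {}
    for p in POSITIONS:
        total = unary.get(node, {}).get(p, 0.0)
        for child in CHILDREN[node]:
            # Each child is maximised independently given the parent position.
            total += max(table[child][q] + pair_score(p, q) for q in POSITIONS)
        table[node][p] = total


def top_down(node: str, pos: int, table: Scores, parse: Dict[str, int]) -> None:
    """Recover the maximising position of every node, given its parent's position."""
    parse[node] = pos
    for child in CHILDREN[node]:
        best_q = max(POSITIONS, key=lambda q: table[child][q] + pair_score(pos, q))
        top_down(child, best_q, table, parse)


def parse_tree(unary: Scores, root: str = "object") -> Tuple[float, Dict[str, int]]:
    """Return the best total score and the corresponding position of every node."""
    table: Scores = {}
    bottom_up(root, unary, table)
    root_pos = max(POSITIONS, key=lambda p: table[root][p])
    parse: Dict[str, int] = {}
    top_down(root, root_pos, table, parse)
    return table[root][root_pos], parse


# Made-up data scores: each leaf has evidence at exactly one position.
example_unary: Scores = {"a": {0: 5.0}, "b": {1: 5.0}, "c": {3: 5.0}, "d": {4: 5.0}}
score, parse = parse_tree(example_unary)
print(score, parse)
```

The leaves `a` and `d` are related only through the potentials higher in the tree, yet the cost of the bottom-up pass grows linearly with the number of nodes and quadratically with the number of candidate positions, which is one concrete sense in which a hierarchy can trade statistical expressiveness against computational cost.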



Author information

Authors and Affiliations

Department of Computer Science, New York University, New York, NY, USA
Long (Leo) Zhu
Department of Statistics, University of California at Los Angeles, Los Angeles, CA, 90095, USA
Yuanhao Chen & Alan Yuille
Department of Brain and Cognitive Engineering, Korea University, Seoul, Korea
Alan Yuille

Corresponding author

Correspondence to Alan Yuille.

About this article

Cite this article

Zhu, L., Chen, Y. & Yuille, A. Recursive Compositional Models for Vision: Description and Review of Recent Work. J Math Imaging Vis 41, 122–146 (2011). https://doi.org/10.1007/s10851-011-0282-2

Issue Date: September 2011
