Recursive Compositional Models for Vision: Description and Review of Recent Work

Journal of Mathematical Imaging and Vision

Abstract

This paper describes and reviews a class of hierarchical probabilistic models of images and objects. Visual structures are represented hierarchically, with complex structures composed of more elementary ones according to a design principle of recursive composition. Probabilities defined over these structures exploit properties of the hierarchy; for example, long-range spatial relationships can be represented by local potentials at the upper levels of the hierarchy. The compositional nature of this representation enables efficient learning and inference algorithms. In particular, parts can be shared between different object models. Overall, the architecture of Recursive Compositional Models (RCMs) provides a balance between statistical and computational complexity.

The goal of this paper is to describe the basic ideas and common themes of RCMs, to illustrate their success on a range of vision tasks, and to give pointers to the literature. In particular, we show that RCMs generally give state-of-the-art results when applied to a range of different vision tasks and evaluated on the leading benchmark datasets.

Corresponding author

Correspondence to Alan Yuille.

About this article

Cite this article

Zhu, L., Chen, Y. & Yuille, A. Recursive Compositional Models for Vision: Description and Review of Recent Work. J Math Imaging Vis 41, 122–146 (2011). https://doi.org/10.1007/s10851-011-0282-2

  • DOI: https://doi.org/10.1007/s10851-011-0282-2
