Multi-view Object Categorization and Pose Estimation

  • Chapter
Computer Vision

Part of the book series: Studies in Computational Intelligence ((SCI,volume 285))

Abstract

Object and scene categorization has been a central topic of computer vision research in recent years. The problem is highly challenging: a single object may show tremendous variability in appearance and structure under various photometric and geometric conditions, and members of the same class may differ from each other due to various degrees of intra-class variability. Recently, researchers have proposed new models with two goals: i) finding a suitable representation that can efficiently capture the intrinsic three-dimensional and multi-view nature of object categories; ii) taking advantage of this representation to help the recognition and categorization task. In this chapter we review recent approaches aimed at tackling this problem and focus on the work by Savarese & Fei-Fei [54, 55]. In [54, 55], multi-view object models are obtained by linking together diagnostic parts of the objects seen from different viewpoints. Instead of recovering a full 3D geometry, parts are connected through their mutual homographic transformations. The resulting model is a compact summarization of both the appearance and the geometry information of the object class. We show that such a model can be learnt with minimal supervision compared to competing techniques. The model can be used to detect objects under arbitrary and/or unseen poses by means of a two-step algorithm. This algorithm, inspired by work on single-object view synthesis (e.g., Seitz & Dyer [57]), can synthesize object appearance and shape properties at recognition time and, in turn, estimate the object pose that best matches the observations. We conclude this chapter by presenting detection, recognition and pose estimation experiments on the two datasets in [54, 55] as well as on the PASCAL Visual Object Classes (VOC) dataset [15]. The experiments indicate that the representation and algorithms presented in [54, 55] can be successfully employed in a number of generic object recognition tasks.
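
As a rough illustration of what it means for two views of a part to be related by a homographic transformation, the sketch below maps the outline of a planar part from one view into another with a 3x3 matrix. It is not the implementation of [54, 55]: the function name `apply_homography`, the matrix `H_ab` and the part coordinates are hypothetical values chosen only to make the arithmetic easy to follow.

```python
def apply_homography(H, points):
    """Map 2D points through a 3x3 homography H (given as nested lists)."""
    mapped = []
    for x, y in points:
        # Lift (x, y) to homogeneous coordinates (x, y, 1) and multiply by H.
        u = H[0][0] * x + H[0][1] * y + H[0][2]
        v = H[1][0] * x + H[1][1] * y + H[1][2]
        w = H[2][0] * x + H[2][1] * y + H[2][2]
        # Divide by the third coordinate to return to image coordinates.
        mapped.append((u / w, v / w))
    return mapped


# Hypothetical outline of a planar part (a unit square) as seen in view A.
part_in_view_a = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]

# Hypothetical homography linking the part in view A to the same part in view B.
# The non-trivial last row makes the mapping projective rather than affine.
H_ab = [
    [2.0, 0.0, 3.0],
    [0.0, 1.0, 1.0],
    [1.0, 0.0, 1.0],
]

part_in_view_b = apply_homography(H_ab, part_in_view_a)
print(part_in_view_b)
```

In this toy case the square's right-hand edge is foreshortened in view B, which is the kind of viewpoint-dependent distortion a homography between planar parts can express without recovering full 3D geometry.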

References

  1. The princeton shape benchmark. In: Proceedings of the Shape Modeling International, pp. 167–178 (2004)

  2. Arie-Nachimson, M., Basri, R.: Constructing implicit 3d shape models for pose estimation. In: Proceedings of the International Conference on Computer Vision (2009)

  3. Ballard, D.H.: Generalizing the hough transform to detect arbitrary shapes. Pattern Recognition 13(2), 111–122 (1981)

  4. Bart, E., Byvatov, E., Ullman, S.: View-invariant recognition using corresponding object fragments. In: Pajdla, T., Matas, J(G.) (eds.) ECCV 2004. LNCS, vol. 3022, pp. 152–165. Springer, Heidelberg (2004)

  5. Bowyer, K., Dyer, R.: Aspect graphs: An introduction and survey of recent results. International Journal of Imaging Systems and Technology 2(4), 315–328 (1990)

  6. Brown, M., Lowe, D.G.: Unsupervised 3d object recognition and reconstruction in unordered datasets. In: Proceedings of the Fifth International Conference on 3-D Digital Imaging and Modeling, pp. 56–63 (2005)

  7. Burl, M.C., Weber, M., Perona, P.: A probabilistic approach to object recognition using local photometry and global geometry. In: Burkhardt, H., Neumann, B. (eds.) ECCV 1998. LNCS, vol. 1407, p. 628. Springer, Heidelberg (1998)

  8. Chen, S., Williams, L.: View interpolation for image synthesis. Computer Graphics 27, 279–288 (1993)

  9. Chiu, H.P., Kaelbling, L.P., Lozano-Perez, T.: Virtual training for multi-view object class recognition. In: Proceedings of the International Conference on Computer Vision and Pattern Recognition, pp. 1–8 (2007)

  10. Cyr, C., Kimia, B.: A similarity-based aspect-graph approach to 3D object recognition. International Journal of Computer Vision 57(1), 5–22 (2004)

  11. Dance, C., Willamowski, J., Fan, L., Bray, C., Csurka, G.: Visual categorization with bags of keypoints. In: Proceedings of the ECCV International Workshop on Statistical Learning in Computer Vision (2004)

  12. Dickinson, S.J., Pentland, A.P., Rosenfeld, A.: 3-d shape recovery using distributed aspect matching. IEEE Transactions on Pattern Analysis and Machine Intelligence 14(2), 174–198 (1992)

  13. Eggert, D., Bowyer, K.: Computing the perspective projection aspect graph of solids of revolution. IEEE Transactions on Pattern Analysis and Machine Intelligence 15(2), 109–128 (1993)

  14. Eggert, D., Bowyer, K., Dyer, C., Christensen, H., Goldgof, D.: The scale space aspect graph. IEEE Transactions on Pattern Analysis and Machine Intelligence 15(11), 1114–1130 (1993)

  15. Everingham, M., et al.: The 2005 PASCAL visual object class challenge. In: Proceedings of the 1st PASCAL Challenges Workshop (to appear)

  16. Farhadi, A., Tabrizi, J., Endres, I., Forsyth, D.: A latent model of discriminative aspect. In: Proceedings of the International Conference on Computer Vision (2009)

  17. Fei-Fei, L., Fergus, R., Torralba, A.: Recognizing and learning object categories. CVPR Short Course (2007)

  18. Felzenszwalb, P., Huttenlocher, D.: Pictorial structures for object recognition. In: Proceedings of the International Conference on Computer Vision and Pattern Recognition, pp. 2066–2073 (2000)

  19. Fergus, R., Perona, P., Zisserman, A.: Object class recognition by unsupervised scale-invariant learning. In: Proceedings of the International Conference on Computer Vision and Pattern Recognition, pp. 264–271 (2003)

  20. Ferrari, V., Tuytelaars, T., Van Gool, L.: Simultaneous object recognition and segmentation from single or multiple model views. International Journal of Computer Vision (2006)

  21. Fischler, M., Bolles, R.: Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Comm. of the ACM 24, 381–395 (1981)

  22. Frome, A., Huber, D., Kolluri, R., Bulow, T., Malik, J.: Recognizing objects in range data using regional point descriptors. In: Pajdla, T., Matas, J(G.) (eds.) ECCV 2004. LNCS, vol. 3023, pp. 224–237. Springer, Heidelberg (2004)

  23. Fulkerson, B., Vedaldi, A., Soatto, S.: Class Segmentation and Object Localization with Superpixel Neighborhoods. In: Proceedings of the International Conference on Computer Vision (2009)

  24. Grimson, W., Lozano-Perez, T.: Recognition and localization of overlapping parts in two and three dimensions. In: Proceedings of the International Conference on Robotics and Automation, pp. 61–66 (1985)

  25. Hartley, R.I., Zisserman, A.: Multiple View Geometry in Computer Vision, 2nd edn. Cambridge University Press, Cambridge (2004)

  26. Hetzel, G., Leibe, B., Levi, P., Schiele, B.: 3d object recognition from range images using local feature histograms. IEEE Transactions on Pattern Analysis and Machine Intelligence 2 (2001)

  27. Hoiem, D., Rother, C., Winn, J.: 3d layout crf for multi-view object class recognition and segmentation. In: Proceedings of the International Conference on Computer Vision and Pattern Recognition (2007)

  28. Johnson, A., Hebert, M.: Using spin images for efficient object recognition in cluttered 3d scenes. IEEE Transactions on Pattern Analysis and Machine Intelligence 5 (1999)

  29. Kadir, T., Brady, M.: Scale, saliency and image description. International Journal of Computer Vision 45(2), 83–105 (2001)

  30. Kazhdan, M., Funkhouser, T., Rusinkiewicz, S.: Rotation invariant spherical harmonic representation of 3d shape descriptors. In: Proceedings of the Symposium on Geometry Processing (2003)

  31. Koenderink, J., van Doorn, A.: The singularities of the visual mappings. Biological Cybernetics 24(1), 51–59 (1976)

  32. Koenderink, J.J., van Doorn, A.J.: The internal representation of solid shape with respect to vision. Biological Cybernetics 32(4), 211–216 (1979)

  33. Kushal, A., Schmid, C., Ponce, J.: Flexible object models for category-level 3d object recognition. In: Proceedings of the International Conference on Computer Vision and Pattern Recognition (2007)

  34. Lazebnik, S., Schmid, C., Ponce, J.: Semi-local affine parts for object recognition. In: Proceedings of the British Machine Vision Conference, vol. 2, pp. 959–968 (2004)

  35. Leibe, B., Schiele, B.: Scale Invariant Object Categorization Using a Scale-Adaptive Mean-Shift Search. In: Rasmussen, C.E., Bülthoff, H.H., Schölkopf, B., Giese, M.A. (eds.) DAGM 2004. LNCS, vol. 3175, pp. 145–153. Springer, Heidelberg (2004)

  36. Li, X., Guskov, I., Barhak, J.: Feature-based alignment of range scan data to cad model. International Journal of Shape Modeling 13, 1–23 (2007)

  37. Liebelt, J., Schmid, C., Schertler, K.: Viewpoint-independent object class detection using 3d feature maps. In: Proceedings of the International Conference on Computer Vision and Pattern Recognition (2008)

  38. Lowe, D.: Object recognition from local scale-invariant features. In: Proceedings of the International Conference on Computer Vision, pp. 1150–1157 (1999)

  39. Lowe, D.G.: Three-dimensional object recognition from single two-dimensional images. Artificial Intelligence 31, 355–395 (1987)

  40. Lowe, D.G.: Local feature view clustering for 3d object recognition. In: Proceedings of the International Conference on Computer Vision and Pattern Recognition (2001)

  41. Marr, D.: Vision: A computational investigation into the human representation and processing of visual information. Freeman, New York (1982)

  42. Matas, J., Chum, O., Urban, M., Pajdla, T.: Robust wide baseline stereo from maximally stable extremal regions. In: Proceedings of the British Machine Vision Conference, pp. 384–393 (2002)

  43. Mei, L., Sun, M., Carter, K., Hero, A., Savarese, S.: Object pose classification from short video sequences. In: Proceedings of the British Machine Vision Conference (2009)

  44. Mikolajczyk, K., Schmid, C.: An affine invariant interest point detector. In: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds.) ECCV 2002. LNCS, vol. 2350, pp. 128–142. Springer, Heidelberg (2002)

  45. Mikolajczyk, K., Schmid, C.: Scale and affine invariant interest point detectors. International Journal of Computer Vision 60(1), 63–86 (2004)

  46. Murase, H., Nayar, S.K.: Learning by a generation approach to appearance-based object recognition. In: Proceedings of the International Conference on Pattern Recognition (1996)

  47. Nayar, S.K., Nene, S.A., Murase, H.: Real-time 100 object recognition system. In: Proceedings of the International Conference on Robotics and Automation, pp. 2321–2325 (1996)

  48. Ng, J., Gong, S.: Multi-view face detection and pose estimation using a composite support vector machine across the view sphere. In: Proceedings of the International Workshop on Recognition, Analysis, and Tracking of Faces and Gestures in Real-Time Systems (1999)

  49. Ozuysal, M., Lepetit, V., Fua, P.: Pose estimation for category specific multiview object localization. In: Proceedings of the International Conference on Computer Vision and Pattern Recognition (2009)

  50. Rothganger, F., Lazebnik, S., Schmid, C., Ponce, J.: 3d object modeling and recognition using local affine-invariant image descriptors and multi-view spatial constraints. International Journal of Computer Vision 66(3), 231–259 (2006)

  51. Rothwell, C.A., Zisserman, A., Forsyth, D.A., Mundy, J.L., Joseph, L.: Canonical frames for planar object recognition. In: Sandini, G. (ed.) ECCV 1992. LNCS, vol. 588. Springer, Heidelberg (1992)

  52. Ruiz-Correa, S., Shapiro, L., Meila, M.: A new signature-based method for efficient 3-d object recognition. In: Proceedings of the International Conference on Computer Vision and Pattern Recognition (2001)

  53. Russell, B., Torralba, A., Murphy, K., Freeman, W.: Labelme: a database and web-based tool for image annotation. International Journal of Computer Vision (in press)

  54. Savarese, S., Fei-Fei, L.: 3D generic object categorization, localization and pose estimation. In: Proceedings of the International Conference on Computer Vision, pp. 1–8 (2007)

  55. Savarese, S., Fei-Fei, L.: View synthesis for recognizing unseen poses of object classes. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) ECCV 2008, Part III. LNCS, vol. 5304, pp. 602–615. Springer, Heidelberg (2008)

  56. Schneiderman, H., Kanade, T.: A statistical approach to 3D object detection applied to faces and cars. In: Proceedings of the International Conference on Computer Vision and Pattern Recognition, pp. 746–751 (2000)

  57. Seitz, S., Dyer, C.: View morphing. In: Proceedings of the ACM SIGGRAPH, pp. 21–30 (1996)

  58. Shimshoni, I., Ponce, J.: Finite-resolution aspect graphs of polyhedral objects. IEEE Transactions on Pattern Analysis and Machine Intelligence 19(4), 315–327 (1997)

  59. Stewman, J., Bowyer, K.: Learning graph matching. In: Proceedings of the International Conference on Computer Vision, pp. 494–500 (1988)

  60. Su, H., Sun, M., Fei-Fei, L., Savarese, S.: Learning a dense multi-view representation for detection, viewpoint classification and synthesis of object categories. In: Proceedings of International Conference on Computer Vision (2009)

  61. Sun, M., Su, H., Savarese, S., Fei-Fei, L.: A multi-view probabilistic model for 3d object classes. In: Proceedings of the International Conference on Computer Vision and Pattern Recognition (2009)

  62. Tangelder, J.W.H., Veltkamp, R.C.: A survey of content based 3d shape retrieval methods. In: Proceedings of Shape Modeling Applications, pp. 145–156 (2004)

  63. Thomas, A., Ferrari, V., Leibe, B., Tuytelaars, T., Schiele, B., Van Gool, L.: Towards multi-view object class detection. In: Proceedings of the International Conference on Computer Vision and Pattern Recognition, pp. 1589–1596 (2006)

  64. Torralba, A., Murphy, K., Freeman, W.: Sharing features: efficient boosting procedures for multiclass object detection. In: Proceedings of the International Conference on Computer Vision and Pattern Recognition (2004)

  65. Ullman, S., Basri, R.: Recognition by linear combination of models. Technical Report, Cambridge, MA, USA (1989)

  66. Viola, P., Jones, M.: Rapid object detection using a boosted cascade of simple features. In: Proceedings of the International Conference on Computer Vision and Pattern Recognition, pp. 511–518 (2001)

  67. Weber, M., Einhäuser, W., Welling, M., Perona, P.: Viewpoint-invariant learning and detection of human heads. In: Proceedings of the Fourth IEEE International Conference on Automatic Face and Gesture Recognition (2000)

  68. Weber, M., Welling, M., Perona, P.: Unsupervised learning of models for recognition. In: Vernon, D. (ed.) ECCV 2000. LNCS, vol. 1842, pp. 101–108. Springer, Heidelberg (2000)

  69. Xiao, J., Chen, J., Yeung, D.Y., Quan, L.: Structuring visual words in 3d for arbitrary-view object localization. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) ECCV 2008, Part III. LNCS, vol. 5304, pp. 725–737. Springer, Heidelberg (2008)

  70. Yan, P., Khan, D., Shah, M.: 3d model based object class detection in an arbitrary view. In: Proceedings of the International Conference on Computer Vision (2007)

  71. Li, Y., Gu, L., Kanade, T.: A robust shape model for multi-view car alignment. In: Proceedings of the International Conference on Computer Vision and Pattern Recognition (2009)

  72. Zhang, Z.: Floatboost learning and statistical face detection. IEEE Transactions on Pattern Analysis and Machine Intelligence 26(9), 1112–1123 (2004)

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Savarese, S., Fei-Fei, L. (2010). Multi-view Object Categorization and Pose Estimation. In: Cipolla, R., Battiato, S., Farinella, G.M. (eds) Computer Vision. Studies in Computational Intelligence, vol 285. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12848-6_8

  • DOI: https://doi.org/10.1007/978-3-642-12848-6_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-12847-9

  • Online ISBN: 978-3-642-12848-6

  • eBook Packages: Engineering, Engineering (R0)
