Abstract
This paper presents a novel approach for labeling objects based on multiple spatially registered images of a scene. We argue that such a multi-view labeling approach is a better fit for applications such as robotics and surveillance than traditional object recognition, where only a single image of each scene is available. To encourage further study in the area, we have collected a data set of well-registered imagery for many indoor scenes and have made this data publicly available. Our multi-view labeling approach can improve the results of a wide variety of image-based classifiers. We demonstrate this by producing scene labelings based on the output of both the Deformable Parts Model of [1] and a contour-based object recognition method similar to chamfer matching. Our experimental results show that labeling objects based on multiple viewpoints leads to a significant improvement in performance compared with single-image labeling.
References
Felzenszwalb, P., McAllester, D., Ramanan, D.: A discriminatively trained, multiscale, deformable part model. In: Proc. IEEE CVPR (2008)
Su, H., Sun, M., Fei-Fei, L., Savarese, S.: Learning a dense multi-view representation for detection, viewpoint classification and synthesis of object categories. In: Proc. IEEE ICCV (2009)
Thomas, A., Ferrari, V., Leibe, B., Tuytelaars, T., Gool, L.V.: Using multi-view recognition and meta-data annotation to guide a robot’s attention. Int. J. Robotics Research (2009)
Liebelt, J., Schmid, C., Schertler, K.: Viewpoint-independent object class detection using 3d feature maps. In: Proc. IEEE CVPR (2008)
Whaite, P., Ferrie, F.: Autonomous exploration: Driven by uncertainty. Technical Report TR-CIM-93-17, McGill U. CIM (1994)
Laporte, C., Arbel, T.: Efficient discriminant viewpoint selection for active bayesian recognition. Int. J. Computer Vision 68, 1573–1405 (2006)
Andriluka, M., Roth, S., Schiele, B.: Monocular 3d pose estimation and tracking by detection. In: Proc. IEEE CVPR (2010)
Wojek, C., Roth, S., Schindler, K., Schiele, B.: Monocular 3D scene modeling and inference: Understanding multi-object traffic scenes. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010. LNCS, vol. 6314, pp. 467–481. Springer, Heidelberg (2010)
Coates, A., Ng, A.Y.: Multi-camera object detection for robotics. In: Proc. IEEE Int. Conf. Robotics and Automation (2010)
Leibe, B., Schindler, K., Cornelis, N., Gool, L.V.: Coupled object detection and tracking from static cameras and moving vehicles. IEEE Trans. Pattern Analysis Machine Intelligence (2008)
Wojek, C., Walk, S., Schiele, B.: Multi-cue onboard pedestrian detection. In: CVPR, pp. 1–8 (2009)
Kragic, D., Björkman, M.: Strategies for object manipulation using foveal and peripheral vision. In: Proc. IEEE ICVS (2006)
Gould, S., Arfvidsson, J., Kaehler, A., Sapp, B., Meissner, M., Bradski, G., Baumstarck, P., Chung, S., Ng, A.: Peripheral-foveal vision for real-time object recognition and tracking in video. In: Proc. IJCAI (2007)
Rusu, R.B., Holzbach, A., Beetz, M., Bradski, G.: Detecting and segmenting objects for mobile manipulation. In: Proc. ICCV, S3DV Workshop (2009)
Ye, Y., Tsotsos, J.K.: Sensor planning for 3d object search. Computer Vision and Image Understanding 73, 145–168 (1999)
Savarese, S., Fei-Fei, L.: 3d generic object categorization, localization and pose estimation. In: Proc. IEEE ICCV (2007)
Viksten, F., Forssen, P.E., Johansson, B., Moe, A.: Comparison of local image descriptors for full 6 degree-of-freedom pose estimation. In: Proceedings of the IEEE International Conference on Robotics and Automation, ICRA (2009)
Forssen, P.E., Meger, D., Lai, K., Helmer, S., Little, J.J., Lowe, D.G.: Informed visual search: Combining attention and object recognition. In: ICRA, pp. 935–942 (2008)
LeCun, Y., Huang, F., Bottou, L.: Learning methods for generic object recognition with invariance to pose and lighting. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2004)
Fergus, R., Fei-Fei, L., Perona, P., Zisserman, A.: Learning object categories from Google’s image search. In: Proc. of the 10th IEEE International Conference on Computer Vision, ICCV (2005)
Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, San Diego, USA, vol. 2, pp. 886–893 (2005)
Shotton, J., Blake, A., Cipolla, R.: Multiscale categorical object recognition using contour fragments. IEEE Transactions on Pattern Analysis and Machine Intelligence 30, 1270–1281 (2008)
Fiala, M.: ARTag, a fiducial marker system using digital techniques. In: CVPR 2005, vol. 1, pp. 590–596 (2005)
Poupyrev, I., Kato, H., Billinghurst, M.: ARToolKit user manual, version 2.33. Human Interface Technology Lab, University of Washington (2000)
Sattar, J., Bourque, E., Giguere, P., Dudek, G.: Fourier tags: Smoothly degradable fiducial markers for use in human-robot interaction. In: Fourth Canadian Conference on Computer and Robot Vision (CRV), Montreal, Quebec, Canada, pp. 165–174 (2007)
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Helmer, S., Meger, D., Muja, M., Little, J.J., Lowe, D.G. (2011). Multiple Viewpoint Recognition and Localization. In: Kimmel, R., Klette, R., Sugimoto, A. (eds) Computer Vision – ACCV 2010. ACCV 2010. Lecture Notes in Computer Science, vol 6492. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19315-6_36
DOI: https://doi.org/10.1007/978-3-642-19315-6_36
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19314-9
Online ISBN: 978-3-642-19315-6
eBook Packages: Computer Science, Computer Science (R0)