Learning Where to Classify in Multi-view Semantic Segmentation

Riemenschneider, Hayko; Bódis-Szomorú, András; Weissenberg, Julien; Van Gool, Luc

doi:10.1007/978-3-319-10602-1_34

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 8693)

Abstract

There is an increasing interest in semantically annotated 3D models, e.g. of cities. The typical approaches start with the semantic labelling of all the images used for the 3D model. Such labelling tends to be very time-consuming, though, and the inherent redundancy among the overlapping images calls for more efficient solutions. This paper proposes an alternative approach that exploits the geometry of a 3D mesh model obtained from multi-view reconstruction. Instead of clustering similar views, we predict the best view before the actual labelling. For this we find the single image part that best supports the correct semantic labelling of each face of the underlying 3D mesh. Perhaps surprisingly, this single-image approach tends to increase the accuracy of the model labelling when compared to approaches that fuse the labels from multiple images. We even go a step further and explicitly label only a subset of faces (e.g. 10%), subsequently filling in the labels of the remaining faces. This leads to a further reduction of computation time, again combined with a gain in accuracy. Compared to a process that starts from the semantic labelling of the images, our method to semantically label 3D models yields accelerations of about two orders of magnitude. We tested our multi-view semantic labelling on a variety of street scenes.
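
The pipeline outlined in the abstract (one view per mesh face, explicit labelling of only a subset of faces, then filling in the rest) can be illustrated with a minimal Python sketch. This is not the authors' implementation. All names here (`scores`, `neighbors`, `classify`, `step`) are hypothetical. The per-view scores are assumed to be given rather than learned, the labelled subset is chosen by uniform subsampling rather than by a learned criterion, and the fill-in step is a plain breadth-first propagation over face adjacency, used as a stand-in for whatever inference the paper actually applies.

```python
from collections import deque


def best_view_per_face(scores):
    """Pick one view per face.

    scores[f][v] is a hypothetical suitability score of view v for face f,
    or None when face f is not visible in view v.
    Returns the index of the highest-scoring view per face (None if unseen).
    """
    best = []
    for row in scores:
        visible = [(s, v) for v, s in enumerate(row) if s is not None]
        best.append(max(visible, key=lambda sv: sv[0])[1] if visible else None)
    return best


def fill_labels(labels, neighbors):
    """Fill unlabelled faces (None) from labelled ones.

    Breadth-first search over the face adjacency lists: each unlabelled face
    takes the label of the labelled face that reaches it first.
    """
    out = list(labels)
    queue = deque(f for f, lab in enumerate(out) if lab is not None)
    while queue:
        f = queue.popleft()
        for n in neighbors[f]:
            if out[n] is None:
                out[n] = out[f]
                queue.append(n)
    return out


def label_mesh(scores, neighbors, classify, step=10):
    """Classify every step-th face in its best view only, then fill in the rest.

    classify(face, view) is a hypothetical callback returning a semantic label.
    Uniform subsampling is an assumption made for this sketch.
    """
    views = best_view_per_face(scores)
    labels = [None] * len(scores)
    for f in range(0, len(scores), step):
        if views[f] is not None:
            labels[f] = classify(f, views[f])
    return fill_labels(labels, neighbors)


# Toy example: five faces in a chain, two views, classifier called on faces 0 and 4.
scores = [[0.2, 0.9], [0.5, 0.1], [0.3, 0.4], [0.6, 0.2], [0.1, 0.8]]
neighbors = [[1], [0, 2], [1, 3], [2, 4], [3]]
result = label_mesh(scores, neighbors, lambda f, v: ["wall", "window"][f // 4], step=4)
```

In this toy run the classifier is invoked only twice for five faces, which is where the saving in computation comes from in this simplified setting; any accuracy behaviour depends on the scoring and fill-in models, which the sketch does not reproduce.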

Author information

Authors and Affiliations

Computer Vision Laboratory, ETH Zurich, Switzerland
Hayko Riemenschneider, András Bódis-Szomorú, Julien Weissenberg & Luc Van Gool
K.U. Leuven, Belgium
Luc Van Gool

Editor information

Editors and Affiliations

Department of Computer Science, University of Toronto, 6 King’s College Road, M5H 3S5, Toronto, ON, Canada
David Fleet
Faculty of Electrical Engineering, Department of Cybernetics, Czech Technical University in Prague, Technicka 2, 166 27, Prague 6, Czech Republic
Tomas Pajdla
Max-Planck-Institut für Informatik, Campus E1 4, 66123, Saarbrücken, Germany
Bernt Schiele
ESAT - PSI, iMinds, KU Leuven, Kasteelpark Arenberg 10, Bus 2441, 3001, Leuven, Belgium
Tinne Tuytelaars

Cite this paper

Riemenschneider, H., Bódis-Szomorú, A., Weissenberg, J., Van Gool, L. (2014). Learning Where to Classify in Multi-view Semantic Segmentation. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds) Computer Vision – ECCV 2014. ECCV 2014. Lecture Notes in Computer Science, vol 8693. Springer, Cham. https://doi.org/10.1007/978-3-319-10602-1_34

DOI: https://doi.org/10.1007/978-3-319-10602-1_34
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10601-4
Online ISBN: 978-3-319-10602-1
eBook Packages: Computer Science, Computer Science (R0)