Abstract
There is increasing interest in semantically annotated 3D models, e.g. of cities. Typical approaches start with the semantic labelling of all the images used to build the 3D model. Such labelling tends to be very time-consuming, and the inherent redundancy among the overlapping images calls for more efficient solutions. This paper proposes an alternative approach that exploits the geometry of a 3D mesh model obtained from multi-view reconstruction. Instead of clustering similar views, we predict the best view before the actual labelling. To this end, we find the single image part that best supports the correct semantic labelling of each face of the underlying 3D mesh. Perhaps surprisingly, this single-image approach tends to increase the accuracy of the model labelling compared to approaches that fuse the labels from multiple images. We go a step further and explicitly label only a subset of the faces (e.g. 10%), then fill in the labels of the remaining faces. This reduces computation time further, again with a gain in accuracy. Compared to a process that starts from the semantic labelling of the images, our method for semantically labelling 3D models yields speed-ups of about two orders of magnitude. We tested our multi-view semantic labelling on a variety of street scenes.
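The per-face view selection described above can be illustrated with a minimal sketch. The abstract does not specify how the best view is predicted, so the score used here is an assumption for illustration only: the cosine of the viewing angle divided by the squared distance, a hand-picked proxy that is roughly proportional to the projected size of a small face. The function names `view_score` and `best_view_per_face` are hypothetical, face normals are assumed to be unit length, and occlusion is ignored.

```python
import math

def view_score(centroid, normal, cam_center):
    """Assumed heuristic score of one camera for one mesh face.

    Not the paper's method: cos(viewing angle) / distance^2,
    with back-facing views scored 0. `normal` must be unit length.
    """
    to_cam = [c - p for c, p in zip(cam_center, centroid)]
    dist = math.sqrt(sum(v * v for v in to_cam))
    if dist == 0.0:
        return 0.0  # camera sits on the face centroid; treat as unusable
    cos_angle = sum(n * v for n, v in zip(normal, to_cam)) / dist
    return max(cos_angle, 0.0) / (dist * dist)

def best_view_per_face(centroids, normals, cam_centers):
    """Return, for each face, the index of its highest-scoring camera."""
    best = []
    for p, n in zip(centroids, normals):
        scores = [view_score(p, n, c) for c in cam_centers]
        best.append(max(range(len(scores)), key=scores.__getitem__))
    return best

# Two upward-facing faces and two cameras, one directly above each face.
centroids = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
normals = [(0.0, 0.0, 1.0), (0.0, 0.0, 1.0)]
cameras = [(0.0, 0.0, 5.0), (10.0, 0.0, 5.0)]
print(best_view_per_face(centroids, normals, cameras))
```

With one selected view per face, only the image region covering that face in that view would need to be classified, which is where the saving over labelling every image comes from.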
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Riemenschneider, H., Bódis-Szomorú, A., Weissenberg, J., Van Gool, L. (2014). Learning Where to Classify in Multi-view Semantic Segmentation. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds) Computer Vision – ECCV 2014. ECCV 2014. Lecture Notes in Computer Science, vol 8693. Springer, Cham. https://doi.org/10.1007/978-3-319-10602-1_34
DOI: https://doi.org/10.1007/978-3-319-10602-1_34
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10601-4
Online ISBN: 978-3-319-10602-1
eBook Packages: Computer Science (R0)