Pixel-Level Encoding and Depth Layering for Instance-Level Semantic Labeling

Uhrig, Jonas; Cordts, Marius; Franke, Uwe; Brox, Thomas

doi:10.1007/978-3-319-45886-1_2

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 9796)

Included in the following conference series:

German Conference on Pattern Recognition

Abstract

Recent approaches for instance-aware semantic labeling have augmented convolutional neural networks (CNNs) with complex multi-task architectures or computationally expensive graphical models. We present a method that leverages a fully convolutional network (FCN) to predict semantic labels, depth and an instance-based encoding using each pixel’s direction towards its corresponding instance center. Subsequently, we apply low-level computer vision techniques to generate state-of-the-art instance segmentation on the street scene datasets KITTI and Cityscapes. Our approach outperforms existing works by a large margin and can additionally predict absolute distances of individual instances from a monocular image as well as a pixel-level semantic labeling.
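The instance-based encoding mentioned in the abstract can be illustrated with a minimal sketch. It takes a ground-truth instance map and derives, for every object pixel, a quantized direction towards the center of its instance. Several details here are assumptions for illustration and are not taken from the abstract: the function name `direction_encoding`, the use of the mask centroid as the instance center, the choice of 8 angular bins, and the convention that ID 0 marks background.

```python
import math


def direction_encoding(instance_ids, num_bins=8):
    """Quantized direction from each pixel to its instance centroid.

    instance_ids: 2D list of ints; 0 is treated as background (assumed convention).
    Returns a 2D list with -1 for background pixels and for pixels lying
    exactly on the centroid, otherwise a bin index in [0, num_bins).
    """
    # First pass: accumulate coordinate sums per instance to get centroids.
    sums = {}
    for y, row in enumerate(instance_ids):
        for x, inst in enumerate(row):
            if inst == 0:
                continue
            sy, sx, n = sums.get(inst, (0.0, 0.0, 0))
            sums[inst] = (sy + y, sx + x, n + 1)
    centers = {k: (sy / n, sx / n) for k, (sy, sx, n) in sums.items()}

    # Second pass: angle of the vector (pixel -> centroid), mapped to a bin.
    bin_width = 2.0 * math.pi / num_bins
    encoded = []
    for y, row in enumerate(instance_ids):
        out_row = []
        for x, inst in enumerate(row):
            if inst == 0:
                out_row.append(-1)
                continue
            cy, cx = centers[inst]
            dy, dx = cy - y, cx - x
            if dy == 0 and dx == 0:
                out_row.append(-1)  # direction is undefined at the centroid
                continue
            angle = math.atan2(dy, dx) % (2.0 * math.pi)
            out_row.append(int(angle // bin_width) % num_bins)
        encoded.append(out_row)
    return encoded
```

A map like this could serve as a per-pixel classification target alongside semantic labels and depth. Pixels on opposite sides of an object receive opposing direction bins, which is the kind of cue a low-level post-processing step could use to separate neighboring instances.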

Author information

Authors and Affiliations

Daimler AG R&D, Stuttgart, Germany
Jonas Uhrig, Marius Cordts & Uwe Franke
University of Freiburg, Freiburg im Breisgau, Germany
Jonas Uhrig & Thomas Brox
TU Darmstadt, Darmstadt, Germany
Marius Cordts

Corresponding author

Correspondence to Jonas Uhrig.

Editor information

Editors and Affiliations

University of Hannover, Hannover, Germany
Bodo Rosenhahn
Max Planck Institute for Informatics, Saarbrücken, Germany
Bjoern Andres

About this paper

Cite this paper

Uhrig, J., Cordts, M., Franke, U., Brox, T. (2016). Pixel-Level Encoding and Depth Layering for Instance-Level Semantic Labeling. In: Rosenhahn, B., Andres, B. (eds) Pattern Recognition. GCPR 2016. Lecture Notes in Computer Science, vol 9796. Springer, Cham. https://doi.org/10.1007/978-3-319-45886-1_2

DOI: https://doi.org/10.1007/978-3-319-45886-1_2
Published: 27 August 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-45885-4
Online ISBN: 978-3-319-45886-1
eBook Packages: Computer Science, Computer Science (R0)