What Makes Good Synthetic Training Data for Learning Disparity and Optical Flow Estimation?

International Journal of Computer Vision

Abstract

The finding that very large networks can be trained efficiently and reliably has led to a paradigm shift in computer vision from engineered solutions to learning formulations. As a result, the research challenge shifts from devising algorithms to creating suitable and abundant training data for supervised learning. How can such training data be created efficiently? The dominant data acquisition method in visual recognition is based on web data and manual annotation. Yet, for many computer vision problems, such as stereo or optical flow estimation, this approach is not feasible because humans cannot manually annotate a pixel-accurate flow field. In this paper, we promote the use of synthetically generated data for the purpose of training deep networks on such tasks. We suggest multiple ways to generate such data and evaluate the influence of dataset properties on the performance and generalization of the resulting networks. We also demonstrate the benefit of learning schedules that use different types of data at selected stages of the training process.
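To illustrate the last point, here is a minimal, hypothetical sketch of such a learning schedule: a network is first trained on simpler synthetic data and then continues on more complex data with a reduced learning rate. The phase names, step counts, and learning rates below are illustrative assumptions, not the schedules evaluated in the paper.

```python
# Sketch of a two-phase dataset schedule (hypothetical settings).
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class SchedulePhase:
    name: str            # e.g. "simple synthetic" or "complex synthetic"
    dataset: Sequence    # any indexable collection of training samples
    num_steps: int       # optimization steps to spend on this data
    learning_rate: float


def run_schedule(phases: Sequence[SchedulePhase],
                 train_step: Callable[[object, float], float]) -> None:
    """Loop over phases; within each phase, cycle through its dataset."""
    for phase in phases:
        for step in range(phase.num_steps):
            sample = phase.dataset[step % len(phase.dataset)]
            loss = train_step(sample, phase.learning_rate)
            if step % 100 == 0:
                print(f"[{phase.name}] step {step}: loss {loss:.4f}")


if __name__ == "__main__":
    # Stub data and a stub update so the sketch runs stand-alone;
    # a real setup would plug in image pairs, ground-truth flow, and an SGD step.
    simple_data = [("image_pair", "flow_gt")] * 8
    complex_data = [("image_pair", "flow_gt")] * 8
    dummy_step = lambda sample, lr: lr  # placeholder returning a fake "loss"

    run_schedule(
        [SchedulePhase("simple synthetic", simple_data, 300, 1e-4),
         SchedulePhase("complex synthetic", complex_data, 200, 1e-5)],
        dummy_step,
    )
```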


Notes

  1. Clearly, the engineering and optimization of tuning parameters requires some training data. For a long time, test data was misused for this purpose.

  2. There is also the extreme case of handcrafting entire scenes and all their objects. We do not consider this option here because it yields far less data for the same effort.

  3. https://3dwarehouse.sketchup.com/.

  4. Generated with ImageMagick’s random “plasma” generator (see the command sketch after this list).

  5. Visual effects artists have long exploited the fact that “perfect” pictures are not perceived as “real” by humans; hence artificial film grain, chromatic aberration, and lens flare effects are applied in movies and computer games (a toy example of such degradations follows this list).

  6. Bumblebee2 BB2-08S2C.
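The following is a minimal sketch for note 4, assuming ImageMagick’s command-line tools are available on the system; the texture size and output filename are arbitrary choices, not the paper’s exact settings.

```python
# Generate a random background texture with ImageMagick's plasma generator.
import subprocess


def make_plasma_texture(path: str = "plasma_texture.png", size: str = "512x512") -> None:
    # "plasma:fractal" is ImageMagick's built-in random plasma-fractal pseudo-image.
    # Older installs expose the "convert" entry point; ImageMagick 7 also accepts "magick".
    subprocess.run(["convert", "-size", size, "plasma:fractal", path], check=True)


if __name__ == "__main__":
    make_plasma_texture()
```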
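For note 5, a toy numpy sketch of two of the mentioned effects: additive film grain and a crude chromatic aberration produced by shifting color channels in opposite directions. It illustrates the general idea only and is not the rendering or augmentation pipeline used for the datasets.

```python
# Toy image degradations: film grain and channel-shift chromatic aberration.
import numpy as np


def add_film_grain(img: np.ndarray, sigma: float = 0.02) -> np.ndarray:
    """Add zero-mean Gaussian noise to an image with values in [0, 1]."""
    noisy = img + np.random.normal(0.0, sigma, img.shape)
    return np.clip(noisy, 0.0, 1.0)


def chromatic_aberration(img: np.ndarray, shift: int = 2) -> np.ndarray:
    """Shift the red and blue channels horizontally in opposite directions."""
    out = img.copy()
    out[..., 0] = np.roll(img[..., 0], shift, axis=1)   # red channel
    out[..., 2] = np.roll(img[..., 2], -shift, axis=1)  # blue channel
    return out


if __name__ == "__main__":
    frame = np.random.rand(128, 128, 3)  # stand-in for a rendered frame
    degraded = chromatic_aberration(add_film_grain(frame))
    print(degraded.shape, degraded.min(), degraded.max())
```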


Author information

Correspondence to Nikolaus Mayer.

Additional information

Communicated by Adrien Gaidon, Florent Perronnin and Antonio Lopez.

We acknowledge funding by the ERC Starting Grant VideoLearn, the ERC Consolidator Grant “3D Reloaded”, the DFG Grant BR-3815/7-1, the DFG Grant CR 250/17-1, and the EU Horizon2020 project TrimBot2020. We thank Benjamin Ummenhofer for code that kick-started the creation of our 3D datasets.

Electronic supplementary material

Supplementary material 1 (pdf 5088 KB)

About this article

Cite this article

Mayer, N., Ilg, E., Fischer, P. et al. What Makes Good Synthetic Training Data for Learning Disparity and Optical Flow Estimation? Int J Comput Vis 126, 942–960 (2018). https://doi.org/10.1007/s11263-018-1082-6
