What Makes Good Synthetic Training Data for Learning Disparity and Optical Flow Estimation?

International Journal of Computer Vision

Abstract

The finding that very large networks can be trained efficiently and reliably has led to a paradigm shift in computer vision from engineered solutions to learning formulations. As a result, the research challenge shifts from devising algorithms to creating suitable and abundant training data for supervised learning. How can such training data be created efficiently? The dominant data acquisition method in visual recognition is based on web data and manual annotation. Yet, for many computer vision problems, such as stereo or optical flow estimation, this approach is not feasible because humans cannot manually annotate a pixel-accurate flow field. In this paper, we promote the use of synthetically generated data for the purpose of training deep networks on such tasks. We suggest multiple ways to generate such data and evaluate the influence of dataset properties on the performance and generalization of the resulting networks. We also demonstrate the benefit of learning schedules that use different types of data at selected stages of the training process.
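To illustrate the last point, here is a minimal, hypothetical sketch of such a learning schedule: a network is first trained on simpler synthetic data and then continues on more complex data with a reduced learning rate. The phase names, step counts, and learning rates below are illustrative assumptions, not the schedules evaluated in the paper.

```python
# Sketch of a two-phase dataset schedule (hypothetical settings).
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class SchedulePhase:
    name: str            # e.g. "simple synthetic" or "complex synthetic"
    dataset: Sequence    # any indexable collection of training samples
    num_steps: int       # optimization steps to spend on this data
    learning_rate: float


def run_schedule(phases: Sequence[SchedulePhase],
                 train_step: Callable[[object, float], float]) -> None:
    """Loop over phases; within each phase, cycle through its dataset."""
    for phase in phases:
        for step in range(phase.num_steps):
            sample = phase.dataset[step % len(phase.dataset)]
            loss = train_step(sample, phase.learning_rate)
            if step % 100 == 0:
                print(f"[{phase.name}] step {step}: loss {loss:.4f}")


if __name__ == "__main__":
    # Stub data and a stub update so the sketch runs stand-alone;
    # a real setup would plug in image pairs, ground-truth flow, and an SGD step.
    simple_data = [("image_pair", "flow_gt")] * 8
    complex_data = [("image_pair", "flow_gt")] * 8
    dummy_step = lambda sample, lr: lr  # placeholder returning a fake "loss"

    run_schedule(
        [SchedulePhase("simple synthetic", simple_data, 300, 1e-4),
         SchedulePhase("complex synthetic", complex_data, 200, 1e-5)],
        dummy_step,
    )
```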


Notes

  1. Clearly, the engineering and optimization of tuning parameters requires some training data. For a long time, test data was misused for this purpose.

  2. There is also the extreme case of handcrafting entire scenes and all their objects. We do not consider this option here because it yields far less data for the same effort.

  3. https://3dwarehouse.sketchup.com/.

  4. Generated with ImageMagick’s random “plasma” generator (see the command sketch after this list).

  5. Visual effects artists have long exploited the fact that “perfect” pictures are not perceived as “real” by humans; hence artificial film grain, chromatic aberration, and lens flare effects are applied in movies and computer games (a toy example of such degradations follows this list).

  6. Bumblebee2 BB2-08S2C.
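The following is a minimal sketch for note 4, assuming ImageMagick’s command-line tools are available on the system; the texture size and output filename are arbitrary choices, not the paper’s exact settings.

```python
# Generate a random background texture with ImageMagick's plasma generator.
import subprocess


def make_plasma_texture(path: str = "plasma_texture.png", size: str = "512x512") -> None:
    # "plasma:fractal" is ImageMagick's built-in random plasma-fractal pseudo-image.
    # Older installs expose the "convert" entry point; ImageMagick 7 also accepts "magick".
    subprocess.run(["convert", "-size", size, "plasma:fractal", path], check=True)


if __name__ == "__main__":
    make_plasma_texture()
```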
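For note 5, a toy numpy sketch of two of the mentioned effects: additive film grain and a crude chromatic aberration produced by shifting color channels in opposite directions. It illustrates the general idea only and is not the rendering or augmentation pipeline used for the datasets.

```python
# Toy image degradations: film grain and channel-shift chromatic aberration.
import numpy as np


def add_film_grain(img: np.ndarray, sigma: float = 0.02) -> np.ndarray:
    """Add zero-mean Gaussian noise to an image with values in [0, 1]."""
    noisy = img + np.random.normal(0.0, sigma, img.shape)
    return np.clip(noisy, 0.0, 1.0)


def chromatic_aberration(img: np.ndarray, shift: int = 2) -> np.ndarray:
    """Shift the red and blue channels horizontally in opposite directions."""
    out = img.copy()
    out[..., 0] = np.roll(img[..., 0], shift, axis=1)   # red channel
    out[..., 2] = np.roll(img[..., 2], -shift, axis=1)  # blue channel
    return out


if __name__ == "__main__":
    frame = np.random.rand(128, 128, 3)  # stand-in for a rendered frame
    degraded = chromatic_aberration(add_film_grain(frame))
    print(degraded.shape, degraded.min(), degraded.max())
```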


Author information

Correspondence to Nikolaus Mayer.

Additional information

Communicated by Adrien Gaidon, Florent Perronnin and Antonio Lopez.

We acknowledge funding by the ERC Starting Grant VideoLearn, the ERC Consolidator Grant “3D Reloaded”, the DFG Grant BR-3815/7-1, the DFG Grant CR 250/17-1, and the EU Horizon2020 project TrimBot2020. We thank Benjamin Ummenhofer for code that kick-started the creation of our 3D datasets.

Electronic supplementary material

Supplementary material 1 (pdf 5088 KB)

About this article

Cite this article

Mayer, N., Ilg, E., Fischer, P. et al. What Makes Good Synthetic Training Data for Learning Disparity and Optical Flow Estimation? Int J Comput Vis 126, 942–960 (2018). https://doi.org/10.1007/s11263-018-1082-6
