Abstract
The performance of supervised deep learning algorithms depends heavily on the scale, quality, and diversity of their training data. Collecting and manually annotating large amounts of data is both time-consuming and costly. For tasks related to visual human-centric perception, the collection and distribution of such data may also face restrictions due to privacy legislation. Moreover, the design and testing of complex systems, e.g., robots, which often employ deep learning-based perception models, can be severely hampered, since even state-of-the-art methods trained on large-scale real datasets do not always perform adequately because they have not adapted to the visual differences between virtual- and real-world data. To mitigate these issues, we present a method that automatically generates realistic synthetic data with annotations for a) person detection, b) face recognition, and c) human pose estimation. The proposed method takes real background images as input and populates them with human figures in various poses. Instead of hand-made 3D human models, we use models generated through deep learning methods, further reducing dataset creation costs while maintaining a high level of realism. In addition, we provide open-source, easy-to-use tools that implement the proposed pipeline, allowing the generation of highly realistic synthetic datasets for a variety of tasks. Benchmarking and evaluation on the corresponding tasks show that synthetic data can be used effectively as a supplement to real data.
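The core compositing step described above, pasting a rendered human figure onto a real background and deriving an annotation from its silhouette, can be sketched as follows. This is a minimal illustrative sketch, not the authors' actual implementation: the `composite` function, the list-of-lists image layout, and the binary alpha mask are all assumptions made for clarity.

```python
def composite(background, figure, mask, top, left):
    """Paste `figure` onto `background` wherever `mask` is 1, and derive
    the axis-aligned bounding box of the pasted figure (a person-detection
    annotation comes for free from the known silhouette)."""
    out = [row[:] for row in background]          # copy the background image
    ys, xs = [], []
    for y in range(len(figure)):
        for x in range(len(figure[0])):
            if mask[y][x]:                        # opaque figure pixel
                out[top + y][left + x] = figure[y][x]
                ys.append(top + y)
                xs.append(left + x)
    bbox = (min(xs), min(ys), max(xs), max(ys))   # (x_min, y_min, x_max, y_max)
    return out, bbox

# Toy example: an 8x8 zero "background" and a 2x2 "rendered human".
background = [[0] * 8 for _ in range(8)]
figure = [[9, 9], [9, 9]]
mask = [[1, 1], [1, 0]]                           # bottom-right pixel transparent
img, bbox = composite(background, figure, mask, top=3, left=2)
print(bbox)  # (2, 3, 3, 4)
```

In the actual pipeline, the figure and mask would come from rendering a digitized 3D human model at a sampled pose and camera viewpoint, and the same known geometry would also yield face-identity and keypoint annotations.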
Acknowledgments
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 871449 (OpenDR). This publication reflects the authors’ views only. The European Commission is not responsible for any use that may be made of the information it contains.
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Symeonidis, C. et al. (2021). Efficient Realistic Data Generation Framework Leveraging Deep Learning-Based Human Digitization. In: Iliadis, L., Macintyre, J., Jayne, C., Pimenidis, E. (eds) Proceedings of the 22nd Engineering Applications of Neural Networks Conference. EANN 2021. Proceedings of the International Neural Networks Society, vol 3. Springer, Cham. https://doi.org/10.1007/978-3-030-80568-5_23
DOI: https://doi.org/10.1007/978-3-030-80568-5_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-80567-8
Online ISBN: 978-3-030-80568-5
eBook Packages: Intelligent Technologies and Robotics (R0)