Abstract
The performance of supervised deep learning algorithms depends heavily on the scale, quality, and diversity of their training data. Collecting and manually annotating large amounts of data is both time-consuming and costly. For tasks related to visual human-centric perception, the collection and distribution of such data may also face restrictions due to privacy legislation. Moreover, the design and testing of complex systems, e.g., robots, which often employ deep learning-based perception models, can be severely hampered, since even state-of-the-art methods trained on large-scale real datasets do not always perform adequately because they have not adapted to the visual differences between virtual- and real-world data. To mitigate these issues, we present a method that automatically generates realistic synthetic data with annotations for a) person detection, b) face recognition, and c) human pose estimation. The proposed method takes real background images as input and populates them with human figures in various poses. Instead of hand-made 3D human models, we use models generated through deep learning methods, further reducing dataset creation costs while maintaining a high level of realism. In addition, we provide open-source, easy-to-use tools that implement the proposed pipeline, allowing the generation of highly realistic synthetic datasets for a variety of tasks. Benchmarking and evaluation on the corresponding tasks show that synthetic data can be used effectively as a supplement to real data.
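The core compositing step described above, pasting a rendered human figure onto a real background and deriving an annotation from its silhouette, can be sketched as follows. This is a minimal illustrative sketch, not the authors' actual implementation: the `composite` function, the list-of-lists image layout, and the binary alpha mask are all assumptions made for clarity.

```python
def composite(background, figure, mask, top, left):
    """Paste `figure` onto `background` wherever `mask` is 1, and derive
    the axis-aligned bounding box of the pasted figure (a person-detection
    annotation comes for free from the known silhouette)."""
    out = [row[:] for row in background]          # copy the background image
    ys, xs = [], []
    for y in range(len(figure)):
        for x in range(len(figure[0])):
            if mask[y][x]:                        # opaque figure pixel
                out[top + y][left + x] = figure[y][x]
                ys.append(top + y)
                xs.append(left + x)
    bbox = (min(xs), min(ys), max(xs), max(ys))   # (x_min, y_min, x_max, y_max)
    return out, bbox

# Toy example: an 8x8 zero "background" and a 2x2 "rendered human".
background = [[0] * 8 for _ in range(8)]
figure = [[9, 9], [9, 9]]
mask = [[1, 1], [1, 0]]                           # bottom-right pixel transparent
img, bbox = composite(background, figure, mask, top=3, left=2)
print(bbox)  # (2, 3, 3, 4)
```

In the actual pipeline, the figure and mask would come from rendering a digitized 3D human model at a sampled pose and camera viewpoint, and the same known geometry would also yield face-identity and keypoint annotations.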
Acknowledgments
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 871449 (OpenDR). This publication reflects the authors’ views only. The European Commission is not responsible for any use that may be made of the information it contains.
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Symeonidis, C. et al. (2021). Efficient Realistic Data Generation Framework Leveraging Deep Learning-Based Human Digitization. In: Iliadis, L., Macintyre, J., Jayne, C., Pimenidis, E. (eds) Proceedings of the 22nd Engineering Applications of Neural Networks Conference. EANN 2021. Proceedings of the International Neural Networks Society, vol 3. Springer, Cham. https://doi.org/10.1007/978-3-030-80568-5_23
DOI: https://doi.org/10.1007/978-3-030-80568-5_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-80567-8
Online ISBN: 978-3-030-80568-5
eBook Packages: Intelligent Technologies and Robotics (R0)