
SESAME: Semantic Editing of Scenes by Adding, Manipulating or Erasing Objects

Conference paper in Computer Vision – ECCV 2020 (ECCV 2020).

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12367)


Abstract

Recent advances in image generation have given rise to powerful tools for semantic image editing. However, existing approaches can either operate only on a single image or require an abundance of additional information. They are not capable of handling the complete set of editing operations, that is, the addition, manipulation, or removal of semantic concepts. To address these limitations, we propose SESAME, a novel generator-discriminator pair for Semantic Editing of Scenes by Adding, Manipulating or Erasing objects. In our setup, the user provides the semantic labels of the areas to be edited and the generator synthesizes the corresponding pixels. In contrast to previous methods that employ a discriminator trivially concatenating semantics and image as input, the SESAME discriminator is composed of two input streams that independently process the image and its semantics, using the latter to manipulate the results of the former. We evaluate our model on a diverse set of datasets and report state-of-the-art performance on two tasks: (a) image manipulation and (b) image generation conditioned on semantic labels.
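The abstract's two-stream discriminator lends itself to a short illustration. The PyTorch sketch below is not the authors' released code: the layer widths, the names (TwoStreamDiscriminator, conv_block), and the sigmoid-gating fusion are all illustrative assumptions about how semantic features might "manipulate" image features before a patch-level real/fake decision.

# Minimal sketch of a two-stream discriminator in the spirit of the abstract:
# one stream processes the RGB image, a second stream processes the semantic
# label map, and the semantic features modulate the image features before a
# PatchGAN-style real/fake prediction. All sizes and the gating mechanism are
# illustrative assumptions, not the paper's exact architecture.
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # Strided conv + LeakyReLU, a common discriminator building block.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1),
        nn.LeakyReLU(0.2, inplace=True),
    )

class TwoStreamDiscriminator(nn.Module):
    def __init__(self, image_channels=3, num_classes=35, base_width=64):
        super().__init__()
        # Stream 1: processes the image on its own.
        self.image_stream = nn.Sequential(
            conv_block(image_channels, base_width),
            conv_block(base_width, base_width * 2),
        )
        # Stream 2: processes the one-hot semantic layout on its own.
        self.semantic_stream = nn.Sequential(
            conv_block(num_classes, base_width),
            conv_block(base_width, base_width * 2),
        )
        # Head: after fusion, produce a patch-level real/fake score map.
        self.head = nn.Sequential(
            conv_block(base_width * 2, base_width * 4),
            nn.Conv2d(base_width * 4, 1, kernel_size=4, stride=1, padding=1),
        )

    def forward(self, image, semantics):
        img_feat = self.image_stream(image)         # (B, 2w, H/4, W/4)
        sem_feat = self.semantic_stream(semantics)  # (B, 2w, H/4, W/4)
        # "Use the semantics to manipulate the image features": here a simple
        # sigmoid gating; the paper's actual mechanism may differ.
        fused = img_feat * torch.sigmoid(sem_feat)
        return self.head(fused)

# Usage sketch: a 256x256 image and a 35-class label map (random tensors
# stand in for a real image and a one-hot layout).
if __name__ == "__main__":
    D = TwoStreamDiscriminator()
    image = torch.randn(1, 3, 256, 256)
    labels = torch.randn(1, 35, 256, 256)
    print(D(image, labels).shape)  # torch.Size([1, 1, 31, 31])

The key design point, per the abstract, is that image and semantics are not concatenated at the input; each stream extracts its own features first, so the semantics can act on the image representation rather than being mixed in at the pixel level.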


Notes

  1. https://www.mturk.com.



Acknowledgements

This work was partly supported by CSEM, ETH Zurich Fund (OK) and by Huawei, Amazon AWS and Nvidia GPU grants. We are grateful to Despoina Paschalidou, Siavash Bigdeli and Danda Pani Paudel for fruitful discussions. We also thank Gene Kogan for providing guidance on how to prepare the Flickr Landscapes Dataset.

Author information

Correspondence to Evangelos Ntavelis.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF 4878 KB)


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Ntavelis, E., Romero, A., Kastanis, I., Van Gool, L., Timofte, R. (2020). SESAME: Semantic Editing of Scenes by Adding, Manipulating or Erasing Objects. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M. (eds) Computer Vision – ECCV 2020. Lecture Notes in Computer Science, vol 12367. Springer, Cham. https://doi.org/10.1007/978-3-030-58542-6_24


  • DOI: https://doi.org/10.1007/978-3-030-58542-6_24


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58541-9

  • Online ISBN: 978-3-030-58542-6

  • eBook Packages: Computer Science; Computer Science (R0)
