DOI: 10.1145/3474085.3475206

Towards Controllable and Photorealistic Region-wise Image Manipulation

Published: 17 October 2021

ABSTRACT

Adaptive and flexible image editing is a desirable capability of modern generative models. In this work, we present a generative model with an auto-encoder architecture for per-region style manipulation. We apply a code consistency loss to enforce an explicit disentanglement between the content and style latent representations, making the content and style of generated samples consistent with their corresponding references. The model is further constrained by a content alignment loss, which ensures that foreground editing does not disturb the background content. As a result, given region-of-interest masks provided by users, our model supports foreground region-wise style transfer. Notably, the model requires no extra annotations such as semantic labels and is trained purely with self-supervision. Extensive experiments demonstrate the effectiveness of the proposed method and the flexibility of the model across a range of applications, including region-wise style editing, latent-space interpolation, and cross-domain style transfer.
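
The two constraints named in the abstract can be made concrete. Below is a minimal PyTorch sketch, not the authors' implementation: the toy encoders/decoder, the additive style injection, the code shapes, and the exact loss forms are all illustrative assumptions. It shows a code consistency loss that re-encodes the generated image and matches its content/style codes against those of the references, and a content alignment loss that penalizes any change outside the user-provided foreground mask.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-ins for the paper's auto-encoder branches (illustrative only).
content_enc = nn.Conv2d(3, 8, 3, padding=1)              # spatial content code
style_enc = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1),
                          nn.AdaptiveAvgPool2d(1))       # global style code
decoder = nn.Conv2d(8, 3, 3, padding=1)                  # codes -> image

def generate(x_content, x_style):
    c = content_enc(x_content)          # content code from the content reference
    s = style_enc(x_style)              # style code from the style reference
    return decoder(c + s)               # naive style injection, for illustration

def code_consistency_loss(y, x_content, x_style):
    """Re-encode the output and require its content/style codes to match the
    codes of the respective references (the disentanglement constraint)."""
    return (F.l1_loss(content_enc(y), content_enc(x_content).detach())
            + F.l1_loss(style_enc(y), style_enc(x_style).detach()))

def content_alignment_loss(y, x_content, mask):
    """Keep everything outside the user-provided foreground mask unchanged,
    so per-region editing does not disturb the background (mask is 1 on the
    edited region, 0 on the background)."""
    bg = 1.0 - mask
    return F.l1_loss(y * bg, x_content * bg)

# Example: edit a 128x128 square region of a 256x256 image.
x_content = torch.rand(1, 3, 256, 256)
x_style = torch.rand(1, 3, 256, 256)
mask = torch.zeros(1, 1, 256, 256)
mask[..., 64:192, 64:192] = 1.0
y = generate(x_content, x_style)
loss = code_consistency_loss(y, x_content, x_style) \
       + content_alignment_loss(y, x_content, mask)
```

In practice the two terms would be weighted and combined with reconstruction and adversarial losses, but the sketch captures why the pair yields controllable region-wise editing: the consistency term ties the output's style to the style reference, while the alignment term confines that change to the masked region.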

Published in

MM '21: Proceedings of the 29th ACM International Conference on Multimedia
October 2021, 5796 pages
ISBN: 9781450386517
DOI: 10.1145/3474085

Copyright © 2021 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall Acceptance Rate: 995 of 4,171 submissions, 24%
