DOI: 10.1145/3474085.3475206

Towards Controllable and Photorealistic Region-wise Image Manipulation

Published: 17 October 2021

ABSTRACT

Adaptive and flexible image editing is a desirable capability of modern generative models. In this work, we present a generative model with an auto-encoder architecture for per-region style manipulation. We apply a code consistency loss to enforce an explicit disentanglement between the content and style latent representations, making the content and style of generated samples consistent with their corresponding references. The model is further constrained by a content alignment loss, which ensures that foreground editing does not disturb the background content. As a result, given region-of-interest masks provided by users, our model supports foreground region-wise style transfer. Notably, the model requires no extra annotations such as semantic labels and is trained purely with self-supervision. Extensive experiments demonstrate the effectiveness of the proposed method and the flexibility of the model across a range of applications, including region-wise style editing, latent-space interpolation, and cross-domain style transfer.
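
The two constraints named in the abstract can be made concrete. Below is a minimal PyTorch sketch, not the authors' implementation: the toy encoders/decoder, the additive style injection, the code shapes, and the exact loss forms are all illustrative assumptions. It shows a code consistency loss that re-encodes the generated image and matches its content/style codes against those of the references, and a content alignment loss that penalizes any change outside the user-provided foreground mask.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-ins for the paper's auto-encoder branches (illustrative only).
content_enc = nn.Conv2d(3, 8, 3, padding=1)              # spatial content code
style_enc = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1),
                          nn.AdaptiveAvgPool2d(1))       # global style code
decoder = nn.Conv2d(8, 3, 3, padding=1)                  # codes -> image

def generate(x_content, x_style):
    c = content_enc(x_content)          # content code from the content reference
    s = style_enc(x_style)              # style code from the style reference
    return decoder(c + s)               # naive style injection, for illustration

def code_consistency_loss(y, x_content, x_style):
    """Re-encode the output and require its content/style codes to match the
    codes of the respective references (the disentanglement constraint)."""
    return (F.l1_loss(content_enc(y), content_enc(x_content).detach())
            + F.l1_loss(style_enc(y), style_enc(x_style).detach()))

def content_alignment_loss(y, x_content, mask):
    """Keep everything outside the user-provided foreground mask unchanged,
    so per-region editing does not disturb the background (mask is 1 on the
    edited region, 0 on the background)."""
    bg = 1.0 - mask
    return F.l1_loss(y * bg, x_content * bg)

# Example: edit a 128x128 square region of a 256x256 image.
x_content = torch.rand(1, 3, 256, 256)
x_style = torch.rand(1, 3, 256, 256)
mask = torch.zeros(1, 1, 256, 256)
mask[..., 64:192, 64:192] = 1.0
y = generate(x_content, x_style)
loss = code_consistency_loss(y, x_content, x_style) \
       + content_alignment_loss(y, x_content, mask)
```

In practice the two terms would be weighted and combined with reconstruction and adversarial losses, but the sketch captures why the pair yields controllable region-wise editing: the consistency term ties the output's style to the style reference, while the alignment term confines that change to the masked region.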

Published in

MM '21: Proceedings of the 29th ACM International Conference on Multimedia
October 2021, 5796 pages
ISBN: 9781450386517
DOI: 10.1145/3474085

Copyright © 2021 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall Acceptance Rate: 995 of 4,171 submissions, 24%
