ABSTRACT
Adaptive and flexible image editing is a desirable capability of modern generative models. In this work, we present a generative model with an auto-encoder architecture for per-region style manipulation. We apply a code consistency loss to enforce explicit disentanglement between the content and style latent representations, so that the content and style of generated samples are consistent with their respective content and style references. The model is also constrained by a content alignment loss to ensure that foreground editing does not interfere with background content. As a result, given user-provided masks for the regions of interest, our model supports region-wise style transfer on the foreground. Notably, our model is trained with self-supervision only and requires no extra annotations such as semantic labels. Extensive experiments show the effectiveness of the proposed method and demonstrate the flexibility of the model in various applications, including region-wise style editing, latent space interpolation, and cross-domain style transfer.
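The abstract names two constraints without giving their form. The sketch below is one plausible reading, not the paper's formulation. It assumes a code consistency term that is an L1 distance between a latent code and its re-encoding from the generated image, and a content alignment term that is an L1 distance restricted to background pixels. The function names, the flat-list image representation, and the binary mask convention (1 = foreground, 0 = background) are all illustrative assumptions.

```python
from typing import List, Sequence


def code_consistency_loss(code_in: Sequence[float],
                          code_reencoded: Sequence[float]) -> float:
    """Assumed form: mean absolute difference between the latent code fed
    to the generator and the code re-encoded from the generated image."""
    if len(code_in) != len(code_reencoded):
        raise ValueError("codes must have the same length")
    return sum(abs(a - b) for a, b in zip(code_in, code_reencoded)) / len(code_in)


def content_alignment_loss(source: Sequence[float],
                           edited: Sequence[float],
                           mask: Sequence[int]) -> float:
    """Assumed form: mean absolute difference over background pixels only
    (mask == 0), so edits inside the mask are not penalized."""
    background = [(s, e) for s, e, m in zip(source, edited, mask) if m == 0]
    if not background:
        return 0.0
    return sum(abs(s - e) for s, e in background) / len(background)


def composite(source: Sequence[float],
              stylized: Sequence[float],
              mask: Sequence[int]) -> List[float]:
    """Hypothetical region-wise edit: take stylized pixels inside the mask
    and keep the source pixels everywhere else."""
    return [m * t + (1 - m) * s for s, t, m in zip(source, stylized, mask)]


if __name__ == "__main__":
    source = [0.1, 0.2, 0.3, 0.4]      # toy 4-pixel image
    stylized = [0.9, 0.8, 0.7, 0.6]    # fully restyled version
    mask = [1, 1, 0, 0]                # first two pixels are foreground

    edited = composite(source, stylized, mask)
    print("edited:", edited)
    print("background drift:", content_alignment_loss(source, edited, mask))
```

Under these assumptions, an edit that leaves the background untouched incurs zero content alignment penalty, while restyling the whole image does not. That matches the behavior the abstract describes for foreground-only editing.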
Index Terms
- Towards Controllable and Photorealistic Region-wise Image Manipulation