Latent Style: multi-style image transfer via latent style coding and skip connection

  • Original Paper
  • Published in Signal, Image and Video Processing

Abstract

Unsupervised multi-style image translation is an important and challenging problem in image translation. The translation relations between interrelated images should be analyzed from multiple angles, as these relations are neither unidirectional nor governed by a single factor. Multi-style image translation algorithms have recently emerged to establish a multifaceted relationship between coupled images and to interpret their features, which can fully express the content and semantic information of these images. One key algorithm, multimodal unsupervised image-to-image translation (MUNIT), achieves reasonable unsupervised translation, but it represents image style as random noise, which leads to suboptimal multi-style representation. To achieve better multi-style image translation, we propose an improved MUNIT scheme equipped with style coding, skip connections, and a self-attention mechanism. The proposed scheme pays more attention to image style coding as well as to the global and detailed image information. Through extensive experimental comparisons with state-of-the-art methods on various image translation tasks, the advantages of this scheme are demonstrated qualitatively and quantitatively. The code and tutorials have been released at https://github.com/huawang123/LatentStyle.
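As a rough illustration of the three ingredients the abstract names (a learned style code instead of sampled noise, an encoder-decoder skip connection, and self-attention), the following PyTorch sketch wires them into a minimal translator. This is a toy under stated assumptions, not the authors' implementation: all module names, layer sizes, and the AdaIN-like style modulation are illustrative choices; the released code at https://github.com/huawang123/LatentStyle is authoritative.

```python
# A minimal sketch, assuming PyTorch: a learned style code (instead of
# sampled noise), an encoder-decoder skip connection, and self-attention.
# Module names and layer sizes are illustrative assumptions, not the
# authors' implementation.
import torch
import torch.nn as nn


class SelfAttention(nn.Module):
    """SAGAN-style self-attention over spatial positions [17]."""

    def __init__(self, ch):
        super().__init__()
        self.q = nn.Conv2d(ch, ch // 8, 1)
        self.k = nn.Conv2d(ch, ch // 8, 1)
        self.v = nn.Conv2d(ch, ch, 1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learned residual weight

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.q(x).flatten(2).transpose(1, 2)   # (b, hw, c//8)
        k = self.k(x).flatten(2)                   # (b, c//8, hw)
        attn = torch.softmax(q @ k, dim=-1)        # (b, hw, hw)
        v = self.v(x).flatten(2)                   # (b, c, hw)
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return self.gamma * out + x


class StyleEncoder(nn.Module):
    """Encodes a reference image into a compact style code,
    replacing MUNIT's randomly sampled style noise [16]."""

    def __init__(self, style_dim=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, 2, 1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 4, 2, 1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(128, style_dim)

    def forward(self, x):
        return self.fc(self.net(x).flatten(1))


class Translator(nn.Module):
    """Content encoder/decoder with a U-Net-like skip connection [33];
    the style code modulates decoding via an AdaIN-like affine [32]."""

    def __init__(self, style_dim=8):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(3, 64, 4, 2, 1), nn.ReLU(inplace=True))
        self.enc2 = nn.Sequential(nn.Conv2d(64, 128, 4, 2, 1), nn.ReLU(inplace=True))
        self.attn = SelfAttention(128)
        self.style_affine = nn.Linear(style_dim, 2 * 128)  # per-channel scale/shift
        self.dec2 = nn.Sequential(nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.ReLU(inplace=True))
        self.dec1 = nn.ConvTranspose2d(64 + 64, 3, 4, 2, 1)  # +64 from the skip

    def forward(self, x, style):
        e1 = self.enc1(x)                                   # kept for the skip
        e2 = self.attn(self.enc2(e1))                       # global context
        scale, shift = self.style_affine(style).chunk(2, dim=1)
        e2 = e2 * (1 + scale[..., None, None]) + shift[..., None, None]
        d2 = self.dec2(e2)
        return torch.tanh(self.dec1(torch.cat([d2, e1], 1)))  # skip connection


# Usage: take the style from a reference image, apply it to a content image.
content, reference = torch.randn(1, 3, 64, 64), torch.randn(1, 3, 64, 64)
out = Translator()(content, StyleEncoder()(reference))
print(out.shape)  # torch.Size([1, 3, 64, 64])
```

Encoding style from a real reference image rather than sampling noise is what allows a specific target style to be selected at test time, which is the gap in MUNIT that the abstract points to.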


References

  1. Pathak, D., Krahenbuhl, P., Donahue, J., Darrell, T., Efros, A.A.: Context encoders: feature learning by inpainting. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2536–2544 (2016)

  2. Thies, J., Zollhöfer, M., Theobalt, C., Stamminger, M., Nießner, M.: HeadOn: real-time reenactment of human portrait videos. ACM Trans. Graph. 37(4), 164 (2018)

  3. Ledig, C., Theis, L., Huszár, F., Caballero, J., Cunningham, A., Acosta, A., Aitken, A., Tejani, A., Totz, J., Wang, Z., et al.: Photo-realistic single image super-resolution using a generative adversarial network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4681–4690 (2017)

  4. Liu, M., Ding, Y., Xia, M., Liu, X., Ding, E., Zuo, W., Wen, S.: STGAN: a unified selective transfer network for arbitrary image attribute editing. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3673–3682 (2019)

  5. Bilinski, P., Prisacariu, V.: Dense decoder shortcut connections for single-pass semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6596–6605 (2018)

  6. Tang, H., Xu, D., Sebe, N., Wang, Y., Corso, J.J., Yan, Y.: Multi-channel attention selection GAN with cascaded semantic guidance for cross-view image translation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2417–2426 (2019)

  7. Johnson, J., Alahi, A., Fei-Fei, L.: Perceptual losses for real-time style transfer and super-resolution. In: European Conference on Computer Vision, pp. 694–711. Springer (2016)

  8. Zhu, J.-Y., Krähenbühl, P., Shechtman, E., Efros, A.A.: Generative visual manipulation on the natural image manifold. In: European Conference on Computer Vision, pp. 597–613. Springer (2016)

  9. Wang, T.-C., Liu, M.-Y., Zhu, J.-Y., Tao, A., Kautz, J., Catanzaro, B.: High-resolution image synthesis and semantic manipulation with conditional GANs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8798–8807 (2018)

  10. Chen, Q., Koltun, V.: Photographic image synthesis with cascaded refinement networks. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1511–1520 (2017)

  11. Ji, Y., Zhang, H., Wu, Q.J.: Saliency detection via conditional adversarial image-to-image network. Neurocomputing 316, 357–368 (2018)

  12. Choi, Y., Choi, M., Kim, M., Ha, J.-W., Kim, S., Choo, J.: STARGAN: unified generative adversarial networks for multi-domain image-to-image translation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8789–8797 (2018)

  13. Benaim, S., Wolf, L.: One-sided unsupervised domain mapping. In: Advances in Neural Information Processing Systems, pp. 752–762 (2017)

  14. Isola, P., Zhu, J.-Y., Zhou, T., Efros, A.A.: Image-to-image translation with conditional adversarial networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1125–1134 (2017)

  15. Lee, D., Kim, J., Moon, W.-J., Ye, J.C.: COLLAGAN: collaborative GAN for missing image data imputation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2487–2496 (2019)

  16. Huang, X., Liu, M.-Y., Belongie, S., Kautz, J.: Multimodal unsupervised image-to-image translation. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 172–189 (2018)

  17. Zhang, H., Goodfellow, I., Metaxas, D., Odena, A.: Self-attention generative adversarial networks. arXiv:1805.08318 (2018)

  18. Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial nets. In: Advances in Neural Information Processing Systems, pp. 2672–2680 (2014)

  19. Arjovsky, M., Chintala, S., Bottou, L.: Wasserstein generative adversarial networks. In: International Conference on Machine Learning, pp. 214–223 (2017)

  20. Gao, X., Deng, F., Yue, X.: Data augmentation in fault diagnosis based on the Wasserstein generative adversarial network with gradient penalty. Neurocomputing (2019)

  21. Mao, X., Li, Q., Xie, H., Lau, R.Y., Wang, Z., Paul Smolley, S.: Least squares generative adversarial networks. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2794–2802 (2017)

  22. Miyato, T., Kataoka, T., Koyama, M., Yoshida, Y.: Spectral normalization for generative adversarial networks. arXiv:1802.05957 (2018)

  23. Mirza, M., Osindero, S.: Conditional generative adversarial nets. arXiv:1411.1784 (2014)

  24. Zhu, J.-Y., Park, T., Isola, P., Efros, A.A.: Unpaired image-to-image translation using cycle-consistent adversarial networks. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2223–2232 (2017)

  25. Kim, T., Cha, M., Kim, H., Lee, J.K., Kim, J.: Learning to discover cross-domain relations with generative adversarial networks. In: Proceedings of the 34th International Conference on Machine Learning, vol. 70, pp. 1857–1865. JMLR.org (2017)

  26. Zhu, J.-Y., Zhang, R., Pathak, D., Darrell, T., Efros, A.A., Wang, O., Shechtman, E.: Toward multimodal image-to-image translation. In: Advances in Neural Information Processing Systems, pp. 465–476 (2017)

  27. Larsen, A.B.L., Sønderby, S.K., Larochelle, H., Winther, O.: Autoencoding beyond pixels using a learned similarity metric. In: International Conference on Machine Learning, pp. 1558–1566 (2016)

  28. Chen, X., Duan, Y., Houthooft, R., Schulman, J., Sutskever, I., Abbeel, P.: INFOGAN: interpretable representation learning by information maximizing generative adversarial nets. In: Advances in Neural Information Processing Systems, pp. 2172–2180 (2016)

  29. Liu, M.-Y., Breuel, T., Kautz, J.: Unsupervised image-to-image translation networks. In: Advances in Neural Information Processing Systems, pp. 700–708 (2017)

  30. Luan, F., Paris, S., Shechtman, E., Bala, K.: Deep photo style transfer. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4990–4998 (2017)

  31. Zhang, L., Ji, Y., Lin, X., Liu, C.: Style transfer for anime sketches with enhanced residual U-NET and auxiliary classifier GAN. In: 2017 4th IAPR Asian Conference on Pattern Recognition (ACPR). IEEE, pp. 506–511 (2017)

  32. Huang, X., Belongie, S.: Arbitrary style transfer in real-time with adaptive instance normalization. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1501–1510 (2017)

  33. Ronneberger, O., Fischer, P., Brox, T.: U-net: convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241. Springer (2015)

  34. Kingma, D.P., Ba, J.: ADAM: a method for stochastic optimization. arXiv:1412.6980 (2014)

  35. Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V., Radford, A., Chen, X.: Improved techniques for training GANs. In: Advances in Neural Information Processing Systems, pp. 2234–2242 (2016)

  36. Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process. 13(4), 600–612 (2004)

  37. Zhang, R., Isola, P., Efros, A.A., Shechtman, E., Wang, O.: The unreasonable effectiveness of deep features as a perceptual metric. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 586–595 (2018)

  38. Bińkowski, M., Sutherland, D.J., Arbel, M., Gretton, A.: Demystifying MMD GANs. arXiv:1801.01401 (2018)

  39. Kim, J., Kim, M., Kang, H., Lee, K.H.: U-GAT-IT: unsupervised generative attentional networks with adaptive layer-instance normalization for image-to-image translation. In: International Conference on Learning Representations (2020). https://openreview.net/forum?id=BJlZ5ySKPH

  40. Lee, H.-Y., Tseng, H.-Y., Huang, J.-B., Singh, M., Yang, M.-H.: Diverse image-to-image translation via disentangled representations. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 35–51 (2018)

Author information

Correspondence to Jicong Zhang.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Hu, J., Wu, G., Wang, H. et al. Latent Style: multi-style image transfer via latent style coding and skip connection. SIViP 16, 359–368 (2022). https://doi.org/10.1007/s11760-021-01940-3
