Abstract
We present the Mask-guided Generative Adversarial Network (MagGAN) for high-resolution face attribute editing, in which semantic facial masks from a pre-trained face parser guide the fine-grained image editing process. With the introduction of a mask-guided reconstruction loss, MagGAN learns to edit only the facial parts that are relevant to the desired attribute changes, while preserving the attribute-irrelevant regions (e.g., a hat or scarf for the modification ‘To Bald’). Further, a novel mask-guided conditioning strategy is introduced to incorporate the influence region of each attribute change into the generator. In addition, a multi-level patch-wise discriminator structure is proposed to scale our model to high-resolution (\(1024 \times 1024\)) face editing. Experiments on the CelebA benchmark show that the proposed method significantly outperforms prior state-of-the-art approaches in terms of both image quality and editing performance.
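The idea behind a mask-guided reconstruction loss can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so the code below is an assumption rather than the authors' loss: it takes a hypothetical per-attribute influence-region mask (`region_masks`), forms the union of the regions for the attributes being changed, and penalizes the L1 difference between input and output only outside that union. The function name, argument names, and the normalization by the number of preserved pixels are all illustrative choices.

```python
import numpy as np


def mask_guided_recon_loss(x, y, region_masks, changed):
    """Illustrative masked L1 loss (an assumed form, not the paper's exact loss).

    x, y         : float arrays of shape (H, W, 3), input and edited image
    region_masks : float array of shape (C, H, W) with values in {0, 1};
                   hypothetical influence region of each of the C attributes
    changed      : bool array of shape (C,), True for attributes being edited
    """
    if changed.any():
        # Union of the influence regions of all attributes being changed.
        edit_region = region_masks[changed].max(axis=0)
    else:
        # No attribute changes: the whole image should be reconstructed.
        edit_region = np.zeros(region_masks.shape[1:])

    preserve = 1.0 - edit_region            # 1 where pixels must stay intact
    diff = np.abs(x - y).sum(axis=-1)       # per-pixel L1 over color channels
    # Average over preserved pixels; guard against an empty preserve region.
    return float((preserve * diff).sum() / max(preserve.sum(), 1.0))


# Toy example: the edit touches only the top half of a 4x4 image.
x = np.zeros((4, 4, 3))
y = x.copy()
y[:2] = 1.0
masks = np.zeros((1, 4, 4))
masks[0, :2] = 1.0                          # attribute 0 influences the top half

inside = mask_guided_recon_loss(x, y, masks, np.array([True]))
outside = mask_guided_recon_loss(x, y, masks, np.array([False]))
```

In this toy case the loss is zero when the change falls inside the influence region of the edited attribute, and positive when the same change is made while no attribute is edited, which is the behavior the abstract describes: edits are tolerated in attribute-relevant regions and penalized elsewhere.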
Notes
- 1.
- 2. We use \(\mathbf {att} \in \mathbb {R}^{C}\) to denote attributes without spatial dimensions and \(\mathbf {Att} \in \mathbb {R}^{C\times H\times W}\) for attributes with spatial dimensions.
- 3. STGAN: https://github.com/csmliu/STGAN.
- 4. We pretrained an Inception-V3 model that achieves 92.69% average attribute classification accuracy on all 40 attributes of the CelebA dataset.
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Wei, Y. et al. (2021). MagGAN: High-Resolution Face Attribute Editing with Mask-Guided Generative Adversarial Network. In: Ishikawa, H., Liu, CL., Pajdla, T., Shi, J. (eds) Computer Vision – ACCV 2020. ACCV 2020. Lecture Notes in Computer Science, vol 12625. Springer, Cham. https://doi.org/10.1007/978-3-030-69538-5_40
DOI: https://doi.org/10.1007/978-3-030-69538-5_40
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-69537-8
Online ISBN: 978-3-030-69538-5
eBook Packages: Computer Science (R0)