Unsupervised Domain Attention Adaptation Network for Caricature Attribute Recognition

Ji, Wen; He, Kelei; Huo, Jing; Gu, Zheng; Gao, Yang

doi:10.1007/978-3-030-58598-3_2


Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12353)

Included in the following conference series:

European Conference on Computer Vision


Abstract

Caricature attributes provide distinctive facial features that can support research in psychology and neuroscience. However, unlike facial photo attribute datasets, which contain large numbers of annotated images, annotations of caricature attributes are rare. To facilitate research on attribute learning for caricatures, we propose a caricature attribute dataset, namely WebCariA. Moreover, to make use of models trained on face attributes, we propose a novel unsupervised domain adaptation framework for cross-modality (i.e., photos to caricatures) attribute recognition, with an integrated inter- and intra-domain consistency learning scheme. Specifically, the inter-domain consistency learning scheme consists of an image-to-image translator, which first bridges the domain gap between photos and caricatures by generating intermediate image samples, and a label consistency learning module that aligns their semantic information. The intra-domain consistency learning scheme integrates a common feature consistency learning module with a novel attribute-aware attention-consistency learning module for more efficient alignment. We conduct an extensive ablation study to show the effectiveness of the proposed method, and the proposed method also outperforms state-of-the-art methods. The implementation of the proposed method is available at https://github.com/KeleiHe/DAAN.
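The abstract names a label consistency term and an attention consistency term without giving their formulas. The sketch below is one hypothetical instantiation in plain Python, not the authors' method: label consistency is assumed to be the mean squared difference between per-attribute sigmoid probabilities predicted for an image and its translated counterpart, and attention for one attribute is assumed to be a class activation map (CAM), i.e. a weighted sum of feature maps, with attention consistency measured as mean squared error between two such maps. All function names and the MSE form are illustrative assumptions; the actual losses are defined in the paper and the DAAN repository.

```python
import math
from typing import List

Map2D = List[List[float]]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function for multi-label attribute logits."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def label_consistency_loss(logits_a: List[float], logits_b: List[float]) -> float:
    """Assumed form: mean squared difference between per-attribute probabilities
    for an image and its translated counterpart (hypothetical helper)."""
    if len(logits_a) != len(logits_b):
        raise ValueError("attribute vectors must have equal length")
    diffs = [(sigmoid(a) - sigmoid(b)) ** 2 for a, b in zip(logits_a, logits_b)]
    return sum(diffs) / len(diffs)


def class_activation_map(features: List[Map2D], weights: List[float]) -> Map2D:
    """CAM for one attribute: weighted sum of C feature maps (C x H x W)
    using that attribute's classifier weights (length C)."""
    c, h, w = len(features), len(features[0]), len(features[0][0])
    return [
        [sum(weights[k] * features[k][i][j] for k in range(c)) for j in range(w)]
        for i in range(h)
    ]


def attention_consistency_loss(cam_a: Map2D, cam_b: Map2D) -> float:
    """Assumed form: mean squared error between two equally sized attention maps
    for the same attribute (e.g. from two views of one image)."""
    h, w = len(cam_a), len(cam_a[0])
    total = sum((cam_a[i][j] - cam_b[i][j]) ** 2 for i in range(h) for j in range(w))
    return total / (h * w)


if __name__ == "__main__":
    feats = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 2.0], [2.0, 0.0]]]
    cam = class_activation_map(feats, [0.5, 1.0])
    print(cam)  # [[0.5, 2.0], [2.0, 0.5]]
    print(attention_consistency_loss(cam, [[2.0, 0.5], [0.5, 2.0]]))  # 2.25
    print(label_consistency_loss([0.0], [100.0]))
```

In a training loop, terms like these would typically be added, with weighting coefficients, to the supervised attribute loss on labelled photos; the weights and the choice of paired views are further assumptions not stated in this abstract.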

W. Ji and K. He contributed equally as co-first authors.


Notes

1. https://cs.nju.edu.cn/huojing/WebCariA.htm


Acknowledgement

This work is supported in part by National Science Foundation of China under Grant No. 61806092, and in part by Jiangsu Natural Science Foundation under Grant No. BK20180326.

Author information

Authors and Affiliations

State Key Laboratory for Novel Software Technology, Nanjing, China
Wen Ji, Jing Huo, Zheng Gu & Yang Gao
Medical School of Nanjing University, Nanjing, China
Kelei He
National Institute of Healthcare Data Science at Nanjing University, Nanjing, China
Kelei He & Yang Gao


Corresponding authors

Correspondence to Kelei He or Jing Huo.

Editor information

Editors and Affiliations

University of Oxford, Oxford, UK
Andrea Vedaldi
Graz University of Technology, Graz, Austria
Horst Bischof
University of Freiburg, Freiburg im Breisgau, Germany
Thomas Brox
University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
Jan-Michael Frahm

Electronic supplementary material

Supplementary material 1 (PDF, 862 KB)


About this paper

Cite this paper

Ji, W., He, K., Huo, J., Gu, Z., Gao, Y. (2020). Unsupervised Domain Attention Adaptation Network for Caricature Attribute Recognition. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12353. Springer, Cham. https://doi.org/10.1007/978-3-030-58598-3_2


DOI: https://doi.org/10.1007/978-3-030-58598-3_2
Published: 07 November 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58597-6
Online ISBN: 978-3-030-58598-3
eBook Packages: Computer Science, Computer Science (R0)
