CoGS: Controllable Generation and Search from Sketch and Style

Ham, Cusuh; Tarrés, Gemma Canet; Bui, Tu; Hays, James; Lin, Zhe; Collomosse, John

doi:10.1007/978-3-031-19787-1_36

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13676)

Included in the following conference series:

European Conference on Computer Vision

Abstract

We present CoGS, a novel method for the style-conditioned, sketch-driven synthesis of images. CoGS enables exploration of diverse appearance possibilities for a given sketched object, with decoupled control over the structure and the appearance of the output. Coarse-grained control over object structure and appearance is provided by an input sketch and an exemplar “style” conditioning image, which a transformer-based sketch and style encoder maps to a discrete codebook representation. We map the codebook representation into a metric space, enabling fine-grained control over selection and interpolation between multiple synthesis options before the image is generated by a vector quantized GAN (VQGAN) decoder. Our framework thereby unifies search and synthesis: a sketch and style pair can drive an initial synthesis, which can then be refined by combining it with similar results from a search corpus to produce an image that more closely matches the user’s intent. We show that our model, trained on the 125 object classes of our newly created Pseudosketches dataset, can produce a diverse gamut of semantic content and appearance styles.
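The abstract combines three generic operations: quantizing continuous features against a discrete codebook, interpolating between synthesis options, and retrieving similar items from a corpus in a metric space. The sketch below illustrates those ideas in plain Python under stated assumptions; it is not the authors' implementation. The toy codebook, the mean-pooled `embed` function and every function name are hypothetical, interpolation is done by linear blending followed by re-quantization (one common choice, not necessarily the paper's), and the transformer encoder and VQGAN decoder are omitted entirely.

```python
import math
from typing import List, Sequence

Vector = List[float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(latents: Sequence[Sequence[float]],
             codebook: Sequence[Sequence[float]]) -> List[int]:
    """Vector quantization: map each latent to the index of its nearest codebook entry."""
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(z, codebook[k]))
        for z in latents
    ]


def dequantize(indices: Sequence[int], codebook: Sequence[Sequence[float]]) -> List[Vector]:
    """Look up the codebook vector for each discrete index."""
    return [list(codebook[k]) for k in indices]


def interpolate_codes(idx_a: Sequence[int], idx_b: Sequence[int],
                      codebook: Sequence[Sequence[float]], t: float) -> List[int]:
    """Blend two code sequences with weight t in [0, 1], then snap back to the codebook.

    Hypothetical interpolation scheme: linear blend in code-vector space,
    followed by re-quantization so the result is again a valid index sequence.
    """
    za = dequantize(idx_a, codebook)
    zb = dequantize(idx_b, codebook)
    blended = [[(1 - t) * x + t * y for x, y in zip(va, vb)] for va, vb in zip(za, zb)]
    return quantize(blended, codebook)


def embed(indices: Sequence[int], codebook: Sequence[Sequence[float]]) -> Vector:
    """Hypothetical global embedding: mean of the code vectors, L2-normalised."""
    vecs = dequantize(indices, codebook)
    dim = len(vecs[0])
    mean = [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in mean)) or 1.0  # avoid division by zero
    return [x / norm for x in mean]


def nearest_in_corpus(query: Sequence[int], corpus: Sequence[Sequence[int]],
                      codebook: Sequence[Sequence[float]]) -> int:
    """Return the index of the corpus item with the highest cosine similarity to the query."""
    q = embed(query, codebook)
    scores = [sum(a * b for a, b in zip(q, embed(item, codebook))) for item in corpus]
    return max(range(len(corpus)), key=scores.__getitem__)


if __name__ == "__main__":
    # Toy 2-D codebook with four entries (purely illustrative).
    cb = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    codes = quantize([[0.1, 0.2], [0.9, 0.8]], cb)
    print(codes)
    print(interpolate_codes(codes, [3, 0], cb, 0.25))
    print(nearest_in_corpus(codes, [[1, 1], [3, 3], [2, 2]], cb))
```

Snapping the blended vectors back to the codebook keeps every intermediate option a valid discrete token sequence, which is the kind of input a VQGAN-style decoder consumes; sweeping `t` from 0 to 1 walks from one synthesis option to the other.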

C. Ham and G. C. Tarrés contributed equally.

Author information

Authors and Affiliations

Georgia Institute of Technology, Atlanta, USA
Cusuh Ham & James Hays
University of Surrey, Guildford, UK
Gemma Canet Tarrés, Tu Bui & John Collomosse
Adobe Inc., San Jose, USA
Zhe Lin & John Collomosse

Corresponding author

Correspondence to Cusuh Ham.

Editor information

Editors and Affiliations

Tel Aviv University, Tel Aviv, Israel
Shai Avidan
University College London, London, UK
Gabriel Brostow
Google AI, Accra, Ghana
Moustapha Cissé
University of Catania, Catania, Italy
Giovanni Maria Farinella
Facebook (United States), Menlo Park, CA, USA
Tal Hassner

Electronic supplementary material

Supplementary material 1 (PDF, 19,572 KB)

About this paper

Cite this paper

Ham, C., Tarrés, G.C., Bui, T., Hays, J., Lin, Z., Collomosse, J. (2022). CoGS: Controllable Generation and Search from Sketch and Style. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13676. Springer, Cham. https://doi.org/10.1007/978-3-031-19787-1_36

DOI: https://doi.org/10.1007/978-3-031-19787-1_36
Published: 21 October 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19786-4
Online ISBN: 978-3-031-19787-1
eBook Packages: Computer Science, Computer Science (R0)