A hierarchical residual network with compact triplet-center loss for sketch recognition

Multimedia Tools and Applications

Abstract

With the widespread use of touch-screen devices, drawing sketches on screen has become increasingly convenient. This creates demand for automatic sketch understanding and makes the sketch recognition task more significant than before. The critical issue in this task is improving the discriminability of sketch features. To this end, we make three contributions. First, we design a novel multi-scale residual block. Compared with the conventional basic residual block, it better perceives multi-scale information and reduces the number of parameters to train. Second, we build a hierarchical residual structure by stacking multi-scale residual blocks in a specific way. Compared with a single-level residual structure, it learns richer features. Third, we propose the compact triplet-center loss specifically for the sketch recognition task. It addresses a shortcoming of the triplet-center loss, which does not fully account for the overly large intra-class space and overly small inter-class space found in the sketch domain. Combining these modules, we propose a hierarchical residual network for sketch recognition and evaluate it thoroughly on the TU-Berlin benchmark. The experimental results show that the proposed network outperforms most baseline methods and is highly competitive among current non-sequential models.
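For orientation, the baseline triplet-center loss that the abstract refers to can be sketched as below. This is a minimal illustration of the standard formulation only, not the compact variant proposed in this article, whose definition is not given in the abstract. The function names, the use of half the squared Euclidean distance, the default margin, and averaging over the batch are all assumptions made for this sketch.

```python
from typing import Sequence

Vector = Sequence[float]


def half_sq_dist(a: Vector, b: Vector) -> float:
    """Half the squared Euclidean distance (an assumed choice of distance)."""
    return 0.5 * sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_center_loss(
    features: Sequence[Vector],
    labels: Sequence[int],
    centers: Sequence[Vector],
    margin: float = 1.0,  # hypothetical default, not taken from the article
) -> float:
    """Baseline triplet-center loss, averaged over the batch.

    Each feature is pulled toward the center of its own class and pushed
    away from the nearest center of any other class by at least `margin`.
    """
    total = 0.0
    for feature, label in zip(features, labels):
        # Distance to the center of the sample's own class.
        d_own = half_sq_dist(feature, centers[label])
        # Distance to the closest center belonging to a different class.
        d_other = min(
            half_sq_dist(feature, center)
            for idx, center in enumerate(centers)
            if idx != label
        )
        # Hinge: zero once the other center is farther by the margin.
        total += max(d_own + margin - d_other, 0.0)
    return total / len(features)


if __name__ == "__main__":
    centers = [[0.0, 0.0], [4.0, 0.0]]
    # One sample sits on its own center, one has drifted toward the other class.
    print(triplet_center_loss([[0.0, 0.0], [3.0, 0.0]], [0, 0], centers))
```

A sample sitting on its own class center contributes nothing, while a sample that has drifted toward another center is penalized. In practice the centers would be learned jointly with the network, which this sketch leaves out.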



Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (No. 61379065) and the Natural Science Foundation of Hebei Province, China (No. F2019203285).

Author information

Corresponding author

Correspondence to Shihui Zhang.


Cite this article

Wang, L., Zhang, S., He, H. et al. A hierarchical residual network with compact triplet-center loss for sketch recognition. Multimed Tools Appl 81, 15879–15899 (2022). https://doi.org/10.1007/s11042-022-12431-z


  • DOI: https://doi.org/10.1007/s11042-022-12431-z
