Abstract
With the widespread use of touch-screen devices, drawing sketches on a screen has become increasingly convenient. This creates a demand for the automatic understanding of sketches, which makes the sketch recognition task more important than before. A critical issue in this task is making sketch features more discriminative. To this end, we make three contributions. First, we design a novel multi-scale residual block. Compared with the conventional basic residual block, it better perceives multi-scale information and has fewer parameters to train. Second, we build a hierarchical residual structure by stacking multi-scale residual blocks in a specific way. Compared with a single-level residual structure, this structure learns richer features. Last but not least, we propose a compact triplet-center loss designed specifically for sketch recognition. It addresses a shortcoming of the triplet-center loss, which does not fully account for the overly large intra-class spread and overly small inter-class separation found in sketch data. Combining these modules, we propose a hierarchical residual network for sketch recognition and evaluate it thoroughly on the TU-Berlin benchmark. The experimental results show that the proposed network outperforms most baseline methods and performs strongly among current non-sequential models.
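The abstract does not give the formula for the compact triplet-center loss, so it is not reproduced here. For orientation only, the original triplet-center loss it builds on (He et al., 2018, "Triplet-center loss for multi-view 3D object retrieval") can be written as L_tc = sum_i max(D(f_i, c_{y_i}) + m - min_{j != y_i} D(f_i, c_j), 0): each sample is penalized unless its own class center c_{y_i} is closer than the nearest other-class center by at least the margin m. The NumPy sketch below is a minimal illustration under stated assumptions: D is taken as the squared Euclidean distance, the centers are passed in as a fixed array (in training they are learned), and the loss is summed over the batch. The function name, argument names and default margin are hypothetical choices, not the authors' code.

```python
import numpy as np


def triplet_center_loss(features, labels, centers, margin=1.0):
    """Baseline triplet-center loss (He et al., 2018), summed over the batch.

    Illustrative sketch only; this is NOT the compact variant from the article.
    features: (M, d) embeddings f_i
    labels:   (M,) integer class indices y_i
    centers:  (K, d) class centers c_k (assumed given, not learned here)
    margin:   hypothetical default for the margin m
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels, dtype=int)
    centers = np.asarray(centers, dtype=float)

    # Squared Euclidean distance from every feature to every center: shape (M, K).
    diff = features[:, None, :] - centers[None, :, :]
    dist = np.sum(diff ** 2, axis=-1)

    rows = np.arange(len(labels))
    # Distance from each sample to the center of its own class.
    pos = dist[rows, labels]

    # Hide the own-class column, then take the nearest other-class center.
    masked = dist.copy()
    masked[rows, labels] = np.inf
    neg = masked.min(axis=1)

    # Hinge term: zero once the own center is closer by at least the margin.
    return float(np.maximum(pos + margin - neg, 0.0).sum())


# Usage: the second sample (label 1) lies closer to center 0 than to its own center.
loss = triplet_center_loss([[0.0, 0.0], [1.0, 1.0]], [0, 1],
                           [[0.0, 0.0], [3.0, 3.0]], margin=1.0)
print(loss)  # 7.0
```

In the usage example, the first sample sits on its own center and is far from the other one, so it contributes nothing; the second sample has squared distance 8 to its own center and 2 to the other center, so it contributes 8 + 1 - 2 = 7. This hinge on "own center versus nearest other center" is the inter-/intra-class trade-off that the abstract says the compact variant is designed to tighten for sketches.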
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China (No. 61379065) and the Natural Science Foundation of Hebei Province, China (No. F2019203285).
About this article
Cite this article
Wang, L., Zhang, S., He, H. et al. A hierarchical residual network with compact triplet-center loss for sketch recognition. Multimed Tools Appl 81, 15879–15899 (2022). https://doi.org/10.1007/s11042-022-12431-z
DOI: https://doi.org/10.1007/s11042-022-12431-z