Abstract
Adversarial attacks on deep neural networks have been intensively studied on image, audio, and natural language classification tasks. Nevertheless, as a typical yet important real-world application, adversarial attacks on online video tracking, which traces an object’s moving trajectory rather than its category, are rarely explored. In this paper, we identify a new task of adversarial attack on visual tracking: generating imperceptible perturbations online that mislead trackers along an incorrect (Untargeted Attack, UA) or a specified trajectory (Targeted Attack, TA). To this end, we first propose a spatial-aware basic attack by adapting existing attack methods, i.e., FGSM, BIM, and C&W, and comprehensively analyze the attacking performance. We identify that online object tracking poses two new challenges: 1) it is difficult to generate imperceptible perturbations that can transfer across frames, and 2) real-time trackers require the attack to satisfy a certain level of efficiency. To address these challenges, we further propose the spatial-aware online incremental attack (a.k.a. SPARK) that performs spatial-temporal sparse incremental perturbations online and makes the adversarial attack less perceptible. In addition, as an optimization-based method, SPARK quickly converges to very small losses within several iterations by considering historical incremental perturbations, making it much more efficient than basic attacks. An in-depth evaluation of state-of-the-art trackers (i.e., SiamRPN++ with AlexNet, MobileNetv2, and ResNet-50, and SiamDW) on OTB100, VOT2018, UAV123, and LaSOT demonstrates the effectiveness and transferability of SPARK in misleading trackers under both UA and TA with minor perturbations.
Q. Guo and X. Xie contributed equally to this work.
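For intuition, the following is a minimal PyTorch-style sketch of the incremental scheme described in the abstract. The adversarial objective `track_loss` (e.g., the tracker's confidence at the true location for UA, or a distance to the desired trajectory for TA), the step size, the sparsity weight `lam`, and the window length `K` are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def spark_attack_step(frame, track_loss, past_eps, lr=0.01, lam=1e-4, iters=5, K=30):
    """One online attack step (illustrative sketch, not the authors' exact
    algorithm): optimize only the new increment `eps` on top of the
    accumulated historical increments, so the optimization is warm-started."""
    base = torch.stack(past_eps).sum(0) if past_eps else torch.zeros_like(frame)
    eps = torch.zeros_like(frame, requires_grad=True)
    opt = torch.optim.Adam([eps], lr=lr)
    for _ in range(iters):  # warm-started by `base`, so a few iterations suffice
        adv = (frame + base + eps).clamp(0, 1)
        window = torch.stack(past_eps + [eps]) if past_eps else eps.unsqueeze(0)
        # L2,1 penalty: L2 over each increment's pixels, L1 across the temporal
        # window, encouraging spatially-temporally sparse increments
        l21 = window.flatten(start_dim=1).norm(dim=1).sum()
        loss = track_loss(adv) + lam * l21
        opt.zero_grad()
        loss.backward()
        opt.step()
    past_eps = (past_eps + [eps.detach()])[-K:]  # sliding window of increments
    return (frame + base + eps.detach()).clamp(0, 1), past_eps
```

Because each step only refines a small increment over the accumulated perturbation, the per-frame cost stays far below re-running a full BIM- or C&W-style optimization from scratch.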
Notes
1. We select SiamRPN-AlexNet since it is a representative Siamese network tracker and achieves high accuracy on modern benchmarks at beyond real-time speed.
2. We use 30 as the attack interval since videos usually run at 30 fps, and such a setup naturally exploits the potential delay between the 29th and 30th frames (a driver-loop sketch follows these notes).
3. The 11 attributes are illumination variation (IV), scale variation (SV), in-plane rotation (IPR), out-of-plane rotation (OPR), deformation (DEF), occlusion (OCC), motion blur (MB), fast motion (FM), background clutter (BC), out-of-view (OV), and low resolution (LR).
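As a rough illustration of the attack interval in note 2, a hypothetical driver loop can reset the incremental window every 30 frames; `spark_attack_step` refers to the sketch after the abstract, and `video_frames` and `track_loss` are assumed inputs.

```python
def attack_video(video_frames, track_loss, interval=30):
    """Attack every frame online, restarting the incremental window every
    `interval` frames (30 here, matching the typical 30 fps; see note 2)."""
    past_eps, adv_frames = [], []
    for t, frame in enumerate(video_frames):
        if t % interval == 0:
            past_eps = []  # start a fresh perturbation at each attack interval
        adv, past_eps = spark_attack_step(frame, track_loss, past_eps)
        adv_frames.append(adv)
    return adv_frames
```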
References
Bertinetto, L., Valmadre, J., Henriques, J.F., Vedaldi, A., Torr, P.H.S.: Fully-convolutional Siamese networks for object tracking. arXiv:1606.09549 (2016)
Carlini, N., Wagner, D.: Towards evaluating the robustness of neural networks. In: 2017 IEEE Symposium on Security and Privacy (SP), pp. 39–57 (2017)
Carlini, N., Wagner, D.: Audio adversarial examples: targeted attacks on speech-to-text. arXiv:1801.01944 (2018)
Chen, Z., Guo, Q., Wan, L., Feng, W.: Background-suppressed correlation filters for visual tracking. In: ICME, pp. 1–6 (2018)
Cisse, M., Adi, Y., Neverova, N., Keshet, J.: Houdini: fooling deep structured prediction models. arXiv:1707.05373 (2017)
Dai, K., Wang, D., Lu, H., Sun, C., Li, J.: Visual tracking via adaptive spatially-regularized correlation filters. In: CVPR, pp. 4665–4674 (2019)
Danelljan, M., Bhat, G., Khan, F.S., Felsberg, M.: ECO: efficient convolution operators for tracking. In: CVPR, pp. 6931–6939 (2017)
Dong, X., Shen, J.: Triplet loss in Siamese network for object tracking. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11217, pp. 472–488. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01261-8_28
Dong, Y., et al.: Boosting adversarial attacks with momentum. In: CVPR, pp. 9185–9193 (2018)
Du, X., Xie, X., Li, Y., Ma, L., Liu, Y., Zhao, J.: DeepStellar: model-based quantitative analysis of stateful deep learning systems. In: ESEC/FSE, pp. 477–487 (2019)
Fan, H., et al.: LaSOT: a high-quality benchmark for large-scale single object tracking. In: CVPR, pp. 5369–5378 (2019)
Fan, H., Ling, H.: Siamese cascaded region proposal networks for real-time visual tracking. In: CVPR, pp. 7944–7953 (2019)
Feng, W., Han, R., Guo, Q., Zhu, J., Wang, S.: Dynamic saliency-aware regularization for correlation filter-based object tracking. IEEE TIP 28(7), 3232–3245 (2019)
Gao, J., Lanchantin, J., Soffa, M.L., Qi, Y.: Black-box generation of adversarial text sequences to evade deep learning classifiers. In: SPW, pp. 50–56 (2018)
Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. arXiv:1412.6572 (2014)
Guo, Q., Feng, W., Zhou, C., Huang, R., Wan, L., Wang, S.: Learning dynamic Siamese network for visual object tracking. In: ICCV, pp. 1781–1789 (2017)
Guo, Q., Feng, W., Zhou, C., Pun, C., Wu, B.: Structure-regularized compressive tracking with online data-driven sampling. IEEE TIP 26(12), 5692–5705 (2017)
Guo, Q., Han, R., Feng, W., Chen, Z., Wan, L.: Selective spatial regularization by reinforcement learned decision making for object tracking. IEEE TIP 29, 2999–3013 (2020)
He, A., Luo, C., Tian, X., Zeng, W.: A twofold Siamese network for real-time object tracking. In: CVPR, pp. 4834–4843 (2018)
He, K., Zhang, X., Ren, S., Sun., J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Held, D., Thrun, S., Savarese, S.: Learning to track at 100 FPS with deep regression networks. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 749–765. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_45
Howard, A.G., et al.: MobileNets: efficient convolutional neural networks for mobile vision applications. arXiv:1704.04861 (2017)
Jin, D., Jin, Z., Zhou, J.T., Szolovits, P.: Is BERT really robust? Natural language attack on text classification and entailment. arXiv:1907.11932 (2019)
Kristan, M., et al.: The sixth visual object tracking VOT2018 challenge results. In: Leal-Taixé, L., Roth, S. (eds.) ECCV 2018. LNCS, vol. 11129, pp. 3–53. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-11009-3_1
Kristan, M., et al.: The seventh visual object tracking VOT2019 challenge results. In: ICCVW, pp. 2206–2241 (2019)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: NIPS, pp. 1097–1105 (2012)
Kurakin, A., Goodfellow, I., Bengio, S.: Adversarial examples in the physical world. ICLR (Workshop) (2017)
Li, B., Wu, W., Wang, Q., Zhang, F., Xing, J., Yan, J.: SiamRPN++: evolution of Siamese visual tracking with very deep networks. In: CVPR, pp. 4282–4291 (2019)
Li, B., Wu, W., Zhu, Z., Yan, J., Hu, X.: High performance visual tracking with Siamese region proposal network. In: CVPR, pp. 8971–8980 (2018)
Li, Y., Tian, D., Chang, M.C., Bian, X., Lyu, S.: Robust adversarial perturbation on deep proposal-based models. In: BMVC, pp. 1–11 (2018)
Lin, Y.C., Hong, Z.W., Liao, Y.H., Shi, M.L., Liu, M.Y., Sun, M.: Tactics of adversarial attack on deep reinforcement learning agents. In: IJCAI, pp. 3756–3762 (2017)
Ling, X., et al.: DEEPSEC: a uniform platform for security analysis of deep learning model. In: IEEE Symposium on Security and Privacy (SP), pp. 673–690 (2019)
Liu, W., et al.: SSD: single shot multibox detector. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 21–37. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_2
Lukežič, A., Vojíř, T., Čehovin, L., Matas, J., Kristan, M.: Discriminative correlation filter with channel and spatial reliability. In: CVPR, pp. 4847–4856 (2017)
Ma, L., et al.: DeepGauge: multi-granularity testing criteria for deep learning systems. In: ASE, pp. 120–131 (2018)
Metzen, J.H., Kumar, M.C., Brox, T., Fischer, V.: Universal adversarial perturbations against semantic image segmentation. In: ICCV, pp. 2774–2783 (2017)
Moosavi-Dezfooli, S.M., Fawzi, A., Fawzi, O., Frossard, P.: Universal adversarial perturbations. In: CVPR, pp. 86–94 (2017)
Moosavi-Dezfooli, S.M., Fawzi, A., Frossard, P.: DeepFool: a simple and accurate method to fool deep neural networks. In: CVPR, pp. 2574–2582 (2016)
Mueller, M., Smith, N., Ghanem, B.: A benchmark and simulator for UAV tracking. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 445–461. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_27
Müller, M., Bibi, A., Giancola, S., Alsubaihi, S., Ghanem, B.: TrackingNet: a large-scale dataset and benchmark for object tracking in the wild. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11205, pp. 310–327. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01246-5_19
Nam, H., Han, B.: Learning multi-domain convolutional neural networks for visual tracking. In: CVPR, pp. 4293–4302 (2016)
Papernot, N., McDaniel, P.D., Jha, S., Fredrikson, M., Celik, Z.B., Swami, A.: The limitations of deep learning in adversarial settings. In: IEEE European Symposium on Security and Privacy (EuroS&P), pp. 372–387 (2016)
Qin, Y., Carlini, N., Goodfellow, I., Cottrell, G., Raffel, C.: Imperceptible, robust, and targeted adversarial examples for automatic speech recognition. arXiv:1903.10346 (2019)
Ren, S., Deng, Y., He, K., Che, W.: Generating natural language adversarial examples through probability weighted word saliency. In: ACL, pp. 1085–1097 (2019)
Song, Y., et al.: VITAL: visual tracking via adversarial learning. In: CVPR, pp. 8990–8999 (2018)
Sun, J., et al.: Stealthy and efficient adversarial attacks against deep reinforcement learning. In: AAAI, pp. 5883–5891 (2020)
Sun, Y., Sun, C., Wang, D., Lu, H., He, Y.: ROI pooled correlation filters for visual tracking. In: CVPR, pp. 5776–5784 (2019)
Szegedy, C., et al.: Intriguing properties of neural networks. arXiv:1312.6199 (2013)
Wang, Q., Zhang, L., Bertinetto, L., Hu, W., Torr, P.H.: Fast online object tracking and segmentation: a unifying approach. In: CVPR, pp. 1328–1338 (2019)
Wang, X., Li, C., Luo, B., Tang, J.: SINT++: robust visual tracking via adversarial positive instance generation. In: CVPR, pp. 4864–4873 (2018)
Wei, X., Liang, S., Chen, N., Cao, X.: Transferable adversarial attacks for image and video object detection. In: IJCAI, pp. 954–960 (2019)
Wei, X., Zhu, J., Yuan, S., Su, H.: Sparse adversarial perturbations for videos. In: AAAI, pp. 8973–8980 (2019)
Wiyatno, R.R., Xu, A.: Physical adversarial textures that fool visual object tracking. arXiv:1904.11042 (2019)
Wu, Y., Lim, J., Yang, M.H.: Object tracking benchmark. IEEE TPAMI 37(9), 1834–1848 (2015)
Xie, C., Wang, J., Zhang, Z., Zhou, Y., Xie, L., Yuille, A.L.: Adversarial examples for semantic segmentation and object detection. In: ICCV, pp. 1378–1387 (2017)
Xie, X., et al.: DeepHunter: a coverage-guided fuzz testing framework for deep neural networks. In: ISSTA, pp. 146–157 (2019)
Zhang, H., Zhou, H., Miao, N., Li, L.: Generating fluent adversarial examples for natural languages. In: ACL, pp. 5564–5569 (2019)
Zhang, P., Guo, Q., Feng, W.: Fast and object-adaptive spatial regularization for correlation filters based tracking. Neurocomputing 337, 129–143 (2019)
Zhang, Y., Wang, L., Qi, J., Wang, D., Feng, M., Lu, H.: Structured Siamese network for real-time visual tracking. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11213, pp. 355–370. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01240-3_22
Zhang, Z., Peng, H.: Deeper and wider Siamese networks for real-time visual tracking. In: CVPR, pp. 4586–4595 (2019)
Zhao, Y., Zhu, H., Liang, R., Shen, Q., Zhang, S., Chen, K.: Seeing isn’t believing: practical adversarial attack against object detectors. In: CCS, pp. 1989–2004 (2019)
Zhu, Z., Wang, Q., Li, B., Wu, W., Yan, J., Hu, W.: Distractor-aware Siamese networks for visual object tracking. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11213, pp. 103–119. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01240-3_7
Acknowledgements
This work was supported by the National Natural Science Foundation of China (NSFC) under Grant 61671325, Grant 61572354, Grant 61672376, Grant U1803264, and Grant 61906135, the Singapore National Research Foundation under the National Cybersecurity R&D Program No. NRF2018NCR-NCR005-0001 and the NRF Investigatorship No. NRFI06-2020-0022, and the National Satellite of Excellence in Trustworthy Software System No. NRF2018NCR-NSOE003-0001. It was also supported by JSPS KAKENHI Grants No. 20H04168, 19K24348, and 19H04086, and JST-Mirai Program Grant No. JPMJMI18BB, Japan. We also gratefully acknowledge the support of the NVIDIA AI Tech Center (NVAITC) for our research.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Guo, Q. et al. (2020). SPARK: Spatial-Aware Online Incremental Attack Against Visual Tracking. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12370. Springer, Cham. https://doi.org/10.1007/978-3-030-58595-2_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58594-5
Online ISBN: 978-3-030-58595-2