Abstract
Adversarial attacks on deep neural networks have been intensively studied on image, audio, and natural language classification tasks. Nevertheless, as a typical yet important real-world application, adversarial attacks on online video tracking, which traces an object’s moving trajectory rather than its category, are rarely explored. In this paper, we identify a new task of adversarial attack on visual tracking: generating imperceptible perturbations online that mislead trackers along an incorrect (Untargeted Attack, UA) or a specified trajectory (Targeted Attack, TA). To this end, we first propose a spatial-aware basic attack by adapting existing attack methods, i.e., FGSM, BIM, and C&W, and comprehensively analyze the attacking performance. We identify that online object tracking poses two new challenges: 1) it is difficult to generate imperceptible perturbations that can transfer across frames, and 2) real-time trackers require the attack to satisfy a certain level of efficiency. To address these challenges, we further propose the spatial-aware online incremental attack (a.k.a. SPARK) that performs spatial-temporal sparse incremental perturbations online and makes the adversarial attack less perceptible. In addition, as an optimization-based method, SPARK quickly converges to very small losses within several iterations by considering historical incremental perturbations, making it much more efficient than basic attacks. An in-depth evaluation of state-of-the-art trackers (i.e., SiamRPN++ with AlexNet, MobileNetv2, and ResNet-50, and SiamDW) on OTB100, VOT2018, UAV123, and LaSOT demonstrates the effectiveness and transferability of SPARK in misleading trackers under both UA and TA with minor perturbations.
Q. Guo and X. Xie contributed equally to this work.
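For intuition, the following is a minimal PyTorch-style sketch of the incremental scheme described in the abstract. The adversarial objective `track_loss` (e.g., the tracker's confidence at the true location for UA, or a distance to the desired trajectory for TA), the step size, the sparsity weight `lam`, and the window length `K` are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def spark_attack_step(frame, track_loss, past_eps, lr=0.01, lam=1e-4, iters=5, K=30):
    """One online attack step (illustrative sketch, not the authors' exact
    algorithm): optimize only the new increment `eps` on top of the
    accumulated historical increments, so the optimization is warm-started."""
    base = torch.stack(past_eps).sum(0) if past_eps else torch.zeros_like(frame)
    eps = torch.zeros_like(frame, requires_grad=True)
    opt = torch.optim.Adam([eps], lr=lr)
    for _ in range(iters):  # warm-started by `base`, so a few iterations suffice
        adv = (frame + base + eps).clamp(0, 1)
        window = torch.stack(past_eps + [eps]) if past_eps else eps.unsqueeze(0)
        # L2,1 penalty: L2 over each increment's pixels, L1 across the temporal
        # window, encouraging spatially-temporally sparse increments
        l21 = window.flatten(start_dim=1).norm(dim=1).sum()
        loss = track_loss(adv) + lam * l21
        opt.zero_grad()
        loss.backward()
        opt.step()
    past_eps = (past_eps + [eps.detach()])[-K:]  # sliding window of increments
    return (frame + base + eps.detach()).clamp(0, 1), past_eps
```

Because each step only refines a small increment over the accumulated perturbation, the per-frame cost stays far below re-running a full BIM- or C&W-style optimization from scratch.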
Notes
1. We select SiamRPN-AlexNet since it is a representative Siamese network tracker and achieves high accuracy on modern benchmarks at beyond real-time speed.
2. We use 30 as the attack interval since videos usually run at 30 fps, and such a setup naturally exploits the potential delay between the 29th and 30th frames (a driver-loop sketch follows these notes).
3. The 11 attributes are illumination variation (IV), scale variation (SV), in-plane rotation (IPR), out-of-plane rotation (OPR), deformation (DEF), occlusion (OCC), motion blur (MB), fast motion (FM), background clutter (BC), out-of-view (OV), and low resolution (LR).
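As a rough illustration of the attack interval in note 2, a hypothetical driver loop can reset the incremental window every 30 frames; `spark_attack_step` refers to the sketch after the abstract, and `video_frames` and `track_loss` are assumed inputs.

```python
def attack_video(video_frames, track_loss, interval=30):
    """Attack every frame online, restarting the incremental window every
    `interval` frames (30 here, matching the typical 30 fps; see note 2)."""
    past_eps, adv_frames = [], []
    for t, frame in enumerate(video_frames):
        if t % interval == 0:
            past_eps = []  # start a fresh perturbation at each attack interval
        adv, past_eps = spark_attack_step(frame, track_loss, past_eps)
        adv_frames.append(adv)
    return adv_frames
```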
References
Bertinetto, L., Valmadre, J., Henriques, J.F., Vedaldi, A., Torr, P.H.S.: Fully-convolutional Siamese networks for object tracking. arXiv:1606.09549 (2016)
Carlini, N., Wagner, D.: Towards evaluating the robustness of neural networks. In: 2017 IEEE Symposium on Security and Privacy (SP), pp. 39–57 (2017)
Carlini, N., Wagner, D.: Audio adversarial examples: targeted attacks on speech-to-text. arXiv:1801.01944 (2018)
Chen, Z., Guo, Q., Wan, L., Feng, W.: Background-suppressed correlation filters for visual tracking. In: ICME, pp. 1–6 (2018)
Cisse, M., Adi, Y., Neverova, N., Keshet, J.: Houdini: fooling deep structured prediction models. arXiv:1707.05373 (2017)
Dai, K., Wang, D., Lu, H., Sun, C., Li, J.: Visual tracking via adaptive spatially-regularized correlation filters. In: CVPR, pp. 4665–4674 (2019)
Danelljan, M., Bhat, G., Khan, F.S., Felsberg, M.: ECO: efficient convolution operators for tracking. In: CVPR, pp. 6931–6939 (2017)
Dong, X., Shen, J.: Triplet loss in Siamese network for object tracking. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11217, pp. 472–488. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01261-8_28
Dong, Y., et al.: Boosting adversarial attacks with momentum. In: CVPR, pp. 9185–9193 (2018)
Du, X., Xie, X., Li, Y., Ma, L., Liu, Y., Zhao, J.: DeepStellar: model-based quantitative analysis of stateful deep learning systems. In: ESEC/FSE, pp. 477–487 (2019)
Fan, H., et al.: LaSOT: a high-quality benchmark for large-scale single object tracking. In: CVPR, pp. 5369–5378 (2019)
Fan, H., Ling, H.: Siamese cascaded region proposal networks for real-time visual tracking. In: CVPR, pp. 7944–7953 (2019)
Feng, W., Han, R., Guo, Q., Zhu, J., Wang, S.: Dynamic saliency-aware regularization for correlation filter-based object tracking. IEEE TIP 28(7), 3232–3245 (2019)
Gao, J., Lanchantin, J., Soffa, M.L., Qi, Y.: Black-box generation of adversarial text sequences to evade deep learning classifiers. In: SPW, pp. 50–56 (2018)
Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. arXiv:1412.6572 (2014)
Guo, Q., Feng, W., Zhou, C., Huang, R., Wan, L., Wang, S.: Learning dynamic Siamese network for visual object tracking. In: ICCV, pp. 1781–1789 (2017)
Guo, Q., Feng, W., Zhou, C., Pun, C., Wu, B.: Structure-regularized compressive tracking with online data-driven sampling. IEEE TIP 26(12), 5692–5705 (2017)
Guo, Q., Han, R., Feng, W., Chen, Z., Wan, L.: Selective spatial regularization by reinforcement learned decision making for object tracking. IEEE TIP 29, 2999–3013 (2020)
He, A., Luo, C., Tian, X., Zeng, W.: A twofold Siamese network for real-time object tracking. In: CVPR, pp. 4834–4843 (2018)
He, K., Zhang, X., Ren, S., Sun., J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Held, D., Thrun, S., Savarese, S.: Learning to track at 100 FPS with deep regression networks. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 749–765. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_45
Howard, A.G., et al.: MobileNets: efficient convolutional neural networks for mobile vision applications. arXiv:1704.04861 (2017)
Jin, D., Jin, Z., Zhou, J.T., Szolovits, P.: Is BERT really robust? Natural language attack on text classification and entailment. arXiv:1907.11932 (2019)
Kristan, M., et al.: The sixth visual object tracking VOT2018 challenge results. In: Leal-Taixé, L., Roth, S. (eds.) ECCV 2018. LNCS, vol. 11129, pp. 3–53. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-11009-3_1
Kristan, M., et al.: The seventh visual object tracking VOT2019 challenge results. In: ICCVW, pp. 2206–2241 (2019)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: NIPS, pp. 1097–1105 (2012)
Kurakin, A., Goodfellow, I., Bengio, S.: Adversarial examples in the physical world. ICLR (Workshop) (2017)
Li, B., Wu, W., Wang, Q., Zhang, F., Xing, J., Yan, J.: SiamRPN++: evolution of Siamese visual tracking with very deep networks. In: CVPR, pp. 4282–4291 (2019)
Li, B., Wu, W., Zhu, Z., Yan, J., Hu, X.: High performance visual tracking with Siamese region proposal network. In: CVPR, pp. 8971–8980 (2018)
Li, Y., Tian, D., Chang, M.C., Bian, X., Lyu, S.: Robust adversarial perturbation on deep proposal-based models. In: BMVC, pp. 1–11 (2018)
Lin, Y.C., Hong, Z.W., Liao, Y.H., Shi, M.L., Liu, M.Y., Sun, M.: Tactics of adversarial attack on deep reinforcement learning agents. In: IJCAI, pp. 3756–3762 (2017)
Ling, X., et al.: DEEPSEC: a uniform platform for security analysis of deep learning model. In: IEEE Symposium on Security and Privacy (SP), pp. 673–690 (2019)
Liu, W., et al.: SSD: single shot multibox detector. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 21–37. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_2
Lukežič, A., Vojíř, T., Čehovin, L., Matas, J., Kristan, M.: Discriminative correlation filter with channel and spatial reliability. In: CVPR, pp. 4847–4856 (2017)
Ma, L., et al.: DeepGauge: multi-granularity testing criteria for deep learning systems. In: ASE, pp. 120–131 (2018)
Metzen, J.H., Kumar, M.C., Brox, T., Fischer, V.: Universal adversarial perturbations against semantic image segmentation. In: ICCV, pp. 2774–2783 (2017)
Moosavi-Dezfooli, S.M., Fawzi, A., Fawzi, O., Frossard, P.: Universal adversarial perturbations. In: CVPR, pp. 86–94 (2017)
Moosavi-Dezfooli, S.M., Fawzi, A., Frossard, P.: DeepFool: a simple and accurate method to fool deep neural networks. In: CVPR, pp. 2574–2582 (2016)
Mueller, M., Smith, N., Ghanem, B.: A benchmark and simulator for UAV tracking. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 445–461. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_27
Müller, M., Bibi, A., Giancola, S., Alsubaihi, S., Ghanem, B.: TrackingNet: a large-scale dataset and benchmark for object tracking in the wild. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11205, pp. 310–327. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01246-5_19
Nam, H., Han, B.: Learning multi-domain convolutional neural networks for visual tracking. In: CVPR, pp. 4293–4302 (2016)
Papernot, N., McDaniel, P.D., Jha, S., Fredrikson, M., Celik, Z.B., Swami, A.: The limitations of deep learning in adversarial settings. In: IEEE European Symposium on Security and Privacy (EuroS&P), pp. 372–387 (2016)
Qin, Y., Carlini, N., Goodfellow, I., Cottrell, G., Raffel, C.: Imperceptible, robust, and targeted adversarial examples for automatic speech recognition. arXiv:1903.10346 (2019)
Ren, S., Deng, Y., He, K., Che, W.: Generating natural language adversarial examples through probability weighted word saliency. In: ACL, pp. 1085–1097 (2019)
Song, Y., et al.: VITAL: visual tracking via adversarial learning. In: CVPR, pp. 8990–8999 (2018)
Sun, J., et al.: Stealthy and efficient adversarial attacks against deep reinforcement learning. In: AAAI, pp. 5883–5891 (2020)
Sun, Y., Sun, C., Wang, D., Lu, H., He, Y.: ROI pooled correlation filters for visual tracking. In: CVPR, pp. 5776–5784 (2019)
Szegedy, C., et al.: Intriguing properties of neural networks. arXiv:1312.6199 (2013)
Wang, Q., Zhang, L., Bertinetto, L., Hu, W., Torr, P.H.: Fast online object tracking and segmentation: a unifying approach. In: CVPR, pp. 1328–1338 (2019)
Wang, X., Li, C., Luo, B., Tang, J.: SINT++: robust visual tracking via adversarial positive instance generation. In: CVPR, pp. 4864–4873 (2018)
Wei, X., Liang, S., Chen, N., Cao, X.: Transferable adversarial attacks for image and video object detection. In: IJCAI, pp. 954–960 (2019)
Wei, X., Zhu, J., Yuan, S., Su, H.: Sparse adversarial perturbations for videos. In: AAAI, pp. 8973–8980 (2019)
Wiyatno, R.R., Xu, A.: Physical adversarial textures that fool visual object tracking. arXiv:1904.11042 (2019)
Wu, Y., Lim, J., Yang, M.H.: Object tracking benchmark. IEEE TPAMI 37(9), 1834–1848 (2015)
Xie, C., Wang, J., Zhang, Z., Zhou, Y., Xie, L., Yuille, A.L.: Adversarial examples for semantic segmentation and object detection. In: ICCV, pp. 1378–1387 (2017)
Xie, X., et al.: DeepHunter: a coverage-guided fuzz testing framework for deep neural networks. In: ISSTA, pp. 146–157 (2019)
Zhang, H., Zhou, H., Miao, N., Li, L.: Generating fluent adversarial examples for natural languages. In: ACL, pp. 5564–5569 (2019)
Zhang, P., Guo, Q., Feng, W.: Fast and object-adaptive spatial regularization for correlation filters based tracking. Neurocomputing 337, 129–143 (2019)
Zhang, Y., Wang, L., Qi, J., Wang, D., Feng, M., Lu, H.: Structured Siamese network for real-time visual tracking. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11213, pp. 355–370. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01240-3_22
Zhang, Z., Peng, H.: Deeper and wider Siamese networks for real-time visual tracking. In: CVPR, pp. 4586–4595 (2019)
Zhao, Y., Zhu, H., Liang, R., Shen, Q., Zhang, S., Chen, K.: Seeing isn’t believing: practical adversarial attack against object detectors. In: CCS, pp. 1989–2004 (2019)
Zhu, Z., Wang, Q., Li, B., Wu, W., Yan, J., Hu, W.: Distractor-aware Siamese networks for visual object tracking. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11213, pp. 103–119. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01240-3_7
Acknowledgements
This work was supported by the National Natural Science Foundation of China (NSFC) under Grant 61671325, Grant 61572354, Grant 61672376, Grant U1803264, and Grant 61906135, the Singapore National Research Foundation under the National Cybersecurity R&D Program No. NRF2018NCR-NCR005-0001 and the NRF Investigatorship No. NRFI06-2020-0022, and the National Satellite of Excellence in Trustworthy Software System No. NRF2018NCR-NSOE003-0001. It was also supported by JSPS KAKENHI Grants No. 20H04168, 19K24348, and 19H04086, and JST-Mirai Program Grant No. JPMJMI18BB, Japan. We also gratefully acknowledge the support of the NVIDIA AI Tech Center (NVAITC) for our research.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Guo, Q. et al. (2020). SPARK: Spatial-Aware Online Incremental Attack Against Visual Tracking. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12370. Springer, Cham. https://doi.org/10.1007/978-3-030-58595-2_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58594-5
Online ISBN: 978-3-030-58595-2