
Fine-Pruning: Defending Against Backdooring Attacks on Deep Neural Networks

  • Conference paper

Research in Attacks, Intrusions, and Defenses (RAID 2018)

Part of the book series: Lecture Notes in Computer Science (LNSC, volume 11050)

Abstract

Deep neural networks (DNNs) provide excellent performance across a wide range of classification tasks, but their training requires high computational resources and is often outsourced to third parties. Recent work has shown that outsourced training introduces the risk that a malicious trainer will return a backdoored DNN that behaves normally on most inputs but causes targeted misclassifications or degrades the accuracy of the network when a trigger known only to the attacker is present. In this paper, we provide the first effective defenses against backdoor attacks on DNNs. We implement three backdoor attacks from prior work and use them to investigate two promising defenses, pruning and fine-tuning. We show that neither, by itself, is sufficient to defend against sophisticated attackers. We then evaluate fine-pruning, a combination of pruning and fine-tuning, and show that it successfully weakens or even eliminates the backdoors, i.e., in some cases reducing the attack success rate to \(0\%\) with only a \(0.4\%\) drop in accuracy for clean (non-triggering) inputs. Our work provides the first step toward defenses against backdoor attacks in deep neural networks.
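
As a concrete illustration of the pipeline the abstract describes, the minimal PyTorch sketch below prunes the output channels of a late convolutional layer that are least activated on clean inputs and then fine-tunes the network on the same clean data. The function name fine_prune, the default pruning fraction, the mean-activation pruning criterion, and the choice to mask channels rather than physically remove them are assumptions made for this sketch, not the paper's exact procedure.

    import torch
    import torch.nn as nn

    def fine_prune(model, last_conv, clean_loader, prune_frac=0.6,
                   finetune_epochs=2, lr=1e-3, device="cpu"):
        """Sketch of fine-pruning: prune channels of `last_conv` that stay
        dormant on clean inputs, then fine-tune the network on clean data."""
        model.to(device).eval()

        # 1. Mean absolute activation per output channel, measured on clean data only.
        scores = []
        hook = last_conv.register_forward_hook(
            lambda mod, inp, out: scores.append(out.detach().abs().mean(dim=(0, 2, 3))))
        with torch.no_grad():
            for x, _ in clean_loader:
                model(x.to(device))
        hook.remove()
        channel_score = torch.stack(scores).mean(dim=0)

        # 2. "Prune a neuron": here we zero the least-activated output channels
        #    (masking instead of physically shrinking the layer, for simplicity).
        n_prune = int(prune_frac * channel_score.numel())
        prune_idx = channel_score.argsort()[:n_prune]

        def apply_mask():
            with torch.no_grad():
                last_conv.weight[prune_idx] = 0.0
                if last_conv.bias is not None:
                    last_conv.bias[prune_idx] = 0.0
        apply_mask()

        # 3. Fine-tune on clean data; re-apply the mask after every step so the
        #    pruned channels stay dead while the remaining weights recover accuracy.
        model.train()
        opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
        loss_fn = nn.CrossEntropyLoss()
        for _ in range(finetune_epochs):
            for x, y in clean_loader:
                opt.zero_grad()
                loss_fn(model(x.to(device)), y.to(device)).backward()
                opt.step()
                apply_mask()
        return model

Masking rather than rebuilding the layer keeps the sketch short; a full implementation would shrink the layer (reducing its number of output channels, as in note 5 below) and could prune a larger or smaller fraction depending on how aggressively the backdoor must be removed.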


Notes

  1. Note that because DNNs are trained using heuristic procedures, this is the case even if the third party is benign.

  2. Defined as the fraction of backdoored test images classified as the target.

  3. While Gu et al. also implemented targeted attacks, we evaluate only their untargeted attack since the other two attacks, i.e., on face and speech recognition, are targeted.

  4. Since the goal of untargeted attacks is to reduce the accuracy on clean inputs, we define the attack success rate as \(1-\frac{A_{backdoor}}{A_{clean}}\), where \(A_{backdoor}\) is the accuracy on backdoored inputs and \(A_{clean}\) is the accuracy on clean inputs. (A small sketch of both success-rate metrics follows these notes.)

  5. Consistent with prior work, we say “pruning a neuron” to mean reducing the number of output channels in a layer by one.
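
To make the two metrics above concrete, here is a minimal Python sketch computing the targeted attack success rate of note 2 and the untargeted rate of note 4. The function names and the example numbers are illustrative assumptions, not values from the paper.

    # Sketch of the two attack-success-rate metrics defined in the notes above.

    def targeted_asr(n_backdoored_hitting_target: int, n_backdoored_total: int) -> float:
        """Note 2: fraction of backdoored test images classified as the attacker's target."""
        return n_backdoored_hitting_target / n_backdoored_total

    def untargeted_asr(acc_backdoor: float, acc_clean: float) -> float:
        """Note 4: 1 - A_backdoor / A_clean, i.e. the relative accuracy drop
        that the trigger induces compared to clean inputs."""
        return 1.0 - acc_backdoor / acc_clean

    # Illustrative example: a network with 91% clean accuracy that falls to 13%
    # on backdoored inputs has an untargeted attack success rate of about 0.86.
    print(untargeted_asr(0.13, 0.91))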

References

  1. ImageNet large scale visual recognition competition. http://www.image-net.org/challenges/LSVRC/2012/ (2012)

  2. Amazon Web Services Inc.: Amazon Elastic Compute Cloud (Amazon EC2)

  3. Amazon.com, Inc.: Deep Learning AMI Amazon Linux Version

  4. Anwar, S.: Structured pruning of deep convolutional neural networks. ACM J. Emerg. Technol. Comput. Syst. (JETC) 13(3), 32 (2017)

  5. Athalye, A., Carlini, N., Wagner, D.: Obfuscated gradients give a false sense of security: circumventing defenses to adversarial examples. In: Proceedings of the 35th International Conference on Machine Learning, ICML 2018, July 2018. https://arxiv.org/abs/1802.00420

  6. Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning to align and translate (2014)

  7. Barreno, M., Nelson, B., Sears, R., Joseph, A.D., Tygar, J.D.: Can machine learning be secure? In: Proceedings of the 2006 ACM Symposium on Information, Computer and Communications Security. ASIACCS 2006 (2006). https://doi.org/10.1145/1128817.1128824

  8. Blum, A., Rivest, R.L.: Training a 3-node neural network is NP-complete. In: Advances in Neural Information Processing Systems, pp. 494–501 (1989)

  9. Carlini, N., Wagner, D.A.: Defensive distillation is not robust to adversarial examples. CoRR abs/1607.04311 (2016). http://arxiv.org/abs/1607.04311

  10. Chen, X., Liu, C., Li, B., Lu, K., Song, D.: Targeted backdoor attacks on deep learning systems using data poisoning. ArXiv e-prints, December 2017

  11. Chung, S.P., Mok, A.K.: Allergy attack against automatic signature generation. In: Zamboni, D., Kruegel, C. (eds.) RAID 2006. LNCS, vol. 4219, pp. 61–80. Springer, Heidelberg (2006). https://doi.org/10.1007/11856214_4

  12. Chung, S.P., Mok, A.K.: Advanced allergy attacks: does a corpus really help? In: Kruegel, C., Lippmann, R., Clark, A. (eds.) RAID 2007. LNCS, vol. 4637, pp. 236–255. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-74320-0_13

  13. Dhillon, G.S., et al.: Stochastic activation pruning for robust adversarial defense. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=H1uR4GZRZ

  14. Fogla, P., Lee, W.: Evading network anomaly detection systems: formal reasoning and practical techniques. In: Proceedings of the 13th ACM Conference on Computer and Communications Security. CCS 2006 (2006). https://doi.org/10.1145/1180405.1180414

  15. Fogla, P., Sharif, M., Perdisci, R., Kolesnikov, O., Lee, W.: Polymorphic blending attacks. In: USENIX-SS 2006 Proceedings of the 15th Conference on USENIX Security Symposium, vol. 15 (2006)

  16. Google Inc.: Google Cloud Machine Learning Engine. https://cloud.google.com/ml-engine/

  17. Graves, A., Mohamed, A.R., Hinton, G.: Speech recognition with deep recurrent neural networks. In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6645–6649. IEEE (2013)

  18. Gu, T., Garg, S., Dolan-Gavitt, B.: BadNets: identifying vulnerabilities in the machine learning model supply chain. In: NIPS Machine Learning and Computer Security Workshop (2017). https://arxiv.org/abs/1708.06733

  19. Han, S., Mao, H., Dally, W.J.: Deep compression: compressing deep neural networks with pruning, trained quantization and Huffman coding. In: International Conference on Learning Representations (ICLR) (2016)

  20. He, W., Wei, J., Chen, X., Carlini, N., Song, D.: Adversarial example defense: ensembles of weak defenses are not strong. In: 11th USENIX Workshop on Offensive Technologies (WOOT 2017). USENIX Association, Vancouver, BC (2017). https://www.usenix.org/conference/woot17/workshop-program/presentation/he

  21. Hermann, K.M., Blunsom, P.: Multilingual distributed representations without word alignment. In: Proceedings of ICLR, April 2014. http://arxiv.org/abs/1312.6173

  22. Iandola, F.N., Moskewicz, M.W., Ashraf, K., Keutzer, K.: FireCaffe: near-linear acceleration of deep neural network training on compute clusters. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2592–2600 (2016)

  23. Karlberger, C., Bayler, G., Kruegel, C., Kirda, E.: Exploiting redundancy in natural language to penetrate Bayesian spam filters. In: Proceedings of the First USENIX Workshop on Offensive Technologies. WOOT 2007 (2007)

  24. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)

  25. Li, H., et al.: Pruning filters for efficient ConvNets. arXiv preprint arXiv:1608.08710 (2016)

  26. Liu, C., Li, B., Vorobeychik, Y., Oprea, A.: Robust linear regression against training data poisoning. In: Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pp. 91–102. ACM (2017)

  27. Liu, Y., et al.: Trojaning attack on neural networks. In: 25th Annual Network and Distributed System Security Symposium, NDSS 2018, San Diego, California, USA, 18–21 February 2018. The Internet Society (2018)

  28. Liu, Y., Xie, Y., Srivastava, A.: Neural trojans. CoRR abs/1710.00942 (2017). http://arxiv.org/abs/1710.00942

  29. Lowd, D., Meek, C.: Adversarial learning. In: Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining. KDD 2005, pp. 641–647. ACM, New York (2005). https://doi.org/10.1145/1081870.1081950

  30. Lowd, D., Meek, C.: Good word attacks on statistical spam filters. In: Proceedings of the Conference on Email and Anti-Spam (CEAS) (2005)

  31. Microsoft Corporation: Azure Batch AI Training. https://batchaitraining.azure.com/

  32. Møgelmose, A., Liu, D., Trivedi, M.M.: Traffic sign detection for US roads: remaining challenges and a case for tracking. In: 2014 IEEE 17th International Conference on Intelligent Transportation Systems (ITSC), pp. 1394–1399. IEEE (2014)

  33. Molchanov, P., et al.: Pruning convolutional neural networks for resource efficient inference (2016)

  34. Muñoz-González, L., et al.: Towards poisoning of deep learning algorithms with back-gradient optimization. CoRR abs/1708.08689 (2017). http://arxiv.org/abs/1708.08689

  35. Nelson, B., et al.: Exploiting machine learning to subvert your spam filter. In: Proceedings of the 1st USENIX Workshop on Large-Scale Exploits and Emergent Threats. LEET 2008, pp. 7:1–7:9. USENIX Association, Berkeley (2008)

  36. Newsome, J., Karp, B., Song, D.: Paragraph: thwarting signature learning by training maliciously. In: Zamboni, D., Kruegel, C. (eds.) RAID 2006. LNCS, vol. 4219, pp. 81–105. Springer, Heidelberg (2006). https://doi.org/10.1007/11856214_5

  37. Papernot, N., McDaniel, P., Wu, X., Jha, S., Swami, A.: Distillation as a defense to adversarial perturbations against deep neural networks. In: 2016 IEEE Symposium on Security and Privacy (SP), pp. 582–597, May 2016. https://doi.org/10.1109/SP.2016.41

  38. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: Advances in Neural Information Processing Systems, pp. 91–99 (2015)

  39. Suciu, O., Marginean, R., Kaya, Y., Daumé III, H., Dumitras, T.: When does machine learning FAIL? Generalized transferability for evasion and poisoning attacks. In: 27th USENIX Security Symposium (USENIX Security 18). USENIX Association, Baltimore (2018). https://www.usenix.org/conference/usenixsecurity18/presentation/suciu

  40. Sun, Y., Wang, X., Tang, X.: Deep learning face representation from predicting 10,000 classes. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1891–1898 (2014)

  41. Tan, K.M.C., Killourhy, K.S., Maxion, R.A.: Undermining an anomaly-based intrusion detection system using common exploits. In: Proceedings of the 5th International Conference on Recent Advances in Intrusion Detection. RAID 2002 (2002)

  42. Tung, F., Muralidharan, S., Mori, G.: Fine-pruning: joint fine-tuning and compression of a convolutional network with Bayesian optimization. In: British Machine Vision Conference (BMVC) (2017)

  43. Wagner, D., Soto, P.: Mimicry attacks on host-based intrusion detection systems. In: Proceedings of the 9th ACM Conference on Computer and Communications Security. CCS 2002 (2002). https://doi.org/10.1145/586110.586145

  44. Wittel, G.L., Wu, S.F.: On attacking statistical spam filters. In: Proceedings of the Conference on Email and Anti-Spam (CEAS), Mountain View, CA, USA (2004)

  45. Wolf, L., Hassner, T., Maoz, I.: Face recognition in unconstrained videos with matched background similarity. In: CVPR 2011, pp. 529–534, June 2011. https://doi.org/10.1109/CVPR.2011.5995566

  46. Xiao, H., Rasul, K., Vollgraf, R.: Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. CoRR abs/1708.07747 (2017). http://arxiv.org/abs/1708.07747

  47. Yosinski, J., Clune, J., Bengio, Y., Lipson, H.: How transferable are features in deep neural networks? In: Advances in Neural Information Processing Systems, pp. 3320–3328 (2014)

  48. Yu, J., et al.: Scalpel: customizing DNN pruning to the underlying hardware parallelism. In: Proceedings of the 44th Annual International Symposium on Computer Architecture, pp. 548–560. ACM (2017)


Acknowledgement

This research was partially supported by National Science Foundation CAREER Award #1553419.

Author information

Correspondence to Kang Liu.



Copyright information

© 2018 Springer Nature Switzerland AG

About this paper


Cite this paper

Liu, K., Dolan-Gavitt, B., Garg, S. (2018). Fine-Pruning: Defending Against Backdooring Attacks on Deep Neural Networks. In: Bailey, M., Holz, T., Stamatogiannakis, M., Ioannidis, S. (eds) Research in Attacks, Intrusions, and Defenses. RAID 2018. Lecture Notes in Computer Science, vol 11050. Springer, Cham. https://doi.org/10.1007/978-3-030-00470-5_13


  • DOI: https://doi.org/10.1007/978-3-030-00470-5_13


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-00469-9

  • Online ISBN: 978-3-030-00470-5

  • eBook Packages: Computer Science, Computer Science (R0)
