ABSTRACT
The trojan attack on deep neural network (DNN) models is an insidious variant of data poisoning attacks. A trojan attack exploits a backdoor inserted into a DNN model, leveraging the poor interpretability of the learned model, to misclassify any input stamped with the attacker's chosen trojan trigger. Since the trojan trigger is a secret guarded and exploited by the attacker, detecting trojaned inputs is challenging, especially at run-time when the model is in active operation. This work builds STRong Intentional Perturbation (STRIP), a run-time trojan attack detection system, focusing on vision systems. We intentionally perturb the incoming input, for instance by superimposing various image patterns, and observe the randomness of the predicted classes for the perturbed inputs from the deployed model, whether malicious or benign. Low entropy in the predicted classes violates the input-dependence property of a benign model and thus implies a malicious, trojaned input. The high efficacy of our method is validated through case studies on three popular and contrasting datasets: MNIST, CIFAR10, and GTSRB. We achieve an overall false acceptance rate (FAR) of less than 1%, given a preset false rejection rate (FRR) of 1%, for different types of triggers. On CIFAR10 and GTSRB, we empirically achieve 0% for both FRR and FAR. We also evaluate STRIP's robustness against a number of trojan attack variants and adaptive attacks.
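The detection procedure described above reduces to an entropy test over perturbed copies of each incoming input. Below is a minimal sketch in Python, assuming a Keras-style `model.predict` interface; the helper names (`superimpose`, `strip_entropy`, `is_trojaned`), the 50/50 blending weights, and the threshold calibration are illustrative assumptions, not the authors' reference implementation.

```python
import numpy as np

def superimpose(background, overlay):
    # Blend a held-out clean image onto the incoming input
    # (equal 50/50 weighting is an illustrative choice).
    return 0.5 * background + 0.5 * overlay

def strip_entropy(model, x, clean_samples):
    """Average Shannon entropy of the predicted class distribution
    over perturbed copies of x. A trojaned input keeps being pulled
    toward the attacker's target class, so its entropy stays low;
    a benign input yields near-random predictions and high entropy."""
    perturbed = np.stack([superimpose(x, c) for c in clean_samples])
    probs = model.predict(perturbed)          # shape: (N, num_classes)
    probs = np.clip(probs, 1e-12, 1.0)        # guard against log(0)
    per_copy = -np.sum(probs * np.log2(probs), axis=1)
    return per_copy.mean()

def is_trojaned(model, x, clean_samples, threshold):
    # Flag the input as trojaned when its entropy falls below the
    # detection boundary calibrated on benign inputs.
    return strip_entropy(model, x, clean_samples) < threshold
```

In practice, the detection threshold would be calibrated offline from the entropy distribution of known benign inputs, for example at the percentile corresponding to the preset 1% FRR reported above.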