DOI: 10.1145/3319535.3363216
research-article
Public Access

ABS: Scanning Neural Networks for Back-doors by Artificial Brain Stimulation

Published: 06 November 2019

ABSTRACT

This paper presents a technique to scan neural-network-based AI models to determine whether they are trojaned. Pre-trained AI models may contain back-doors that are injected through training or by transforming inner neuron weights. These trojaned models operate normally when regular inputs are provided, but misclassify to a specific output label when the input is stamped with a special pattern called a trojan trigger. We develop a novel technique that analyzes inner neuron behaviors by determining how output activations change when different levels of stimulation are introduced to a neuron. A neuron that substantially elevates the activation of a particular output label regardless of the provided input is considered potentially compromised. The trojan trigger is then reverse-engineered through an optimization procedure using the stimulation analysis results, to confirm that a neuron is truly compromised. We evaluate our system ABS on 177 trojaned models that were trojaned with various attack methods targeting both the input space and the feature space, with various trojan trigger sizes and shapes, together with 144 benign models trained with different data and initial weight values. These models span 7 different model structures and 6 different datasets, including complex ones such as ImageNet, VGG-Face, and ResNet110. Our results show that ABS is highly effective, achieving over 90% detection rate in most cases (and 100% in many), when only one input sample is provided for each output label. It substantially outperforms the state-of-the-art technique Neural Cleanse, which requires many input samples and small trojan triggers to achieve good performance.
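To make the stimulation idea above concrete, below is a minimal, hypothetical PyTorch sketch (not the authors' implementation): it fixes one inner feature-map channel to a range of stimulation values and flags channels that consistently push the same output label to the top across all provided inputs. The split into `features`/`classifier` modules, the stimulation levels, and the `margin` threshold are illustrative assumptions, and the trigger reverse-engineering step is omitted.

```python
import torch
import torch.nn as nn

def stimulate(features, classifier, x, channel, levels=(0.0, 10.0, 20.0, 50.0, 100.0)):
    """Fix one inner channel to each stimulation level and record output logits."""
    with torch.no_grad():
        fmap = features(x)                       # inner activations, shape (N, C, H, W)
        logits = []
        for v in levels:
            stim = fmap.clone()
            stim[:, channel, :, :] = v           # force the candidate neuron's activation to v
            logits.append(classifier(stim))
    return torch.stack(logits)                   # (len(levels), N, num_classes)

def suspicious_channels(features, classifier, samples, num_channels, margin=5.0):
    """Flag channels whose stimulation pushes the same output label to the top
    for every sample at some stimulation level -- candidate compromised neurons."""
    flagged = []
    for c in range(num_channels):
        logits = stimulate(features, classifier, samples, c)   # (L, N, K)
        top2 = logits.topk(2, dim=-1).values                   # (L, N, 2)
        lead = top2[..., 0] - top2[..., 1]                     # margin over the runner-up label
        for l in range(logits.shape[0]):
            labels = logits[l].argmax(dim=-1)                  # predicted label per sample
            if bool((labels == labels[0]).all()) and bool((lead[l] > margin).all()):
                flagged.append((c, int(labels[0])))            # (channel, elevated label)
                break
    return flagged

# Toy usage: a tiny CNN split into a feature extractor and a classifier head.
features = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU())
classifier = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10))
samples = torch.randn(4, 3, 32, 32)              # ABS needs only one sample per output label
print(suspicious_channels(features, classifier, samples, num_channels=8))
```

In the paper, channels flagged this way are only candidates; the trigger reverse-engineering optimization is what confirms whether a candidate neuron is truly compromised.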


Supplemental Material

p1265-lee.webm (webm, 106.7 MB)

References

  1. acoomans. 2013. instagram-filters. https://github.com/acoomans/instagram-filters
  2. Anish Athalye, Nicholas Carlini, and David Wagner. 2018. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. arXiv preprint arXiv:1802.00420 (2018).
  3. Mariusz Bojarski, Davide Del Testa, Daniel Dworakowski, Bernhard Firner, Beat Flepp, Prasoon Goyal, Lawrence D Jackel, Mathew Monfort, Urs Muller, Jiakai Zhang, et al. 2016. End to end learning for self-driving cars. arXiv preprint arXiv:1604.07316 (2016).
  4. Tom B Brown, Dandelion Mané, Aurko Roy, Martín Abadi, and Justin Gilmer. 2017. Adversarial patch. arXiv preprint arXiv:1712.09665 (2017).
  5. Yinzhi Cao, Alexander Fangxiao Yu, Andrew Aday, Eric Stahl, Jon Merwine, and Junfeng Yang. 2018. Efficient repair of polluted machine learning systems via causal unlearning. In Proceedings of the 2018 Asia Conference on Computer and Communications Security. ACM, 735--747.
  6. Xinyun Chen, Chang Liu, Bo Li, Kimberly Lu, and Dawn Song. 2017. Targeted backdoor attacks on deep learning systems using data poisoning. arXiv preprint arXiv:1712.05526 (2017).
  7. Edward Chou, Florian Tramèr, Giancarlo Pellegrino, and Dan Boneh. 2018. SentiNet: Detecting Physical Attacks Against Deep Learning Systems. arXiv preprint arXiv:1812.00292 (2018).
  8. Joseph Clements and Yingjie Lao. 2018. Hardware trojan attacks on neural networks. arXiv preprint arXiv:1806.05768 (2018).
  9. J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. 2009. ImageNet: A Large-Scale Hierarchical Image Database. In CVPR09.
  10. Minghong Fang, Guolei Yang, Neil Zhenqiang Gong, and Jia Liu. 2018. Poisoning attacks to graph-based recommender systems. In Proceedings of the 34th Annual Computer Security Applications Conference. ACM, 381--392.
  11. Yarin Gal. 2016. Uncertainty in deep learning. Ph.D. Dissertation. University of Cambridge.
  12. Yansong Gao, Chang Xu, Derui Wang, Shiping Chen, Damith C Ranasinghe, and Surya Nepal. 2019. STRIP: A Defence Against Trojan Attacks on Deep Neural Networks. arXiv preprint arXiv:1902.06531 (2019).
  13. Ross Girshick. 2015. Fast R-CNN. In Proceedings of the IEEE international conference on computer vision. 1440--1448.
  14. Tianyu Gu, Brendan Dolan-Gavitt, and Siddharth Garg. 2017. BadNets: Identifying vulnerabilities in the machine learning model supply chain. arXiv preprint arXiv:1708.06733 (2017).
  15. Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition. 770--778.
  16. Matthew Jagielski, Alina Oprea, Battista Biggio, Chang Liu, Cristina Nita-Rotaru, and Bo Li. 2018. Manipulating machine learning: Poisoning attacks and countermeasures for regression learning. In 2018 IEEE Symposium on Security and Privacy (SP). IEEE, 19--35.
  17. Yujie Ji, Xinyang Zhang, Shouling Ji, Xiapu Luo, and Ting Wang. 2018. Model-reuse attacks on deep learning systems. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security. ACM, 349--363.
  18. Yujie Ji, Xinyang Zhang, and Ting Wang. 2017. Backdoor attacks against learning systems. In 2017 IEEE Conference on Communications and Network Security (CNS).
  19. Melissa King. 2019. TrojAI. https://www.iarpa.gov/index.php?option=com_content&view=article&id=1142&Itemid=443
  20. Alex Krizhevsky and Geoffrey Hinton. 2009. Learning multiple layers of features from tiny images. Technical Report. Citeseer.
  21. Yann LeCun, Léon Bottou, Yoshua Bengio, Patrick Haffner, et al. 1998. Gradient-based learning applied to document recognition. Proc. IEEE (1998).
  22. Gil Levi and Tal Hassner. 2015. Age and gender classification using convolutional neural networks. In Proceedings of the IEEE conference on computer vision and pattern recognition workshops. 34--42.
  23. Wenshuo Li, Jincheng Yu, Xuefei Ning, Pengjun Wang, Qi Wei, Yu Wang, and Huazhong Yang. 2018. Hu-Fu: Hardware and software collaborative attack framework against neural networks. In ISVLSI.
  24. Cong Liao, Haoti Zhong, Anna Squicciarini, Sencun Zhu, and David Miller. 2018. Backdoor embedding in convolutional neural network models via invisible perturbation. arXiv preprint arXiv:1808.10307 (2018).
  25. Min Lin, Qiang Chen, and Shuicheng Yan. 2013. Network in network. arXiv preprint arXiv:1312.4400 (2013).
  26. Kang Liu, Brendan Dolan-Gavitt, and Siddharth Garg. 2018a. Fine-pruning: Defending against backdooring attacks on deep neural networks. In International Symposium on Research in Attacks, Intrusions, and Defenses. Springer, 273--294.
  27. Yingqi Liu, Shiqing Ma, Yousra Aafer, Wen-Chuan Lee, Juan Zhai, Weihang Wang, and Xiangyu Zhang. 2018b. Trojaning Attack on Neural Networks. In 25th Annual Network and Distributed System Security Symposium, NDSS 2018, San Diego, California, USA, February 18--21, 2018. The Internet Society.
  28. Yuntao Liu, Yang Xie, and Ankur Srivastava. 2017. Neural trojans. In 2017 IEEE International Conference on Computer Design (ICCD). IEEE, 45--48.
  29. Shiqing Ma, Yingqi Liu, Guanhong Tao, Wen-Chuan Lee, and Xiangyu Zhang. 2019. NIC: Detecting Adversarial Samples with Neural Network Invariant Checking. In 26th Annual Network and Distributed System Security Symposium, NDSS.
  30. Wei Ma and Jun Lu. 2017. An Equivalence of Fully Connected Layer and Convolutional Layer. arXiv preprint arXiv:1712.01252 (2017).
  31. Andreas Møgelmose, Dongran Liu, and Mohan M Trivedi. 2014. Traffic sign detection for US roads: Remaining challenges and a case for tracking. In 17th International IEEE Conference on Intelligent Transportation Systems (ITSC).
  32. Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, Omar Fawzi, and Pascal Frossard. 2017. Universal adversarial perturbations. In Proceedings of the IEEE conference on computer vision and pattern recognition. 1765--1773.
  33. Mehran Mozaffari-Kermani, Susmita Sur-Kolay, Anand Raghunathan, and Niraj K Jha. 2015. Systematic poisoning attacks on and defenses for machine learning in healthcare. IEEE Journal of Biomedical and Health Informatics (2015).
  34. Nicolas Papernot, Patrick McDaniel, Ian Goodfellow, Somesh Jha, Z Berkay Celik, and Ananthram Swami. 2017. Practical black-box attacks against machine learning. In Proceedings of the 2017 ACM Asia Conference on Computer and Communications Security. ACM, 506--519.
  35. Nicolas Papernot, Patrick McDaniel, Somesh Jha, Matt Fredrikson, Z Berkay Celik, and Ananthram Swami. 2016. The limitations of deep learning in adversarial settings. In 2016 IEEE European Symposium on Security and Privacy (EuroS&P).
  36. Omkar M Parkhi, Andrea Vedaldi, Andrew Zisserman, et al. 2015. Deep face recognition. In BMVC, Vol. 1. 6.
  37. Andrea Paudice, Luis Muñoz-González, and Emil C Lupu. 2018. Label sanitization against label flipping poisoning attacks. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 5--15.
  38. Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. 2016. You only look once: Unified, real-time object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition. 779--788.
  39. Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, et al. 2015. ImageNet large scale visual recognition challenge. International Journal of Computer Vision, Vol. 115, 3 (2015), 211--252.
  40. Ali Shafahi, W Ronny Huang, Mahyar Najibi, Octavian Suciu, Christoph Studer, Tudor Dumitras, and Tom Goldstein. 2018. Poison frogs! Targeted clean-label poisoning attacks on neural networks. In NeurIPS.
  41. Mahmood Sharif, Lujo Bauer, and Michael K Reiter. 2018. On the suitability of Lp-norms for creating and preventing adversarial examples. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops.
  42. Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014).
  43. Johannes Stallkamp, Marc Schlipsing, Jan Salmen, and Christian Igel. 2011. The German Traffic Sign Recognition Benchmark: A multi-class classification competition. In IJCNN, Vol. 6. 7.
  44. Johannes Stallkamp, Marc Schlipsing, Jan Salmen, and Christian Igel. 2012. Man vs. computer: Benchmarking machine learning algorithms for traffic sign recognition. Neural Networks, Vol. 32 (2012), 323--332.
  45. Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. 2013. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199 (2013).
  46. Guanhong Tao, Shiqing Ma, Yingqi Liu, and Xiangyu Zhang. 2018. Attacks meet interpretability: Attribute-steered detection of adversarial samples. In Advances in Neural Information Processing Systems. 7717--7728.
  47. Alexander Turner, Dimitris Tsipras, and Aleksander Madry. 2018. Clean-Label Backdoor Attacks. (2018).
  48. Bolun Wang, Yuanshun Yao, Shawn Shan, Huiying Li, Bimal Viswanath, Haitao Zheng, and Ben Y Zhao. 2019. Neural Cleanse: Identifying and mitigating backdoor attacks in neural networks. In 2019 IEEE Symposium on Security and Privacy (SP). IEEE.
  49. Zhou Wang, Alan C Bovik, Hamid R Sheikh, Eero P Simoncelli, et al. 2004. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, Vol. 13, 4 (2004), 600--612.
  50. Wikipedia. 2019. Electrical brain stimulation - Wikipedia. https://en.wikipedia.org/wiki/Electrical_brain_stimulation
  51. Xi Wu, Uyeong Jang, Jiefeng Chen, Lingjiao Chen, and Somesh Jha. 2017. Reinforcing adversarial robustness using model confidence induced by adversarial training. arXiv preprint arXiv:1711.08001 (2017).
  52. Weilin Xu, David Evans, and Yanjun Qi. 2017. Feature squeezing: Detecting adversarial examples in deep neural networks. arXiv preprint arXiv:1704.01155 (2017).
  53. Wei Yang, Deguang Kong, Tao Xie, and Carl A Gunter. 2017. Malware detection in adversarial settings: Exploiting feature evolutions and confusions in Android apps. In Proceedings of the 33rd Annual Computer Security Applications Conference.
  54. Minhui Zou, Yang Shi, Chengliang Wang, Fangyu Li, WenZhan Song, and Yu Wang. 2018. PoTrojan: Powerful neural-level trojan designs in deep learning models. arXiv preprint arXiv:1802.03043 (2018).

      • Published in

        CCS '19: Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security
        November 2019
        2755 pages
ISBN: 9781450367479
DOI: 10.1145/3319535

        Copyright © 2019 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 6 November 2019


        Qualifiers

        • research-article

        Acceptance Rates

CCS '19 Paper Acceptance Rate: 149 of 934 submissions, 16%. Overall Acceptance Rate: 1,261 of 6,999 submissions, 18%.

