DeepStellar: Model-Based Quantitative Analysis of Stateful Deep Learning Systems

ABSTRACT
Deep Learning (DL) has achieved tremendous success in many cutting-edge applications. However, state-of-the-art DL systems still suffer from quality issues. While recent progress has been made on analyzing feed-forward DL systems, little work has studied Recurrent Neural Network (RNN)-based stateful DL systems, which are widely used in audio, natural language, and video processing. In this paper, we take a first step towards the quantitative analysis of RNN-based DL systems. We model an RNN as an abstract state transition system to characterize its internal behaviors. Based on the abstract model, we design two trace similarity metrics and five coverage criteria that enable the quantitative analysis of RNNs. We further propose two algorithms powered by these quantitative measures for adversarial sample detection and coverage-guided test generation. We evaluate DeepStellar, our implementation of this approach, on four RNN-based systems covering image classification and automated speech recognition. The results demonstrate that the abstract model is useful in capturing the internal behaviors of RNNs, and confirm that (1) the similarity metrics can effectively capture the differences between samples even with very small perturbations (achieving 97% accuracy in detecting adversarial samples) and (2) the coverage criteria are useful in revealing erroneous behaviors (generating three times more adversarial samples than random testing and hundreds of times more than the unrolling approach).
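The abstraction described above can be illustrated concretely. The sketch below is a minimal toy version, not the paper's implementation: it reduces a trace of RNN hidden-state vectors with PCA (computed via SVD) and partitions the reduced space into grid cells, so each concrete hidden state maps to an abstract state; the distinct abstract states and state-to-state transitions visited by a trace then yield simple state and transition coverage counts. All function names, the grid granularity, and the random toy trace are assumptions for illustration.

```python
import numpy as np

def pca_reduce(states, k=2):
    """Project hidden-state vectors onto their top-k principal components."""
    centered = states - states.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T

def grid_abstract(reduced, bins=5):
    """Map each reduced state to a grid cell, i.e., an abstract state."""
    lo, hi = reduced.min(axis=0), reduced.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)          # avoid division by zero
    cells = np.floor((reduced - lo) / span * bins).clip(0, bins - 1)
    return [tuple(c) for c in cells.astype(int)]

def transitions(abstract_trace):
    """Collect the distinct (state, next_state) pairs along one trace."""
    return set(zip(abstract_trace, abstract_trace[1:]))

# Toy example: a trace of 20 random "hidden states" from a 32-unit RNN.
rng = np.random.default_rng(0)
trace = rng.normal(size=(20, 32))
abs_trace = grid_abstract(pca_reduce(trace))
state_cov = len(set(abs_trace))           # distinct abstract states visited
trans_cov = len(transitions(abs_trace))   # distinct abstract transitions visited
```

In this setting, a coverage-guided test generator would keep an input mutation whenever it increases `state_cov` or `trans_cov`, and a similarity metric between two inputs can compare the abstract state/transition sets their traces induce.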