DOI: 10.1145/3338906.3338954
research-article

DeepStellar: model-based quantitative analysis of stateful deep learning systems

Published: 12 August 2019

ABSTRACT

Deep Learning (DL) has achieved tremendous success in many cutting-edge applications. However, state-of-the-art DL systems still suffer from quality issues. While some recent progress has been made on the analysis of feed-forward DL systems, little work has been done on Recurrent Neural Network (RNN)-based stateful DL systems, which are widely used in audio, natural language, and video processing. In this paper, we take the first step towards the quantitative analysis of RNN-based DL systems. We model an RNN as an abstract state transition system to characterize its internal behaviors. Based on the abstract model, we design two trace similarity metrics and five coverage criteria that enable the quantitative analysis of RNNs. We further propose two algorithms, powered by these quantitative measures, for adversarial sample detection and coverage-guided test generation. We evaluate DeepStellar on four RNN-based systems covering image classification and automated speech recognition. The results demonstrate that the abstract model is useful in capturing the internal behaviors of RNNs, and confirm that (1) the similarity metrics can effectively capture the differences between samples even under very small perturbations (achieving 97% accuracy in detecting adversarial samples) and (2) the coverage criteria are useful in revealing erroneous behaviors (generating three times more adversarial samples than random testing and hundreds of times more than the unrolling approach).
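To make the idea of the abstraction concrete: the paper's exact procedure is not reproduced here, but one common way to turn concrete RNN hidden-state traces into an abstract state transition system is to reduce the hidden vectors to a few dimensions and partition the reduced space into a grid, so that each grid cell becomes an abstract state. The sketch below is a hypothetical illustration under those assumptions (PCA via SVD, equal-interval grid); the function names and parameters are illustrative, not DeepStellar's implementation.

```python
import numpy as np

def abstract_traces(hidden_traces, k=2, m=5):
    """Map concrete hidden-state traces to abstract state traces.

    hidden_traces: list of (T_i, d) arrays, one hidden vector per time step.
    k: number of principal components kept; m: grid intervals per dimension.
    Returns the abstract traces (lists of grid cells) and the transition set.
    """
    all_states = np.vstack(hidden_traces)
    # PCA via SVD on centered data: project onto the top-k components
    mean = all_states.mean(axis=0)
    _, _, vt = np.linalg.svd(all_states - mean, full_matrices=False)
    proj = lambda h: (h - mean) @ vt[:k].T
    # Grid bounds estimated from the profiling data itself
    low = proj(all_states).min(axis=0)
    high = proj(all_states).max(axis=0)

    def to_cell(h):
        z = (proj(h) - low) / (high - low + 1e-12)   # normalize to [0, 1]
        return tuple(int(c) for c in np.clip((z * m).astype(int), 0, m - 1))

    abs_traces, transitions = [], set()
    for trace in hidden_traces:
        cells = [to_cell(h) for h in trace]
        abs_traces.append(cells)
        # Consecutive abstract states form the transitions of the model
        transitions.update(zip(cells, cells[1:]))
    return abs_traces, transitions

def state_coverage(abs_traces, k=2, m=5):
    """Fraction of the m^k abstract states visited by the given traces."""
    visited = {cell for trace in abs_traces for cell in trace}
    return len(visited) / (m ** k)
```

On top of such an abstraction, a coverage criterion can then be phrased as "how many abstract states (or transitions) have the tests exercised", and a trace similarity metric as a comparison of two abstract state sequences.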


Published in

ESEC/FSE 2019: Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering
August 2019, 1264 pages
ISBN: 9781450355728
DOI: 10.1145/3338906
Copyright © 2019 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

Overall Acceptance Rate: 112 of 543 submissions, 21%
