DOI: 10.1145/3338906.3338954
research-article

DeepStellar: model-based quantitative analysis of stateful deep learning systems

Published: 12 August 2019

ABSTRACT

Deep Learning (DL) has achieved tremendous success in many cutting-edge applications. However, state-of-the-art DL systems still suffer from quality issues. While some recent progress has been made on the analysis of feed-forward DL systems, little work has been done on Recurrent Neural Network (RNN)-based stateful DL systems, which are widely used in audio, natural language, and video processing. In this paper, we take the first step towards the quantitative analysis of RNN-based DL systems. We model an RNN as an abstract state transition system to characterize its internal behaviors. Based on the abstract model, we design two trace similarity metrics and five coverage criteria that enable the quantitative analysis of RNNs. We further propose two algorithms, powered by these quantitative measures, for adversarial sample detection and coverage-guided test generation. We evaluate DeepStellar on four RNN-based systems covering image classification and automated speech recognition. The results demonstrate that the abstract model is useful in capturing the internal behaviors of RNNs, and confirm that (1) the similarity metrics can effectively capture the differences between samples even under very small perturbations (achieving 97% accuracy in detecting adversarial samples) and (2) the coverage criteria are useful in revealing erroneous behaviors (generating three times more adversarial samples than random testing and hundreds of times more than the unrolling approach).
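To make the idea of the abstraction concrete: the paper's exact procedure is not reproduced here, but one common way to turn concrete RNN hidden-state traces into an abstract state transition system is to reduce the hidden vectors to a few dimensions and partition the reduced space into a grid, so that each grid cell becomes an abstract state. The sketch below is a hypothetical illustration under those assumptions (PCA via SVD, equal-interval grid); the function names and parameters are illustrative, not DeepStellar's implementation.

```python
import numpy as np

def abstract_traces(hidden_traces, k=2, m=5):
    """Map concrete hidden-state traces to abstract state traces.

    hidden_traces: list of (T_i, d) arrays, one hidden vector per time step.
    k: number of principal components kept; m: grid intervals per dimension.
    Returns the abstract traces (lists of grid cells) and the transition set.
    """
    all_states = np.vstack(hidden_traces)
    # PCA via SVD on centered data: project onto the top-k components
    mean = all_states.mean(axis=0)
    _, _, vt = np.linalg.svd(all_states - mean, full_matrices=False)
    proj = lambda h: (h - mean) @ vt[:k].T
    # Grid bounds estimated from the profiling data itself
    low = proj(all_states).min(axis=0)
    high = proj(all_states).max(axis=0)

    def to_cell(h):
        z = (proj(h) - low) / (high - low + 1e-12)   # normalize to [0, 1]
        return tuple(int(c) for c in np.clip((z * m).astype(int), 0, m - 1))

    abs_traces, transitions = [], set()
    for trace in hidden_traces:
        cells = [to_cell(h) for h in trace]
        abs_traces.append(cells)
        # Consecutive abstract states form the transitions of the model
        transitions.update(zip(cells, cells[1:]))
    return abs_traces, transitions

def state_coverage(abs_traces, k=2, m=5):
    """Fraction of the m^k abstract states visited by the given traces."""
    visited = {cell for trace in abs_traces for cell in trace}
    return len(visited) / (m ** k)
```

On top of such an abstraction, a coverage criterion can then be phrased as "how many abstract states (or transitions) have the tests exercised", and a trace similarity metric as a comparison of two abstract state sequences.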


Published in

ESEC/FSE 2019: Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering
August 2019, 1264 pages
ISBN: 9781450355728
DOI: 10.1145/3338906
Copyright © 2019 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

Overall Acceptance Rate: 112 of 543 submissions, 21%
