DOI: 10.1145/3236024.3236032
research-article

NAR-miner: discovering negative association rules from code for bug detection

Published: 26 October 2018

ABSTRACT

Inferring programming rules from source code with data mining techniques has proven effective for detecting software bugs. Existing studies focus on discovering positive rules of the form A ⇒ B, indicating that when operation A appears, operation B should also appear. Unfortunately, negative rules (A ⇒ ¬B), which capture mutual suppression or conflict relationships among program elements, have received little attention. Yet violating such negative rules can also result in serious bugs.
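The following sketch makes the A ⇒ ¬B notation concrete by computing the usual support and confidence of one negative rule over a toy corpus. It assumes each function is modeled as the set of APIs it calls, a common itemset view of code. The corpus, the function names f1 to f5, and the helper `negative_rule_stats` are hypothetical illustrations and are not part of NAR-Miner. Any function that contains both A and B violates the rule and is therefore a bug candidate.

```python
from typing import Dict, List, Set, Tuple


def negative_rule_stats(
    functions: Dict[str, Set[str]], a: str, b: str
) -> Tuple[float, float, List[str]]:
    """Support and confidence of the negative rule A => not B, plus violators.

    Each function is modeled as the set of program elements (here, called
    APIs) it contains; this itemset view is an assumption of the sketch.
    """
    total = len(functions)
    with_a = [name for name, elems in functions.items() if a in elems]
    # Functions that contain both A and B violate A => not B.
    violators = sorted(name for name in with_a if b in functions[name])
    a_without_b = len(with_a) - len(violators)
    # support: fraction of all functions that contain A but not B
    support = a_without_b / total if total else 0.0
    # confidence: fraction of A-containing functions that lack B
    confidence = a_without_b / len(with_a) if with_a else 0.0
    return support, confidence, violators


# Hypothetical corpus: code holding a spinlock must not call a sleeping API.
corpus = {
    "f1": {"spin_lock", "spin_unlock", "kfree"},
    "f2": {"spin_lock", "spin_unlock", "list_add"},
    "f3": {"spin_lock", "spin_unlock", "udelay"},
    "f4": {"spin_lock", "spin_unlock", "msleep"},  # candidate bug
    "f5": {"mutex_lock", "mutex_unlock", "msleep"},
}

support, confidence, violators = negative_rule_stats(corpus, "spin_lock", "msleep")
print(f"support={support:.2f} confidence={confidence:.2f} violators={violators}")
```

In this example, `spin_lock ⇒ ¬msleep` encodes the familiar kernel constraint that code holding a spinlock must not sleep. The sketch reports a support of 0.60, a confidence of 0.75, and f4 as the single violation. A miner that evaluates every ordered pair of elements, rather than one chosen rule, faces a candidate set that grows quadratically with the number of distinct elements.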

In this paper, we propose a novel method called NAR-Miner that automatically extracts negative association programming rules from large-scale systems and detects their violations to find bugs. Mining negative rules, however, faces a more severe rule explosion problem than mining positive ones: most of the obtained negative rules are uninteresting and can lead to an unacceptable number of false alarms. To address this issue, we design a semantics-constrained mining algorithm that focuses rule mining on elements with strong semantic relationships. Furthermore, we introduce information entropy to rank candidate negative rules and highlight the interesting ones. Together, these techniques effectively mitigate the rule explosion problem. We implement NAR-Miner and apply it to the Linux kernel (v4.12-rc6). The experiments show that uninteresting rules are dramatically reduced, and 17 detected violations have been confirmed as real bugs and patched by the kernel community. We also apply NAR-Miner to PostgreSQL, OpenSSL, and FFmpeg and discover six real bugs.
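The following rough sketch illustrates the two mitigation ideas together. The abstract does not define NAR-Miner's semantic constraint or its entropy metric, so both pieces here are stand-ins assumed for illustration.

- **Semantic constraint (stand-in).** Calls are grouped by the variable they operate on, so two APIs count as co-occurring only when they touch the same object.
- **Entropy ranking (stand-in).** Candidate rules are ranked by confidence weighted by the binary entropy of the consequent B. The reasoning is that the absence of a rarely used B carries little information.

The program model, the corpus, and all helper names are hypothetical.

```python
import math
from collections import defaultdict
from itertools import permutations
from typing import Dict, List, Set, Tuple

# Hypothetical program model: each function is a list of (callee, variable) pairs.
Call = Tuple[str, str]
Site = Tuple[str, str]  # (function name, variable name)


def object_transactions(functions: Dict[str, List[Call]]) -> Dict[Site, Set[str]]:
    """Assumed semantic constraint: group calls by the object they operate on,
    so two APIs co-occur only when they touch the same variable."""
    groups: Dict[Site, Set[str]] = defaultdict(set)
    for fname, calls in functions.items():
        for callee, var in calls:
            groups[(fname, var)].add(callee)
    return dict(groups)


def binary_entropy(p: float) -> float:
    """Shannon entropy, in bits, of a Bernoulli(p) variable."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def rank_negative_rules(
    functions: Dict[str, List[Call]], min_confidence: float = 0.75
) -> List[Tuple[float, str, str, List[Site]]]:
    """Return (score, a, b, violating sites) for rules A => not B, best first."""
    txns = object_transactions(functions)
    n = len(txns)
    items = sorted(set().union(*txns.values()))
    ranked = []
    for a, b in permutations(items, 2):
        with_a = [site for site, t in txns.items() if a in t]
        violators = sorted(site for site in with_a if b in txns[site])
        if not violators:
            continue  # a rule with no violations cannot flag a bug
        conf = 1 - len(violators) / len(with_a)
        if conf < min_confidence:
            continue
        p_b = sum(1 for t in txns.values() if b in t) / n
        # Assumed interestingness score: "B is absent" says little when B is
        # rare, so confidence is weighted by the entropy of B's occurrence.
        ranked.append((conf * binary_entropy(p_b), a, b, violators))
    ranked.sort(key=lambda r: r[0], reverse=True)
    return ranked


corpus = {
    "drv_a": [("kmalloc", "buf"), ("memcpy", "buf"), ("kfree", "buf")],
    "drv_b": [("kmalloc", "req"), ("kfree", "req")],
    "drv_c": [("kmalloc", "ctx"), ("memset", "ctx"), ("kfree", "ctx")],
    "drv_d": [("vmalloc", "tbl"), ("memset", "tbl"), ("vfree", "tbl"),
              ("kmalloc", "tmp"), ("kfree", "tmp")],
    "drv_e": [("vmalloc", "map"), ("vfree", "map")],
    "drv_f": [("kmalloc", "data"), ("vfree", "data")],  # mismatched free
    "drv_g": [("kmalloc", "hdr"), ("kfree", "hdr")],
}

for score, a, b, sites in rank_negative_rules(corpus):
    print(f"{score:.2f}  {a} => not {b}  violations: {sites}")
```

On this toy corpus, the mismatched `kmalloc`/`vfree` pair in `drv_f` ranks first. Grouping by object matters here: at whole-function granularity, `drv_d` would also contain both `kmalloc` and `vfree` and would be flagged spuriously. The ranking reorders candidates but does not remove them. The uninteresting rule `kmalloc => not memset` still ranks second, so a practical tool would need stronger semantic constraints than this stand-in.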

P. Bian, B. Liang, W. Shi, J. Huang, and Y. Cai
ESEC/FSE ’18, November 4–9, 2018, Lake Buena Vista, FL, USA

Published in
ESEC/FSE 2018: Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering
October 2018
987 pages
ISBN: 9781450355735
DOI: 10.1145/3236024

Copyright © 2018 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall acceptance rate: 112 of 543 submissions, 21%
