DOI: 10.1145/3213846.3213870 (ISSTA conference proceedings)

Comparing developer-provided to user-provided tests for fault localization and automated program repair

Published: 12 July 2018

ABSTRACT

To realistically evaluate a software testing or debugging technique, it must be run on defects and tests that are characteristic of those a developer would encounter in practice. For example, to determine the utility of a fault localization or automated program repair technique, it could be run on real defects from a bug tracking system, using real tests that are committed to the version control repository along with the fixes. Although such a methodology uses real tests, it may not use tests that are characteristic of the information a developer or tool would have in practice. The tests that a developer commits after fixing a defect may encode more information than was available to the developer when initially diagnosing the defect.

This paper compares, both quantitatively and qualitatively, the developer-provided tests committed along with fixes (as found in the version control repository) versus the user-provided tests extracted from bug reports (as found in the issue tracker). It provides evidence that developer-provided tests are more targeted toward the defect and encode more information than user-provided tests. For fault localization, developer-provided tests overestimate a technique’s ability to rank a defective statement in the list of the top-n most suspicious statements. For automated program repair, developer-provided tests overestimate a technique’s ability to (efficiently) generate correct patches—user-provided tests lead to fewer correct patches and increased repair time. This paper also provides suggestions for improving the design and evaluation of fault localization and automated program repair techniques.
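To make the fault-localization setting concrete: spectrum-based techniques of the kind evaluated here score each program statement by how strongly its execution correlates with test failures, then report the top-n most suspicious statements. The sketch below (an illustration, not the paper's own implementation) uses the standard Ochiai suspiciousness formula; the toy coverage data and statement names are invented for the example.

```python
import math

def ochiai(ef, ep, total_failed):
    """Ochiai suspiciousness: ef / sqrt(total_failed * (ef + ep)),
    where ef/ep = number of failing/passing tests that execute the statement."""
    denom = math.sqrt(total_failed * (ef + ep))
    return ef / denom if denom else 0.0

def rank_statements(coverage, failed):
    """coverage maps each statement to the set of tests executing it;
    failed is the set of failing tests. Returns statements ordered
    from most to least suspicious."""
    scores = {}
    for stmt, tests in coverage.items():
        ef = len(tests & failed)
        ep = len(tests - failed)
        scores[stmt] = ochiai(ef, ep, len(failed))
    return sorted(scores, key=scores.get, reverse=True)

# Toy example: "s3" is executed only by the failing test t3,
# so it should rank as the most suspicious statement.
coverage = {"s1": {"t1", "t2", "t3"}, "s2": {"t1", "t3"}, "s3": {"t3"}}
failed = {"t3"}
print(rank_statements(coverage, failed))  # ['s3', 's2', 's1']
```

The paper's observation is about the *inputs* to such a ranking: a targeted developer-provided test that executes little besides the defect sharpens these scores, whereas a broader user-provided test from a bug report spreads suspiciousness across many statements, pushing the defective one down the top-n list.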


Published in

ISSTA 2018: Proceedings of the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis
July 2018, 379 pages
ISBN: 9781450356992
DOI: 10.1145/3213846
General Chair: Frank Tip; Program Chair: Eric Bodden

      Copyright © 2018 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance rate: 58 of 213 submissions (27%)
