DOI: 10.1145/3213846.3213870 (ISSTA conference proceedings)

Comparing developer-provided to user-provided tests for fault localization and automated program repair

Published: 12 July 2018

ABSTRACT

To realistically evaluate a software testing or debugging technique, it must be run on defects and tests that are characteristic of those a developer would encounter in practice. For example, to determine the utility of a fault localization or automated program repair technique, it could be run on real defects from a bug tracking system, using real tests that are committed to the version control repository along with the fixes. Although such a methodology uses real tests, it may not use tests that are characteristic of the information a developer or tool would have in practice. The tests that a developer commits after fixing a defect may encode more information than was available to the developer when initially diagnosing the defect.

This paper compares, both quantitatively and qualitatively, the developer-provided tests committed along with fixes (as found in the version control repository) versus the user-provided tests extracted from bug reports (as found in the issue tracker). It provides evidence that developer-provided tests are more targeted toward the defect and encode more information than user-provided tests. For fault localization, developer-provided tests overestimate a technique’s ability to rank a defective statement in the list of the top-n most suspicious statements. For automated program repair, developer-provided tests overestimate a technique’s ability to (efficiently) generate correct patches—user-provided tests lead to fewer correct patches and increased repair time. This paper also provides suggestions for improving the design and evaluation of fault localization and automated program repair techniques.
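To make the fault-localization setting concrete: spectrum-based techniques of the kind evaluated here score each program statement by how strongly its execution correlates with test failures, then report the top-n most suspicious statements. The sketch below (an illustration, not the paper's own implementation) uses the standard Ochiai suspiciousness formula; the toy coverage data and statement names are invented for the example.

```python
import math

def ochiai(ef, ep, total_failed):
    """Ochiai suspiciousness: ef / sqrt(total_failed * (ef + ep)),
    where ef/ep = number of failing/passing tests that execute the statement."""
    denom = math.sqrt(total_failed * (ef + ep))
    return ef / denom if denom else 0.0

def rank_statements(coverage, failed):
    """coverage maps each statement to the set of tests executing it;
    failed is the set of failing tests. Returns statements ordered
    from most to least suspicious."""
    scores = {}
    for stmt, tests in coverage.items():
        ef = len(tests & failed)
        ep = len(tests - failed)
        scores[stmt] = ochiai(ef, ep, len(failed))
    return sorted(scores, key=scores.get, reverse=True)

# Toy example: "s3" is executed only by the failing test t3,
# so it should rank as the most suspicious statement.
coverage = {"s1": {"t1", "t2", "t3"}, "s2": {"t1", "t3"}, "s3": {"t3"}}
failed = {"t3"}
print(rank_statements(coverage, failed))  # ['s3', 's2', 's1']
```

The paper's observation is about the *inputs* to such a ranking: a targeted developer-provided test that executes little besides the defect sharpens these scores, whereas a broader user-provided test from a bug report spreads suspiciousness across many statements, pushing the defective one down the top-n list.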


Published in

ISSTA 2018: Proceedings of the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis
July 2018, 379 pages
ISBN: 9781450356992
DOI: 10.1145/3213846
General Chair: Frank Tip; Program Chair: Eric Bodden

      Copyright © 2018 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance rate: 58 of 213 submissions (27%)
