
A correlation study between automated program repair and test-suite metrics

Published in: Empirical Software Engineering

Abstract

Automated program repair is gaining traction due to its potential to greatly reduce debugging cost. Its feasibility has been demonstrated in a number of works, and the research focus is gradually shifting toward the quality of generated patches. One promising direction is to control the quality of generated patches by controlling the quality of the test-suites used for automated program repair. In this paper, we ask the following research question: “Can traditional test-suite metrics proposed for the purpose of software testing also be used for the purpose of automated program repair?” We empirically investigate whether traditional test-suite metrics such as statement/branch coverage and mutation score are effective in controlling the reliability of generated repairs (the likelihood that repairs cause regression errors). We conduct the largest-scale experiments of this kind to date with real-world software, and for the first time perform a correlation study between various test-suite metrics and the reliability of generated repairs. Our results show that, in general, the reliability of repairs tends to increase as traditional test-suite metrics increase; this trend is most strongly observed for statement coverage. Our results imply that the traditional test-suite metrics proposed for software testing can also be used in automated program repair to improve the reliability of repairs.
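The kind of correlation analysis the abstract describes can be sketched as follows. This is an illustrative example only: the coverage and reliability values below are hypothetical, not the paper's data, and the paper's statistical machinery (it cites Kendall 1945 and Pearson 1895) is reduced here to a plain-Python Kendall's tau-b.

```python
from itertools import combinations

def kendall_tau_b(xs, ys):
    """Kendall's tau-b rank correlation; handles ties as in Kendall (1945)."""
    n0 = concordant = discordant = tied_x = tied_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        n0 += 1                      # total number of pairs
        dx, dy = x1 - x2, y1 - y2
        if dx == 0:
            tied_x += 1              # pair tied on the metric
        if dy == 0:
            tied_y += 1              # pair tied on reliability
        if dx != 0 and dy != 0:
            if dx * dy > 0:
                concordant += 1      # both rankings agree on this pair
            else:
                discordant += 1
    denom = ((n0 - tied_x) * (n0 - tied_y)) ** 0.5
    return (concordant - discordant) / denom

# Hypothetical inputs: statement coverage of each test-suite, and the
# reliability of the repair generated from it (fraction of held-out tests passed).
coverage    = [0.42, 0.55, 0.61, 0.70, 0.88]
reliability = [0.50, 0.52, 0.74, 0.71, 0.95]
print(round(kendall_tau_b(coverage, reliability), 3))  # → 0.8
```

A tau near 1 would indicate the monotone trend the paper reports: repairs from higher-coverage suites tend to be more reliable.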

[Figures 1–3 appear in the full article.]


Notes

  1. One exception is DirectFix (Mechtaev et al. 2015) where fault localization and edit parts are fused.

  2. A mutant m is considered killed when the test result of m for at least one test in the provided test-suite differs from the test result of the original program for the same test.
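The kill criterion of note 2 can be sketched as follows; this is a minimal illustration over hypothetical pass/fail outcomes, not the paper's actual mutation tooling (Proteum).

```python
def is_killed(original_results, mutant_results):
    """Note 2: a mutant is killed if at least one test's outcome differs
    between the mutant and the original program."""
    return any(o != m for o, m in zip(original_results, mutant_results))

def mutation_score(original_results, all_mutant_results):
    """Fraction of mutants killed by the test-suite."""
    killed = sum(is_killed(original_results, m) for m in all_mutant_results)
    return killed / len(all_mutant_results)

# Hypothetical outcomes ('P' pass / 'F' fail) of three tests.
orig = ['P', 'P', 'F']
mutants = [['P', 'F', 'F'],   # killed: test 2 differs
           ['P', 'P', 'F'],   # survives: identical outcomes
           ['F', 'P', 'F']]   # killed: test 1 differs
print(mutation_score(orig, mutants))  # → 0.6666666666666666
```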

  3. Only positive tests are considered; an output change for negative tests is not a regression.
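The regression criterion of note 3 can be sketched as follows, again with hypothetical test names and outcomes: only outcome changes on positive (originally passing) tests count as regressions.

```python
def causes_regression(before, after, positive_tests):
    """Note 3: a repair regresses only if some positive (originally
    passing) test changes outcome; changes on negative tests don't count."""
    return any(before[t] != after[t] for t in positive_tests)

# Hypothetical outcomes keyed by test name; t3 is the negative (failing) test.
before = {'t1': 'P', 't2': 'P', 't3': 'F'}
after  = {'t1': 'P', 't2': 'F', 't3': 'P'}   # repair fixes t3 but breaks t2
print(causes_regression(before, after, positive_tests=['t1', 't2']))  # → True
```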

  4. We used the original GenProg benchmark. At the time of writing this paper, the benchmark had been updated after a few problems in the test scripts of php and libtiff were reported in Qi et al. (2015).

  5. The grep subject in CoREBench contains real errors unlike the grep in SIR that contains seeded errors.

  6. While php contains 8471 tests, we randomly selected 200 of them to cope with the long running time of the php tests.

  7. tot_info includes non-linear arithmetic expressions which are not currently supported by the underlying SMT solver SemFix uses.

  8. We extended its parser to handle the large subjects (php, libtiff, grep, and findutils).

  9. https://gcc.gnu.org/onlinedocs/gcc/Gcov.html

  10. The minimum statement/branch coverage of php is 0 because some tests do not execute the marked source files.

  11. Min/Max/Mean values of the table are different from those of Table 3, because there we consider only test-suites from which repairs are generated, whereas in Table 5, we consider all test-suites.

  12. The “coverage” referred to in Smith et al. (2015) essentially means how many tests of a given test-universe are covered.

References

  • Andrews JH, Briand LC, Labiche Y, Namin AS (2006) Using mutation analysis for assessing and comparing testing coverage criteria. IEEE Trans Softw Eng 32(8):608–624


  • Artzi S, Dolby J, Tip F, Pistoia M (2010) Directed test generation for effective fault localization. In: Proceedings of the 19th International Symposium on Software Testing and Analysis, ISSTA ’10, pp 49–60

  • Assiri FY, Bieman JM (2014) An assessment of the quality of automated program operator repair. In: Proceedings of the 2014 IEEE Seventh International Conference on Software Testing, Verification and Validation, ICST ’14, pp 273–282

  • Baudry B, Fleurey F, Le Traon Y (2006) Improving test suites for efficient fault localization. In: Proceedings of the 28th International Conference on Software Engineering, ICSE ’06, pp 82–91

  • Böhme M, Roychoudhury A (2014) CoREBench: Studying complexity of regression errors. In: Proceedings of the 2014 International Symposium on Software Testing and Analysis, ISSTA ’14, pp 105–115

  • Böhme M, Oliveira BCdS, Roychoudhury A (2013a) Partition-based regression verification. In: Proceedings of the 2013 International Conference on Software Engineering, ICSE ’13, pp 302–311

  • Böhme M, Oliveira BCdS, Roychoudhury A (2013b) Regression tests to expose change interaction errors. In: Proceedings of the 2013 Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, ESEC/FSE ’13, pp 334–344

  • Cadar C, Engler D (2005) Execution generated test cases: How to make systems code crash itself. In: Proceedings of the 12th International Conference on Model Checking Software, SPIN ’05, pp 2–23

  • Cadar C, Dunbar D, Engler D (2008) KLEE: Unassisted and automatic generation of high-coverage tests for complex systems programs. In: Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation, OSDI ’08, pp 209–224

  • Dallmeier V, Zeller A, Meyer B (2009) Generating fixes from object behavior anomalies. In: Proceedings of the 2009 IEEE/ACM International Conference on Automated Software Engineering, ASE ’09, pp 550–554

  • Debroy V, Wong WE (2010) Using mutation to automatically suggest fixes for faulty programs. In: Proceedings of the Third International Conference on Software Testing, Verification and Validation, ICST ’10, pp 65–74

  • Debroy V, Wong WE (2014) Combining mutation and fault localization for automated program debugging. J Syst Softw 90:45–60


  • Do H, Elbaum SG, Rothermel G (2005) Supporting controlled experimentation with testing techniques: An infrastructure and its potential impact. Empir Softw Eng 10(4):405–435


  • Elkarablieh B, Khurshid S (2008) Juzi: A tool for repairing complex data structures. In: Proceedings of the 30th International Conference on Software Engineering, ICSE ’08, pp 855–858

  • Godefroid P, Klarlund N, Sen K (2005) DART: Directed automated random testing. In: Proceedings of the 2005 ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’05, pp 213–223

  • Gopinath D, Malik MZ, Khurshid S (2011) Specification-based program repair using SAT. In: Proceedings of the 17th International Conference on Tools and Algorithms for the Construction and Analysis of Systems: Part of the Joint European Conferences on Theory and Practice of Software, TACAS ’11/ETAPS ’11, pp 173–188

  • He H, Gupta N (2004) Automated debugging using path-based weakest preconditions. In: Proceedings of the 7th International Conference on Fundamental Approaches to Software Engineering, FASE ’04, pp 267–280

  • Jia Y, Harman M (2011) An analysis and survey of the development of mutation testing. IEEE Trans Softw Eng 37(5):649–678


  • Jobstmann B, Griesmayer A, Bloem R (2005) Program repair as a game. In: Proceedings of the 17th International Conference on Computer Aided Verification, CAV ’05, pp 226–238

  • Jones JA, Harrold MJ, Stasko JT (2002) Visualization of test information to assist fault localization. In: Proceedings of the 24th International Conference on Software Engineering, ICSE ’02, pp 467–477

  • Ke Y, Stolee KT, Le Goues C, Brun Y (2015) Repairing programs with semantic code search. In: Proceedings of the 2015 30th IEEE/ACM International Conference on Automated Software Engineering, ASE ’15, pp 295–306

  • Kendall MG (1945) The treatment of ties in ranking problems. Biometrika 33(3):239–251


  • Kim D, Nam J, Song J, Kim S (2013) Automatic patch generation learned from human-written patches. In: Proceedings of the 2013 International Conference on Software Engineering, ICSE ’13, pp 802–811

  • Kong X, Zhang L, Wong WE, Li B (2015) Experience report: How do techniques, programs, and tests impact automated program repair?. In: Proceedings of the 2015 IEEE 26th International Symposium on Software Reliability Engineering, ISSRE ’15, pp 194–204

  • Könighofer R, Bloem R (2011) Automated error localization and correction for imperative programs. In: Proceedings of the International Conference on Formal Methods in Computer-Aided Design, FMCAD ’11, pp 91–100

  • Le Goues C, Dewey-Vogt M, Forrest S, Weimer W (2012a) A systematic study of automated program repair: Fixing 55 out of 105 bugs for $8 each. In: Proceedings of the 34th International Conference on Software Engineering, ICSE ’12, pp 3–13

  • Le Goues C, Nguyen T, Forrest S, Weimer W (2012b) GenProg: A generic method for automatic software repair. IEEE Trans Softw Eng 38(1):54–72

  • Le Goues C, Forrest S, Weimer W (2013) Current challenges in automatic software repair. Softw Qual J 21(3):421–443


  • Liblit B, Aiken A, Zheng AX, Jordan MI (2003) Bug isolation via remote program sampling. In: Proceedings of the ACM SIGPLAN 2003 conference on Programming Language Design and Implementation, PLDI ’03, pp 141–154

  • Long F, Rinard M (2015) Staged program repair with condition synthesis. In: Proceedings of the 2015 Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, ESEC/FSE ’15, pp 166–178

  • Long F, Rinard M (2016a) An analysis of the search spaces for generate and validate patch generation systems. In: Proceedings of the 38th International Conference on Software Engineering, ICSE ’16, pp 702–713

  • Long F, Rinard M (2016b) Automatic patch generation by learning correct code. In: Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL ’16, pp 298–312

  • Long F, Sidiroglou-Douskos S, Rinard M (2014) Automatic runtime error repair and containment via recovery shepherding. In: Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’14, pp 227–238

  • Maldonado JC, Delamaro ME, Fabbri SCPF, Simão A d S, Sugeta T, Vincenzi AMR, Masiero PC (2001) Proteum: A family of tools to support specification and program testing based on mutation. In: Wong W E (ed) Mutation Testing for the New Century, Kluwer Academic Publishers, Norwell, pp 113–116

  • Mechtaev S, Yi J, Roychoudhury A (2015) DirectFix: Looking for simple program repairs. In: Proceedings of the 37th IEEE/ACM International Conference on Software Engineering, ICSE ’15, pp 448–458

  • Mechtaev S, Yi J, Roychoudhury A (2016) Angelix: Scalable multiline program patch synthesis via symbolic analysis. In: Proceedings of the 38th International Conference on Software Engineering, ICSE ’16, pp 691–701

  • Miller W, Spooner DL (1976) Automatic generation of floating-point test data. IEEE Trans Softw Eng 2(3):223–226


  • Namin AS, Andrews JH (2009) The influence of size and coverage on test suite effectiveness. In: Proceedings of the 8th International Symposium on Software Testing and Analysis, ISSTA ’09, pp 57–68

  • Nguyen HDT, Qi D, Roychoudhury A, Chandra S (2013) SemFix: Program repair via semantic analysis. In: Proceedings of the 2013 International Conference on Software Engineering, ICSE ’13, pp 772–781

  • Pearson K (1895) Note on regression and inheritance in the case of two parents. Proc Royal Soc Lond 58:240–242


  • Pei Y, Furia C, Nordio M, Wei Y, Meyer B, Zeller A (2014) Automated fixing of programs with contracts. IEEE Trans Softw Eng 40(5):427–449


  • Perkins JH, Kim S, Larsen S, Amarasinghe S, Bachrach J, Carbin M, Pacheco C, Sherwood F, Sidiroglou S, Sullivan G, Wong WF, Zibin Y, Ernst MD, Rinard M (2009) Automatically patching errors in deployed software. In: Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles, SOSP ’09, pp 87–102

  • Person S, Yang G, Rungta N, Khurshid S (2011) Directed incremental symbolic execution. In: Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’11, pp 504–515

  • Qi Y, Mao X, Lei Y (2013) Efficient automated program repair through fault-recorded testing prioritization. In: Proceedings of the 2013 IEEE International Conference on Software Maintenance, ICSM ’13, pp 180–189

  • Qi Y, Mao X, Lei Y, Dai Z, Wang C (2014) The strength of random search on automated program repair. In: Proceedings of the 36th International Conference on Software Engineering, ICSE ’14, pp 254–265

  • Qi Z, Long F, Achour S, Rinard M (2015) An analysis of patch plausibility and correctness for generate-and-validate patch generation systems. In: Proceedings of the 2015 International Symposium on Software Testing and Analysis, ISSTA ’15, pp 24–36

  • Samimi H, Aung ED, Millstein T (2010) Falling back on executable specifications. In: Proceedings of the 24th European Conference on Object-oriented Programming, ECOOP’10, pp 552–576

  • Samimi H, Schäfer M, Artzi S, Millstein T, Tip F, Hendren L (2012) Automated repair of HTML generation errors in PHP applications using string constraint solving. In: Proceedings of the 34th International Conference on Software Engineering, ICSE ’12, pp 277–287

  • Santelices R, Chittimalli PK, Apiwattanapong T, Orso A, Harrold MJ (2008) Test-suite augmentation for evolving software. In: Proceedings of the 23rd IEEE/ACM International Conference on Automated Software Engineering, ASE ’08, pp 218–227

  • Shoenauer M, Xanthakis S (1993) Constrained GA optimization. In: Proceedings of the 5th International Conference on Genetic Algorithms, ICGA ’93, pp 573–580

  • Smith EK, Barr ET, Le Goues C, Brun Y (2015) Is the cure worse than the disease? Overfitting in automated program repair. In: Proceedings of the 2015 Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, ESEC/FSE ’15, pp 532–543

  • Tan SH, Roychoudhury A (2015) relifix: Automated repair of software regressions. In: Proceedings of the 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering, ICSE ’15, pp 471–482

  • Tan SH, Yoshida H, Prasad MR, Roychoudhury A (2016) Anti-patterns in search-based program repair. In: Proceedings of the 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, FSE’16, pp 727–738

  • Weimer W, Fry ZP, Forrest S (2013) Leveraging program equivalence for adaptive program repair: Models and first results. In: Proceedings of the 28th IEEE/ACM International Conference on Automated Software Engineering, ASE ’13, pp 356–366

  • White DR, Arcuri A, Clark JA (2011) Evolutionary improvement of programs. IEEE Trans Evol Comput 15(4):515–538


  • Xuan J, Martinez M, Demarco F, Clement M, Marcote SRL, Durieux T, Berre DL, Monperrus M (2017) Nopol: Automatic repair of conditional statement bugs in Java programs. IEEE Trans Softw Eng 43(1):34–55


  • Yao X, Harman M, Jia Y (2014) A study of equivalent and stubborn mutation operators using human analysis of equivalence. In: Proceedings of the 36th International Conference on Software Engineering, ICSE ’14, pp 919–930


Acknowledgements

This research is supported in part by the National Research Foundation, Prime Minister’s Office, Singapore under its National Cybersecurity R&D Program (TSUNAMi project, Award No. NRF2014NCR-NCR001-21) and administered by the National Cybersecurity R&D Directorate. The first author thanks Innopolis University for its support.

Author information

Correspondence to Jooyong Yi.

Additional information

Communicated by: Martin Monperrus and Westley Weimer


About this article


Cite this article

Yi, J., Tan, S.H., Mechtaev, S. et al. A correlation study between automated program repair and test-suite metrics. Empir Software Eng 23, 2948–2979 (2018). https://doi.org/10.1007/s10664-017-9552-y

