
A correlation study between automated program repair and test-suite metrics

Published in: Empirical Software Engineering

Abstract

Automated program repair is gaining traction due to its potential to greatly reduce debugging cost. Its feasibility has been demonstrated in a number of works, and the research focus is gradually shifting toward the quality of generated patches. One promising direction is to control the quality of generated patches by controlling the quality of the test-suites used for automated program repair. In this paper, we ask the following research question: “Can traditional test-suite metrics proposed for the purpose of software testing also be used for the purpose of automated program repair?” We empirically investigate whether traditional test-suite metrics such as statement/branch coverage and mutation score are effective in controlling the reliability of generated repairs (the likelihood that repairs cause regression errors). We conduct the largest-scale experiments of this kind to date with real-world software, and for the first time perform a correlation study between various test-suite metrics and the reliability of generated repairs. Our results show that, in general, the reliability of repairs tends to increase as traditional test-suite metrics increase; this trend is most strongly observed for statement coverage. Our results imply that the traditional test-suite metrics proposed for software testing can also be used in automated program repair to improve the reliability of repairs.
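The kind of correlation analysis the abstract describes can be sketched as follows. This is an illustrative example only: the coverage and reliability values below are hypothetical, not the paper's data, and the paper's statistical machinery (it cites Kendall 1945 and Pearson 1895) is reduced here to a plain-Python Kendall's tau-b.

```python
from itertools import combinations

def kendall_tau_b(xs, ys):
    """Kendall's tau-b rank correlation; handles ties as in Kendall (1945)."""
    n0 = concordant = discordant = tied_x = tied_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        n0 += 1                      # total number of pairs
        dx, dy = x1 - x2, y1 - y2
        if dx == 0:
            tied_x += 1              # pair tied on the metric
        if dy == 0:
            tied_y += 1              # pair tied on reliability
        if dx != 0 and dy != 0:
            if dx * dy > 0:
                concordant += 1      # both rankings agree on this pair
            else:
                discordant += 1
    denom = ((n0 - tied_x) * (n0 - tied_y)) ** 0.5
    return (concordant - discordant) / denom

# Hypothetical inputs: statement coverage of each test-suite, and the
# reliability of the repair generated from it (fraction of held-out tests passed).
coverage    = [0.42, 0.55, 0.61, 0.70, 0.88]
reliability = [0.50, 0.52, 0.74, 0.71, 0.95]
print(round(kendall_tau_b(coverage, reliability), 3))  # → 0.8
```

A tau near 1 would indicate the monotone trend the paper reports: repairs from higher-coverage suites tend to be more reliable.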

[Figures 1–3 appear in the full article.]


Notes

  1. One exception is DirectFix (Mechtaev et al. 2015) where fault localization and edit parts are fused.

  2. A mutant m is considered killed when the test result of m for at least one test in the provided test-suite differs from the test result of the original program for the same test.
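The kill criterion of note 2 can be sketched as follows; this is a minimal illustration over hypothetical pass/fail outcomes, not the paper's actual mutation tooling (Proteum).

```python
def is_killed(original_results, mutant_results):
    """Note 2: a mutant is killed if at least one test's outcome differs
    between the mutant and the original program."""
    return any(o != m for o, m in zip(original_results, mutant_results))

def mutation_score(original_results, all_mutant_results):
    """Fraction of mutants killed by the test-suite."""
    killed = sum(is_killed(original_results, m) for m in all_mutant_results)
    return killed / len(all_mutant_results)

# Hypothetical outcomes ('P' pass / 'F' fail) of three tests.
orig = ['P', 'P', 'F']
mutants = [['P', 'F', 'F'],   # killed: test 2 differs
           ['P', 'P', 'F'],   # survives: identical outcomes
           ['F', 'P', 'F']]   # killed: test 1 differs
print(mutation_score(orig, mutants))  # → 0.6666666666666666
```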

  3. Only positive tests are considered; an output change for negative tests is not a regression.
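The regression criterion of note 3 can be sketched as follows, again with hypothetical test names and outcomes: only outcome changes on positive (originally passing) tests count as regressions.

```python
def causes_regression(before, after, positive_tests):
    """Note 3: a repair regresses only if some positive (originally
    passing) test changes outcome; changes on negative tests don't count."""
    return any(before[t] != after[t] for t in positive_tests)

# Hypothetical outcomes keyed by test name; t3 is the negative (failing) test.
before = {'t1': 'P', 't2': 'P', 't3': 'F'}
after  = {'t1': 'P', 't2': 'F', 't3': 'P'}   # repair fixes t3 but breaks t2
print(causes_regression(before, after, positive_tests=['t1', 't2']))  # → True
```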

  4. We used the original GenProg benchmark. At the time of writing this paper, the benchmark had been updated after a few problems in the test scripts of php and libtiff were reported in Qi et al. (2015).

  5. The grep subject in CoREBench contains real errors unlike the grep in SIR that contains seeded errors.

  6. While php contains 8471 tests, we randomly selected 200 of them to cope with the long running time of the php tests.

  7. tot_info includes non-linear arithmetic expressions which are not currently supported by the underlying SMT solver SemFix uses.

  8. We extended its parser to handle the large subjects (php, libtiff, grep, and findutils).

  9. https://gcc.gnu.org/onlinedocs/gcc/Gcov.html

  10. The minimum statement/branch coverage of php is 0 because some tests do not execute the marked source files.

  11. Min/Max/Mean values of the table are different from those of Table 3, because there we consider only test-suites from which repairs are generated, whereas in Table 5, we consider all test-suites.

  12. The “coverage” referred to in Smith et al. (2015) essentially means how many tests of a given test-universe are covered.

References

  • Andrews JH, Briand LC, Labiche Y, Namin AS (2006) Using mutation analysis for assessing and comparing testing coverage criteria. IEEE Trans Softw Eng 32(8):608–624


  • Artzi S, Dolby J, Tip F, Pistoia M (2010) Directed test generation for effective fault localization. In: Proceedings of the 19th International Symposium on Software Testing and Analysis, ISSTA ’10, pp 49–60

  • Assiri FY, Bieman JM (2014) An assessment of the quality of automated program operator repair. In: Proceedings of the 2014 IEEE Seventh International Conference on Software Testing, Verification and Validation, ICST ’14, pp 273–282

  • Baudry B, Fleurey F, Le Traon Y (2006) Improving test suites for efficient fault localization. In: Proceedings of the 28th International Conference on Software Engineering, ICSE ’06, pp 82–91

  • Böhme M, Roychoudhury A (2014) CoREBench: Studying complexity of regression errors. In: Proceedings of the 2014 International Symposium on Software Testing and Analysis, ISSTA ’14, pp 105–115

  • Böhme M, Oliveira BCdS, Roychoudhury A (2013a) Partition-based regression verification. In: Proceedings of the 2013 International Conference on Software Engineering, ICSE ’13, pp 302–311

  • Böhme M, Oliveira BCdS, Roychoudhury A (2013b) Regression tests to expose change interaction errors. In: Proceedings of the 2013 Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, ESEC/FSE ’13, pp 334–344

  • Cadar C, Engler D (2005) Execution generated test cases: How to make systems code crash itself. In: Proceedings of the 12th International Conference on Model Checking Software, SPIN ’05, pp 2–23

  • Cadar C, Dunbar D, Engler D (2008) KLEE: Unassisted and automatic generation of high-coverage tests for complex systems programs. In: Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation, OSDI ’08, pp 209–224

  • Dallmeier V, Zeller A, Meyer B (2009) Generating fixes from object behavior anomalies. In: Proceedings of the 2009 IEEE/ACM International Conference on Automated Software Engineering, ASE ’09, pp 550–554

  • Debroy V, Wong WE (2010) Using mutation to automatically suggest fixes for faulty programs. In: Proceedings of the Third International Conference on Software Testing, Verification and Validation, ICST ’10, pp 65–74

  • Debroy V, Wong WE (2014) Combining mutation and fault localization for automated program debugging. J Syst Softw 90:45–60


  • Do H, Elbaum SG, Rothermel G (2005) Supporting controlled experimentation with testing techniques: An infrastructure and its potential impact. Empir Softw Eng 10(4):405–435


  • Elkarablieh B, Khurshid S (2008) Juzi: A tool for repairing complex data structures. In: Proceedings of the 30th International Conference on Software Engineering, ICSE ’08, pp 855–858

  • Godefroid P, Klarlund N, Sen K (2005) DART: Directed automated random testing. In: Proceedings of the 2005 ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’05, pp 213–223

  • Gopinath D, Malik MZ, Khurshid S (2011) Specification-based program repair using SAT. In: Proceedings of the 17th International Conference on Tools and Algorithms for the Construction and Analysis of Systems: Part of the Joint European Conferences on Theory and Practice of Software, TACAS ’11/ETAPS ’11, pp 173–188

  • He H, Gupta N (2004) Automated debugging using path-based weakest preconditions. In: Proceedings of the 7th International Conference on Fundamental Approaches to Software Engineering, FASE ’04, pp 267–280

  • Jia Y, Harman M (2011) An analysis and survey of the development of mutation testing. IEEE Trans Softw Eng 37(5):649–678


  • Jobstmann B, Griesmayer A, Bloem R (2005) Program repair as a game. In: Proceedings of the 17th International Conference on Computer Aided Verification, CAV ’05, pp 226–238

  • Jones JA, Harrold MJ, Stasko JT (2002) Visualization of test information to assist fault localization. In: Proceedings of the 24th International Conference on Software Engineering, ICSE ’02, pp 467–477

  • Ke Y, Stolee KT, Le Goues C, Brun Y (2015) Repairing programs with semantic code search. In: Proceedings of the 2015 30th IEEE/ACM International Conference on Automated Software Engineering, ASE ’15, pp 295–306

  • Kendall MG (1945) The treatment of ties in ranking problems. Biometrika 33(3):239–251


  • Kim D, Nam J, Song J, Kim S (2013) Automatic patch generation learned from human-written patches. In: Proceedings of the 2013 International Conference on Software Engineering, ICSE ’13, pp 802–811

  • Kong X, Zhang L, Wong WE, Li B (2015) Experience report: How do techniques, programs, and tests impact automated program repair?. In: Proceedings of the 2015 IEEE 26th International Symposium on Software Reliability Engineering, ISSRE ’15, pp 194–204

  • Könighofer R, Bloem R (2011) Automated error localization and correction for imperative programs. In: Proceedings of the International Conference on Formal Methods in Computer-Aided Design, FMCAD ’11, pp 91–100

  • Le Goues C, Dewey-Vogt M, Forrest S, Weimer W (2012a) A systematic study of automated program repair: Fixing 55 out of 105 bugs for $8 each. In: Proceedings of the 34th International Conference on Software Engineering, ICSE ’12, pp 3–13

  • Le Goues C, Nguyen T, Forrest S, Weimer W (2012b) GenProg: A generic method for automatic software repair. IEEE Trans Softw Eng 38(1):54–72

  • Le Goues C, Forrest S, Weimer W (2013) Current challenges in automatic software repair. Softw Qual J 21(3):421–443


  • Liblit B, Aiken A, Zheng AX, Jordan MI (2003) Bug isolation via remote program sampling. In: Proceedings of the ACM SIGPLAN 2003 conference on Programming Language Design and Implementation, PLDI ’03, pp 141–154

  • Long F, Rinard M (2015) Staged program repair with condition synthesis. In: Proceedings of the 2015 Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, ESEC/FSE ’15, pp 166–178

  • Long F, Rinard M (2016a) An analysis of the search spaces for generate and validate patch generation systems. In: Proceedings of the 38th International Conference on Software Engineering, ICSE ’16, pp 702–713

  • Long F, Rinard M (2016b) Automatic patch generation by learning correct code. In: Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL ’16, pp 298–312

  • Long F, Sidiroglou-Douskos S, Rinard M (2014) Automatic runtime error repair and containment via recovery shepherding. In: Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’14, pp 227–238

  • Maldonado JC, Delamaro ME, Fabbri SCPF, Simão A d S, Sugeta T, Vincenzi AMR, Masiero PC (2001) Proteum: A family of tools to support specification and program testing based on mutation. In: Wong W E (ed) Mutation Testing for the New Century, Kluwer Academic Publishers, Norwell, pp 113–116

  • Mechtaev S, Yi J, Roychoudhury A (2015) DirectFix: Looking for simple program repairs. In: Proceedings of the 37th IEEE/ACM International Conference on Software Engineering, ICSE ’15, pp 448–458

  • Mechtaev S, Yi J, Roychoudhury A (2016) Angelix: Scalable multiline program patch synthesis via symbolic analysis. In: Proceedings of the 38th International Conference on Software Engineering, ICSE ’16, pp 691–701

  • Miller W, Spooner DL (1976) Automatic generation of floating-point test data. IEEE Trans Softw Eng 2(3):223–226


  • Namin AS, Andrews JH (2009) The influence of size and coverage on test suite effectiveness. In: Proceedings of the 8th International Symposium on Software Testing and Analysis, ISSTA ’09, pp 57–68

  • Nguyen HDT, Qi D, Roychoudhury A, Chandra S (2013) SemFix: Program repair via semantic analysis. In: Proceedings of the 2013 International Conference on Software Engineering, ICSE ’13, pp 772–781

  • Pearson K (1895) Note on regression and inheritance in the case of two parents. Proc Royal Soc Lond 58:240–242


  • Pei Y, Furia C, Nordio M, Wei Y, Meyer B, Zeller A (2014) Automated fixing of programs with contracts. IEEE Trans Softw Eng 40(5):427–449


  • Perkins JH, Kim S, Larsen S, Amarasinghe S, Bachrach J, Carbin M, Pacheco C, Sherwood F, Sidiroglou S, Sullivan G, Wong WF, Zibin Y, Ernst MD, Rinard M (2009) Automatically patching errors in deployed software. In: Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles, SOSP ’09, pp 87–102

  • Person S, Yang G, Rungta N, Khurshid S (2011) Directed incremental symbolic execution. In: Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’11, pp 504–515

  • Qi Y, Mao X, Lei Y (2013) Efficient automated program repair through fault-recorded testing prioritization. In: Proceedings of the 2013 IEEE International Conference on Software Maintenance, ICSM ’13, pp 180–189

  • Qi Y, Mao X, Lei Y, Dai Z, Wang C (2014) The strength of random search on automated program repair. In: Proceedings of the 36th International Conference on Software Engineering, ICSE ’14, pp 254–265

  • Qi Z, Long F, Achour S, Rinard M (2015) An analysis of patch plausibility and correctness for generate-and-validate patch generation systems. In: Proceedings of the 2015 International Symposium on Software Testing and Analysis, ISSTA ’15, pp 24–36

  • Samimi H, Aung ED, Millstein T (2010) Falling back on executable specifications. In: Proceedings of the 24th European Conference on Object-oriented Programming, ECOOP’10, pp 552–576

  • Samimi H, Schäfer M, Artzi S, Millstein T, Tip F, Hendren L (2012) Automated repair of HTML generation errors in PHP applications using string constraint solving. In: Proceedings of the 34th International Conference on Software Engineering, ICSE ’12, pp 277–287

  • Santelices R, Chittimalli PK, Apiwattanapong T, Orso A, Harrold MJ (2008) Test-suite augmentation for evolving software. In: Proceedings of the 23rd IEEE/ACM International Conference on Automated Software Engineering, ASE ’08, pp 218–227

  • Shoenauer M, Xanthakis S (1993) Constrained GA optimization. In: Proceedings of the 5th International Conference on Genetic Algorithms, ICGA ’93, pp 573–580

  • Smith EK, Barr ET, Le Goues C, Brun Y (2015) Is the cure worse than the disease? Overfitting in automated program repair. In: Proceedings of the 2015 Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, ESEC/FSE ’15, pp 532–543

  • Tan SH, Roychoudhury A (2015) relifix: Automated repair of software regressions. In: Proceedings of the 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering, ICSE ’15, pp 471–482

  • Tan SH, Yoshida H, Prasad MR, Roychoudhury A (2016) Anti-patterns in search-based program repair. In: Proceedings of the 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, FSE’16, pp 727–738

  • Weimer W, Fry ZP, Forrest S (2013) Leveraging program equivalence for adaptive program repair: Models and first results. In: Proceedings of the 28th IEEE/ACM International Conference on Automated Software Engineering, ASE ’13, pp 356–366

  • White DR, Arcuri A, Clark JA (2011) Evolutionary improvement of programs. IEEE Trans Evol Comput 15(4):515–538


  • Xuan J, Martinez M, Demarco F, Clement M, Marcote SRL, Durieux T, Berre DL, Monperrus M (2017) Nopol: Automatic repair of conditional statement bugs in Java programs. IEEE Trans Softw Eng 43(1):34–55


  • Yao X, Harman M, Jia Y (2014) A study of equivalent and stubborn mutation operators using human analysis of equivalence. In: Proceedings of the 36th International Conference on Software Engineering, ICSE ’14, pp 919–930


Acknowledgements

This research is supported in part by the National Research Foundation, Prime Minister’s Office, Singapore under its National Cybersecurity R&D Program (TSUNAMi project, Award No. NRF2014NCR-NCR001-21) and administered by the National Cybersecurity R&D Directorate. The first author thanks Innopolis University for its support.

Author information

Correspondence to Jooyong Yi.

Additional information

Communicated by: Martin Monperrus and Westley Weimer


About this article


Cite this article

Yi, J., Tan, S.H., Mechtaev, S. et al. A correlation study between automated program repair and test-suite metrics. Empir Software Eng 23, 2948–2979 (2018). https://doi.org/10.1007/s10664-017-9552-y

