FOREPOST: finding performance problems automatically with feedback-directed learning software testing


Abstract

A goal of performance testing is to find situations in which applications unexpectedly exhibit worsened characteristics for certain combinations of input values. A fundamental question of performance testing is how to select a manageable subset of the input data in order to find performance bottlenecks in applications automatically and quickly. We propose FOREPOST, a novel solution for automatically finding performance bottlenecks in applications using black-box software testing. Our solution is an adaptive, feedback-directed learning testing system that learns rules from execution traces of applications. These rules are then used to automatically select test input data for performance testing. We hypothesize that FOREPOST can find more performance bottlenecks than random testing. We have implemented our solution and applied it to a medium-size industrial application at a major insurance company and to two open-source applications. Performance bottlenecks were found automatically and confirmed by experienced testers and developers. We also thoroughly studied the factors (or independent variables) that affect the results of FOREPOST.
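At a high level, the feedback-directed loop described above alternates between executing the application on a batch of test inputs, profiling those executions, learning rules that separate the inputs of unusually slow executions from the rest, and using the learned rules to choose the next batch. The following is a minimal, illustrative sketch of such a loop (not the authors' implementation); the helper names run_and_profile, learn_rules, and select_inputs are hypothetical placeholders for FOREPOST's profiling, rule-learning (e.g., a JRip/RIPPER-style learner), and input-selection components.

```python
import random

def forepost_loop(input_pool, budget, run_and_profile, learn_rules, select_inputs):
    """Illustrative feedback-directed testing loop (a sketch, not FOREPOST itself).

    input_pool      -- list of candidate test inputs (combinations of input values)
    run_and_profile -- executes the application on one input and returns
                       (execution_trace, elapsed_time)
    learn_rules     -- induces classification rules that separate slow executions
                       from fast ones (e.g., RIPPER over input attributes)
    select_inputs   -- uses the learned rules to pick the next batch of inputs
    """
    profiles = []   # (input, trace, elapsed_time) triples collected so far
    rules = None
    batch = random.sample(input_pool, k=min(10, len(input_pool)))  # random seed batch

    while budget > 0 and batch:
        for test_input in batch:
            trace, elapsed = run_and_profile(test_input)
            profiles.append((test_input, trace, elapsed))
            budget -= 1
            if budget == 0:
                break

        # Label executions: the slowest quartile is treated as "good"
        # (i.e., likely to reveal performance problems), the rest as "bad".
        times = sorted(t for _, _, t in profiles)
        threshold = times[int(0.75 * (len(times) - 1))]
        labeled = [(inp, trace, t >= threshold) for inp, trace, t in profiles]

        rules = learn_rules(labeled)                # learn rules from labeled traces
        batch = select_inputs(input_pool, rules)    # steer toward rule-matching inputs

    return profiles, rules
```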

Notes

  1. http://eclipse.org/tptp, last checked August 12, 2015

  2. http://weka.sourceforge.net/doc.stable/weka/classifiers/rules/JRip.html, last checked Apr 10, 2015

  3. http://sourceforge.net/projects/ibatisjpetstore, last checked Apr 10, 2015

  4. http://en.community.dell.com/techcenter/extras/w/wiki/dvd-store.aspx, last checked Apr 10, 2015

  5. http://linux.dell.com/dvdstore/, last checked Apr 10, 2015

  6. http://www.cs.wm.edu/semeru/data/EMSE-forepost/


Acknowledgments

We are grateful to the anonymous ICSE’12 and EMSE journal reviewers for their relevant and useful comments and suggestions, which helped us to significantly improve an earlier version of this paper. We would like to thank Bogdan Dit and Kevin Moran for reading the paper and providing feedback on early drafts. We would also like to thank Du Shen for his pertinent feedback on improving the current version of FOREPOST and for pointing out areas for improvement. We also thank Chen Fu and Qin Xie for their contributions to the earlier version of this work. This work is supported by NSF CCF-0916139, NSF CCF-1017633, NSF CCF-1218129, NSF CCF-1525902, a major insurance company, and Accenture.

Author information

Corresponding author

Correspondence to Qi Luo.

Additional information

Communicated by: Ahmed E. Hassan

Appendix

Table 8 Six injected bottlenecks in JPetStore, where the length of delay is measured in seconds
Table 9 Nine injected bottlenecks in JPetStore, where the length of delay is measured in seconds
Table 10 Twelve injected bottlenecks in JPetStore, where the length of delay is measured in seconds
Table 11 Injected bottlenecks in Dell DVD Store and the standard jar file mysql-connector-java.jar, where the length of delay is measured in seconds
Table 12 Ranks of bottlenecks for FOREPOST in JPetStore, where there are five original bottlenecks and nine artificial bottlenecks
Table 13 Precision for FOREPOST when n_u = 5 and n_p = 15
Table 14 Recall for FOREPOST when n_u = 5 and n_p = 15
Table 15 F-score for FOREPOST when n_u = 5 and n_p = 15
Table 16 Precision for FOREPOST when n_u = 5 and n_p = 20
Table 17 Recall for FOREPOST when n_u = 5 and n_p = 20
Table 18 F-score for FOREPOST when n_u = 5 and n_p = 20
Table 19 Precision for FOREPOST when n_u = 10 and n_p = 10
Table 20 Recall for FOREPOST when n_u = 10 and n_p = 10
Table 21 F-score for FOREPOST when n_u = 10 and n_p = 10
Table 22 Precision for FOREPOST when n_u = 10 and n_p = 15
Table 23 Recall for FOREPOST when n_u = 10 and n_p = 15
Table 24 F-score for FOREPOST when n_u = 10 and n_p = 15
Table 25 Precision for FOREPOST when n_u = 10 and n_p = 20
Table 26 Recall for FOREPOST when n_u = 10 and n_p = 20
Table 27 F-score for FOREPOST when n_u = 10 and n_p = 20
Table 28 Precision for FOREPOST when n_u = 15 and n_p = 10
Table 29 Recall for FOREPOST when n_u = 15 and n_p = 10
Table 30 F-score for FOREPOST when n_u = 15 and n_p = 10
Table 31 Precision for FOREPOST when n_u = 15 and n_p = 15
Table 32 Recall for FOREPOST when n_u = 15 and n_p = 15
Table 33 F-score for FOREPOST when n_u = 15 and n_p = 15
Table 34 Precision for FOREPOST when n_u = 15 and n_p = 20
Table 35 Recall for FOREPOST when n_u = 15 and n_p = 20
Table 36 F-score for FOREPOST when n_u = 15 and n_p = 20
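For reference, the precision, recall, and F-score values in Tables 13 through 36 follow the standard definitions computed over the set of bottlenecks reported by FOREPOST and the set of injected (ground-truth) bottlenecks. The sketch below illustrates that computation; matching bottlenecks by method name is an assumption made here for illustration only, not a statement about the authors' evaluation scripts.

```python
def precision_recall_fscore(reported, injected):
    """Standard precision/recall/F-score over two sets of bottleneck identifiers.

    reported -- set of bottlenecks flagged by the tool (e.g., method names)
    injected -- set of ground-truth (injected) bottlenecks
    """
    true_positives = len(reported & injected)
    precision = true_positives / len(reported) if reported else 0.0
    recall = true_positives / len(injected) if injected else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score

# Example: 3 of 4 reported methods are real bottlenecks out of 6 injected ones.
p, r, f = precision_recall_fscore({"m1", "m2", "m3", "m4"},
                                  {"m1", "m2", "m3", "m5", "m6", "m7"})
# p = 0.75, r = 0.5, f = 0.6
```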


Cite this article

Luo, Q., Nair, A., Grechanik, M. et al. FOREPOST: finding performance problems automatically with feedback-directed learning software testing. Empir Software Eng 22, 6–56 (2017). https://doi.org/10.1007/s10664-015-9413-5
