Abstract
Users of highly-configurable software systems often want to optimize a particular objective, such as improving a functional outcome or increasing system performance. One approach is to use an evolutionary algorithm. However, many applications today are data-driven, meaning they depend on inputs or data that can be complex and varied. Hence, a search needs to be run (and re-run) for every input, making optimization a heavyweight and potentially impractical process. In this paper, we explore this issue on a data-driven, highly-configurable scientific application. We build an exhaustive database containing 3,000 configurations and 10,000 inputs, leading to almost 100 million records as our oracle, and then run a genetic algorithm individually on each of the 10,000 inputs. We ask (1) whether a genetic algorithm can find configurations that improve functional objectives; (2) whether patterns of best configurations emerge over all input data; and (3) whether we can use sampling to approximate the results. We find that the original (default) configuration is best only 34% of the time, while clear patterns emerge of other best configurations. Out of 3,000 possible configurations, only 112 distinct configurations achieve the optimal result at least once across all 10,000 inputs, suggesting the potential for lighter-weight optimization approaches. We show that sampling of the input data finds similar patterns at a lower cost.
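The setup described in the abstract can be illustrated with a minimal sketch: a precomputed table acts as the oracle, a small genetic algorithm is run once per input, and the winning configurations are tallied over all inputs and over a random sample of inputs. Everything here is an assumption made for illustration and not the authors' implementation: the names `OPTION_SIZES`, `make_oracle`, `run_ga`, and `best_config_counts`, the toy size of 64 configurations, the GA parameters, and the random fitness values standing in for the real database.

```python
import math
import random
from collections import Counter

# Hypothetical configuration space: three options with four values each (64 configs).
OPTION_SIZES = (4, 4, 4)

def config_index(chrom):
    """Map a tuple of option values to a row-major index into the oracle table."""
    idx = 0
    for value, size in zip(chrom, OPTION_SIZES):
        idx = idx * size + value
    return idx

def make_oracle(n_inputs, seed=0):
    """Stand-in for the exhaustive database: oracle[input][config] = fitness (random here)."""
    rng = random.Random(seed)
    n_configs = math.prod(OPTION_SIZES)
    return [[rng.random() for _ in range(n_configs)] for _ in range(n_inputs)]

def run_ga(fitness_row, rng, pop_size=12, generations=15, mutation_rate=0.2):
    """Small GA for one input: tournament selection, one-point crossover, mutation, elitism."""
    def fit(chrom):
        return fitness_row[config_index(chrom)]  # fitness is a table lookup, not a real run

    pop = [tuple(rng.randrange(s) for s in OPTION_SIZES) for _ in range(pop_size)]
    best = max(pop, key=fit)
    for _ in range(generations):
        nxt = [best]  # elitism: carry the best individual forward unchanged
        while len(nxt) < pop_size:
            p1 = max(rng.sample(pop, 2), key=fit)  # binary tournament
            p2 = max(rng.sample(pop, 2), key=fit)
            cut = rng.randrange(1, len(OPTION_SIZES))
            child = list(p1[:cut] + p2[cut:])
            for i, size in enumerate(OPTION_SIZES):
                if rng.random() < mutation_rate:
                    child[i] = rng.randrange(size)
            nxt.append(tuple(child))
        pop = nxt
        best = max(pop, key=fit)
    return best

def best_config_counts(oracle, input_ids, seed=1):
    """Run the GA once per input and count how often each configuration wins."""
    rng = random.Random(seed)
    return Counter(config_index(run_ga(oracle[i], rng)) for i in input_ids)

if __name__ == "__main__":
    oracle = make_oracle(n_inputs=200)
    full = best_config_counts(oracle, range(len(oracle)))
    sample_ids = random.Random(2).sample(range(len(oracle)), 40)
    sampled = best_config_counts(oracle, sample_ids)
    print("distinct best configurations (all inputs):", len(full))
    print("distinct best configurations (sampled inputs):", len(sampled))
```

Because the fitness values in this sketch are random, no meaningful pattern of best configurations should be expected from it; it only shows the shape of the per-input search and the tallying step that sampling is meant to approximate.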
Notes
- 1. Each run of the application can return multiple answers, leading to many more records than 3,000 \(\times\) 10,000.
- 2. Supplementary data website: https://github.com/LavaOps/ssbse-2020-FrDdE.
Acknowledgements
This work is supported in part by NSF Grant CCF-1901543 and by the Center for Bioenergy Innovation (CBI), which is supported by the Office of Biological and Environmental Research in the DOE Office of Science.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Sinha, U., Cashman, M., Cohen, M.B. (2020). Using a Genetic Algorithm to Optimize Configurations in a Data-Driven Application. In: Aleti, A., Panichella, A. (eds) Search-Based Software Engineering. SSBSE 2020. Lecture Notes in Computer Science, vol 12420. Springer, Cham. https://doi.org/10.1007/978-3-030-59762-7_10
DOI: https://doi.org/10.1007/978-3-030-59762-7_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-59761-0
Online ISBN: 978-3-030-59762-7
eBook Packages: Computer Science, Computer Science (R0)