Techniques Based on Data Science for Software Processes: A Systematic Literature Review

Fernández Del Carpio, Alvaro; Angarita, Leonardo Bermón

doi:10.1007/978-3-030-00623-5_2

Alvaro Fernández Del Carpio & Leonardo Bermón Angarita

Part of the book series: Communications in Computer and Information Science (CCIS, volume 918)

Included in the following conference series:

International Conference on Software Process Improvement and Capability Determination

Abstract

Software quality is an important concern for software practitioners, who need assurance about how a system is built and how it performs. In recent years, data science techniques have been used in software engineering, mainly to support the building of prediction models. These approaches aim to minimize problems during software development and operation and to support sound decisions. This systematic literature review (SLR) investigates the significant data science techniques used in software processes and identifies their major impacts and the problems and challenges of using them. The review will be of interest to software practitioners concerned with software quality.

Author information

Authors and Affiliations

Software Engineering Department, Universidad La Salle, Av. Alfonso Ugarte 517, Arequipa, Peru
Alvaro Fernández Del Carpio
Computing Department, Universidad Nacional de Colombia, Campus La Nubia, Manizales-Caldas, Colombia
Leonardo Bermón Angarita

Corresponding author

Correspondence to Alvaro Fernández Del Carpio.

Editor information

Editors and Affiliations

Department of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece
Ioannis Stamelos
School of Computing, Dublin City University, Dublin, Ireland
Rory V. O'Connor
Software Quality Institute, Griffith University, Brisbane, QLD, Australia
Terry Rout
Impronova AB, Askim, Sweden
Alec Dorling

About this paper

Cite this paper

Fernández Del Carpio, A., Angarita, L.B. (2018). Techniques Based on Data Science for Software Processes: A Systematic Literature Review. In: Stamelos, I., O'Connor, R., Rout, T., Dorling, A. (eds) Software Process Improvement and Capability Determination. SPICE 2018. Communications in Computer and Information Science, vol 918. Springer, Cham. https://doi.org/10.1007/978-3-030-00623-5_2

DOI: https://doi.org/10.1007/978-3-030-00623-5_2
Published: 16 September 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00622-8
Online ISBN: 978-3-030-00623-5
eBook Packages: Computer Science (R0)
