Abstract
Software quality is an important concern for software practitioners because it reflects how a system is built and how it performs. In recent years, data science techniques have been used in the software engineering field, mainly to support the building of prediction models. These approaches aim to minimize problems while software is developed and run, and to support sound decision-making. This systematic literature review (SLR) investigates the significant data science techniques used in software processes and identifies their major impacts and the problems and challenges of their use. The review will be of interest to software practitioners concerned with software quality.
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Fernández Del Carpio, A., Angarita, L.B. (2018). Techniques Based on Data Science for Software Processes: A Systematic Literature Review. In: Stamelos, I., O'Connor, R., Rout, T., Dorling, A. (eds) Software Process Improvement and Capability Determination. SPICE 2018. Communications in Computer and Information Science, vol 918. Springer, Cham. https://doi.org/10.1007/978-3-030-00623-5_2
DOI: https://doi.org/10.1007/978-3-030-00623-5_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00622-8
Online ISBN: 978-3-030-00623-5
eBook Packages: Computer Science, Computer Science (R0)