
Techniques Based on Data Science for Software Processes: A Systematic Literature Review

  • Conference paper
Software Process Improvement and Capability Determination (SPICE 2018)

Abstract

Software quality is an important topic for software practitioners, as it determines how well a system is built and how well it performs. In recent years, data science techniques have been used in the software engineering field, mainly to support the building of prediction models. These approaches aim to minimize software problems during the development and operation of software, helping practitioners make the right decisions. This systematic literature review (SLR) investigates the significant data science techniques used in software processes, identifying their major impacts and the problems and challenges of using them. This review will be of interest to software practitioners concerned with software quality.

Author information

Corresponding author

Correspondence to Alvaro Fernández Del Carpio.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Fernández Del Carpio, A., Angarita, L.B. (2018). Techniques Based on Data Science for Software Processes: A Systematic Literature Review. In: Stamelos, I., O'Connor, R., Rout, T., Dorling, A. (eds) Software Process Improvement and Capability Determination. SPICE 2018. Communications in Computer and Information Science, vol 918. Springer, Cham. https://doi.org/10.1007/978-3-030-00623-5_2

  • DOI: https://doi.org/10.1007/978-3-030-00623-5_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-00622-8

  • Online ISBN: 978-3-030-00623-5

  • eBook Packages: Computer Science, Computer Science (R0)