
Comparing and experimenting machine learning techniques for code smell detection

Published in Empirical Software Engineering

Abstract

Several code smell detection tools have been developed, but they produce different results because smells can be subjectively interpreted, and hence detected, in different ways. In this paper, we perform, to the best of our knowledge, the largest experiment to date on applying machine learning algorithms to code smell detection. We experiment with 16 different machine learning algorithms on four code smells (Data Class, Large Class, Feature Envy, Long Method) across 74 software systems, using 1986 manually validated code smell samples. We found that all algorithms achieved high performance on the cross-validation data set; the best performance was obtained by J48 and Random Forest, while the worst was achieved by support vector machines. However, the low prevalence of code smells in the entire data set, i.e., class imbalance, caused performance to vary in ways that future studies need to address. We conclude that machine learning can detect these code smells with high accuracy (>96 %), and that only about one hundred training examples are needed to reach at least 95 % accuracy.
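
As a concrete illustration of the experimental setup the abstract describes, the following is a minimal sketch in scikit-learn, not the authors' actual pipeline: several of the named classifiers are compared with 10-fold cross-validation on a table of per-entity code metrics. The CSV file, its column names, and all parameters are hypothetical.

```python
# Minimal sketch (not the paper's pipeline): compare classifiers named in
# the abstract via 10-fold cross-validation on manually labeled samples.
# The CSV file and its columns are hypothetical.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# One row per code entity: object-oriented metrics plus a binary label
# marking whether it was manually validated as an instance of the smell.
data = pd.read_csv("data_class_samples.csv")  # hypothetical file
X = data.drop(columns=["is_smell"])
y = data["is_smell"]

classifiers = {
    "J48-like tree": DecisionTreeClassifier(),  # rough C4.5 analogue
    "Random Forest": RandomForestClassifier(n_estimators=100),
    "SVM (RBF kernel)": SVC(),
}

for name, clf in classifiers.items():
    scores = cross_val_score(clf, X, y, cv=10, scoring="accuracy")
    print(f"{name}: mean accuracy {scores.mean():.3f} (+/- {scores.std():.3f})")
```

Given the class imbalance the abstract highlights, an F-measure (e.g. scoring="f1") would be more informative than raw accuracy when moving from a balanced cross-validation set to the full corpus.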

Notes

  1. http://en.wikipedia.org/wiki/Anti-spam_techniques

  2. http://pmd.sourceforge.net/

  3. B-J48 Pruned is the pruned variant of J48 with AdaBoostM1 applied; all algorithms with a “B-” prefix in the name are boosted in this way (see the sketch after these notes).
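
To make note 3 concrete, here is a minimal sketch of the boosting idea, using scikit-learn's AdaBoost in place of WEKA's AdaBoostM1 and a depth-limited decision tree as a rough stand-in for pruned J48; the parameters are illustrative, not the configuration used in the paper.

```python
# Illustrative sketch of a "B-" (boosted) variant: AdaBoost repeatedly
# retrains a weak tree, upweighting the samples earlier rounds
# misclassified, and combines all rounds by weighted vote.
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

boosted_tree = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=3),  # stand-in for a pruned J48 tree
    n_estimators=50,                      # number of boosting rounds
)
# boosted_tree.fit(X, y) would then train the ensemble on the same metric
# features used for the unboosted base classifier.
```

This is the sense in which the "B-" variants in the study extend their base learners.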

Author information

Corresponding author

Correspondence to Marco Zanoni.

Additional information

Communicated by: Tim Menzies

Appendix

Table 22 Project characteristics

Table 23 Metric names

Table 24 Custom metric names

About this article

Cite this article

Arcelli Fontana, F., Mäntylä, M.V., Zanoni, M. et al. Comparing and experimenting machine learning techniques for code smell detection. Empir Software Eng 21, 1143–1191 (2016). https://doi.org/10.1007/s10664-015-9378-4
