Research article (Public Access)
DOI: 10.1145/3524842.3528458

How to improve deep learning for software analytics: (a case study with code smell detection)

Published: 17 October 2022

ABSTRACT

To reduce technical debt and make code more maintainable, it is important to be able to warn programmers about code smells. State-of-the-art code smell detectors use deep learners, usually without exploring alternatives. For example, one promising alternative is GHOST (from TSE'21), which relies on a combination of hyper-parameter optimization of feedforward neural networks and a novel oversampling technique.

The prior TSE'21 study proposing this novel "fuzzy sampling" was somewhat limited, in that the method was tested on defect prediction but nothing else. Like defect prediction datasets, code smell detection datasets have a class imbalance (which is what motivated "fuzzy sampling" in the first place). Hence, in this work we test whether fuzzy sampling is useful for code smell detection.
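Although the exact algorithm is given in the TSE'21 paper and in the linked repository, the sketch below conveys the intuition behind fuzzy sampling: for each minority-class example, add perturbed copies at increasing distances, with fewer copies further out, so that the learned decision boundary is pushed away from the minority class. This is a minimal illustrative sketch, assuming numeric features scaled to [0, 1]; the radii and copy counts are hypothetical parameters, not the paper's.

```python
import numpy as np

def fuzzy_oversample(X, y, minority=1, radii=(0.01, 0.02, 0.04), copies=(4, 2, 1)):
    """Approximate 'fuzzy sampling': for each minority-class point, add
    randomly perturbed copies, with fewer copies at larger perturbation
    radii, pushing the learned decision boundary away from the minority
    examples. (Illustrative only; see the TSE'21 paper and the linked
    repository for the authors' actual algorithm.)"""
    X_min = X[y == minority]
    new_X, new_y = [X], [y]
    for r, c in zip(radii, copies):
        for _ in range(c):
            # Uniform noise within radius r around each minority point.
            noise = np.random.uniform(-r, r, size=X_min.shape)
            new_X.append(X_min + noise)
            new_y.append(np.full(len(X_min), minority))
    return np.vstack(new_X), np.concatenate(new_y)
```

With copies=(4, 2, 1), each minority-class point gains seven perturbed neighbors, so a dataset with (say) a 1:8 class ratio becomes roughly balanced.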

The results of this paper show that, with fuzzy oversampling, we can achieve better than state-of-the-art results on code smell detection. For example, for "feature envy" we achieved 99+% AUC across all our datasets, and for "misplaced class" on 8/10 datasets. While our specific results concern code smell detection, they suggest broader lessons for other kinds of analytics: (a) try better preprocessing before trying complex learners; (b) include simpler learners as a baseline in software analytics; (c) try "fuzzy sampling" as one such baseline.
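To make lessons (b) and (c) concrete, the sketch below wires a simple feedforward learner (scikit-learn's MLPClassifier, standing in here for the paper's tuned networks) together with the oversampler sketched above and scores it with AUC. The hidden-layer sizes and train/test split are illustrative assumptions, not the paper's configuration.

```python
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import MinMaxScaler

def baseline_auc(X, y):
    """Train a simple feedforward baseline on oversampled data and report
    AUC. A hypothetical harness, not the paper's exact pipeline; assumes
    binary labels {0, 1} with 1 as the minority class."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y)
    # Scale features to [0, 1] so the perturbation radii are meaningful.
    scaler = MinMaxScaler().fit(X_tr)
    X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)
    # Oversample only the training split (the fuzzy_oversample sketch above).
    X_tr, y_tr = fuzzy_oversample(X_tr, y_tr)
    clf = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500).fit(X_tr, y_tr)
    return roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
```

Note that the oversampler is applied only after the train/test split; oversampling before splitting would leak synthetic neighbors of test points into training and inflate the AUC.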

To support others trying to reproduce, extend, or refute this work, all our code and data are available online at https://github.com/yrahul3910/code-smell-detection.

References

  1. Amritanshu Agrawal, Wei Fu, Di Chen, Xipeng Shen, and Tim Menzies. 2019. How to "DODGE" Complex Software Analytics. IEEE Transactions on Software Engineering (2019).
  2. Amritanshu Agrawal and Tim Menzies. 2018. Is "Better Data" Better Than "Better Data Miners"? In 2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE). IEEE, 1050--1061.
  3. Amritanshu Agrawal, Xueqi Yang, Rishabh Agrawal, Rahul Yedida, Xipeng Shen, and Tim Menzies. 2021. Simpler Hyperparameter Optimization for Software Analytics: Why, How, When. IEEE Transactions on Software Engineering (2021).
  4. Uri Alon, Shaked Brody, Omer Levy, and Eran Yahav. 2018. code2seq: Generating sequences from structured representations of code. arXiv preprint arXiv:1808.01400 (2018).
  5. Uri Alon, Meital Zilberstein, Omer Levy, and Eran Yahav. 2019. code2vec: Learning distributed representations of code. Proceedings of the ACM on Programming Languages 3, POPL (2019), 1--29.
  6. Muhammad Ilyas Azeem, Fabio Palomba, Lin Shi, and Qing Wang. 2019. Machine learning techniques for code smell detection: A systematic literature review and meta-analysis. Information and Software Technology 108 (2019), 115--138.
  7. Kent Beck, Martin Fowler, and Grandma Beck. 1999. Bad smells in code. Refactoring: Improving the Design of Existing Code 1, 1999 (1999), 75--88.
  8. Nitesh V. Chawla, Kevin W. Bowyer, Lawrence O. Hall, and W. Philip Kegelmeyer. 2002. SMOTE: Synthetic Minority Over-sampling Technique. Journal of Artificial Intelligence Research 16, 1 (June 2002), 321--357.
  9. Chunyang Chen, Ting Su, Guozhu Meng, Zhenchang Xing, and Yang Liu. 2018. From UI design image to GUI skeleton: a neural machine translator to bootstrap mobile GUI implementation. In Proceedings of the 40th International Conference on Software Engineering. 665--676.
  10. Guobin Chen, Wongun Choi, Xiang Yu, Tony Han, and Manmohan Chandraker. 2017. Learning efficient object detection models with knowledge distillation. Advances in Neural Information Processing Systems 30 (2017).
  11. Jang Hyun Cho and Bharath Hariharan. 2019. On the Efficacy of Knowledge Distillation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV).
  12. Morakot Choetkiertikul, Hoa Khanh Dam, Truyen Tran, Trang Pham, Chaiyong Ragkhitwetsagul, and Aditya Ghose. 2021. Automatically recommending components for issue reports using deep learning. Empirical Software Engineering 26, 2 (2021), 1--39.
  13. Anna Choromanska, Mikael Henaff, Michael Mathieu, Gérard Ben Arous, and Yann LeCun. 2015. The loss surfaces of multilayer networks. In Artificial Intelligence and Statistics. PMLR, 192--204.
  14. George Cybenko. 1989. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems 2, 4 (1989), 303--314.
  15. Ignatios Deligiannis, Ioannis Stamelos, Lefteris Angelis, Manos Roumeliotis, and Martin Shepperd. 2004. A controlled experiment investigation of an object-oriented design heuristic for maintainability. Journal of Systems and Software 72, 2 (2004), 129--143.
  16. Simon S Du, Xiyu Zhai, Barnabas Poczos, and Aarti Singh. 2018. Gradient descent provably optimizes over-parameterized neural networks. arXiv preprint arXiv:1810.02054 (2018).
  17. Francesca Arcelli Fontana, Mika V Mäntylä, Marco Zanoni, and Alessandro Marino. 2016. Comparing and experimenting machine learning techniques for code smell detection. Empirical Software Engineering 21, 3 (2016), 1143--1191.
  18. Francesca Arcelli Fontana, Marco Zanoni, Alessandro Marino, and Mika V Mäntylä. 2013. Code smell detection: Towards a machine learning-based approach. In 2013 IEEE International Conference on Software Maintenance. IEEE, 396--399.
  19. Lukas Galke and Ansgar Scherp. 2021. Forget me not: A Gentle Reminder to Mind the Simple Multi-Layer Perceptron Baseline for Text Classification. arXiv preprint arXiv:2109.03777 (2021).
  20. Zhipeng Gao, Xin Xia, David Lo, John Grundy, and Thomas Zimmermann. 2021. Automating the removal of obsolete TODO comments. In Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 218--229.
  21. Jianping Gou, Baosheng Yu, Stephen J Maybank, and Dacheng Tao. 2021. Knowledge distillation: A survey. International Journal of Computer Vision 129, 6 (2021), 1789--1819.
  22. Melinda R Hess and Jeffrey D Kromrey. 2004. Robust confidence intervals for effect sizes: A comparative study of Cohen's d and Cliff's delta under non-normality and heterogeneous variances. In Annual Meeting of the American Educational Research Association. Citeseer, 1--30.
  23. Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. 2015. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531 (2015).
  24. Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation 9, 8 (1997), 1735--1780.
  25. Kurt Hornik, Maxwell Stinchcombe, and Halbert White. 1989. Multilayer feedforward networks are universal approximators. Neural Networks 2, 5 (1989), 359--366.
  26. Arthur Jacot, Franck Gabriel, and Clément Hongler. 2018. Neural tangent kernel: Convergence and generalization in neural networks. arXiv preprint arXiv:1806.07572 (2018).
  27. Yoon Kim and Alexander M Rush. 2016. Sequence-level knowledge distillation. arXiv preprint arXiv:1606.07947 (2016).
  28. Can Li, Ling Xu, Meng Yan, and Yan Lei. 2020. TagDC: A tag recommendation method for software information sites with a combination of deep learning and collaborative filtering. Journal of Systems and Software 170 (2020), 110783.
  29. Hui Liu, Jiahao Jin, Zhifeng Xu, Yifan Bu, Yanzhen Zou, and Lu Zhang. 2019. Deep learning based code smell detection. IEEE Transactions on Software Engineering (2019).
  30. Tim Menzies, Suvodeep Majumder, Nikhila Balaji, Katie Brey, and Wei Fu. 2018. 500+ times faster than deep learning: (a case study exploring faster methods for text mining StackOverflow). In 2018 IEEE/ACM 15th International Conference on Mining Software Repositories (MSR). IEEE, 554--563.
  31. Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013).
  32. Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems. 3111--3119.
  33. Guido Montúfar, Razvan Pascanu, Kyunghyun Cho, and Yoshua Bengio. 2014. On the number of linear regions of deep neural networks. arXiv preprint arXiv:1402.1869 (2014).
  34. Raimund Moser, Pekka Abrahamsson, Witold Pedrycz, Alberto Sillitti, and Giancarlo Succi. 2007. A case study on the impact of refactoring on quality and productivity in an agile team. In IFIP Central and East European Conference on Software Engineering Techniques. Springer, 252--266.
  35. Vinod Nair and Geoffrey E Hinton. 2010. Rectified linear units improve restricted Boltzmann machines. In ICML.
  36. Fabio Palomba. 2015. Textual analysis for code smell detection. In 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering, Vol. 2. IEEE, 769--771.
  37. Priyadarshini Panda and Kaushik Roy. 2016. Unsupervised regenerative learning of hierarchical features in spiking deep networks for object recognition. In 2016 International Joint Conference on Neural Networks (IJCNN). IEEE, 299--306.
  38. Wonpyo Park, Dongju Kim, Yan Lu, and Minsu Cho. 2019. Relational knowledge distillation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 3967--3976.
  39. Fabiano Pecorelli, Dario Di Nucci, Coen De Roover, and Andrea De Lucia. 2020. A large empirical assessment of the role of data balancing in machine-learning-based code smell detection. Journal of Systems and Software 169 (2020), 110693.
  40. Han Peng, Ge Li, Wenhan Wang, Yunfei Zhao, and Zhi Jin. 2021. Integrating Tree Path in Transformer for Code Representation. Advances in Neural Information Processing Systems 34 (2021).
  41. Mary Phuong and Christoph Lampert. 2019. Towards understanding knowledge distillation. In International Conference on Machine Learning. PMLR, 5142--5151.
  42. Julian Aron Prenner and Romain Robbes. 2021. Making the most of small Software Engineering datasets with modern machine learning. IEEE Transactions on Software Engineering (2021).
  43. Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, et al. 2019. Language models are unsupervised multitask learners. OpenAI Blog 1, 8 (2019), 9.
  44. David E Rumelhart, Geoffrey E Hinton, and Ronald J Williams. 1986. Learning representations by back-propagating errors. Nature 323, 6088 (1986), 533--536.
  45. Dilan Sahin, Marouane Kessentini, Slim Bechikh, and Kalyanmoy Deb. 2014. Code-smell detection as a bilevel problem. ACM Transactions on Software Engineering and Methodology (TOSEM) 24, 1 (2014), 1--44.
  46. Shibani Santurkar, Dimitris Tsipras, Andrew Ilyas, and Aleksander Mądry. 2018. How does batch normalization help optimization? In Proceedings of the 32nd International Conference on Neural Information Processing Systems. 2488--2498.
  47. Jan Schumacher, Nico Zazworka, Forrest Shull, Carolyn Seaman, and Michele Shaw. 2010. Building empirical support for automated code smell detection. In Proceedings of the 2010 ACM-IEEE International Symposium on Empirical Software Engineering and Measurement. 1--10.
  48. Heung-Il Suk, Seong-Whan Lee, Dinggang Shen, Alzheimer's Disease Neuroimaging Initiative, et al. 2014. Hierarchical feature representation and multimodal fusion with deep learning for AD/MCI diagnosis. NeuroImage 101 (2014), 569--582.
  49. Ewan Tempero, Craig Anslow, Jens Dietrich, Ted Han, Jing Li, Markus Lumpe, Hayden Melton, and James Noble. 2010. The Qualitas Corpus: A curated collection of Java code for empirical studies. In 2010 Asia Pacific Software Engineering Conference. IEEE, 336--345.
  50. Cody Allen Watson. 2020. Deep Learning in Software Engineering. Ph.D. Dissertation. College of William & Mary.
  51. Claes Wohlin. 2014. Guidelines for snowballing in systematic literature studies and a replication in software engineering. In Proceedings of the 18th International Conference on Evaluation and Assessment in Software Engineering. 1--10.
  52. Aiko Yamashita and Leon Moonen. 2013. To what extent can maintenance problems be predicted by code smell detection? An empirical study. Information and Software Technology 55, 12 (2013), 2223--2242.
  53. Rikiya Yamashita, Mizuho Nishio, Richard Kinh Gian Do, and Kaori Togashi. 2018. Convolutional neural networks: an overview and application in radiology. Insights into Imaging 9, 4 (2018), 611--629.
  54. Rahul Yedida and Tim Menzies. 2021. On the Value of Oversampling for Deep Learning in Software Defect Prediction. IEEE Transactions on Software Engineering (2021).
  55. Junho Yim, Donggyu Joo, Jihoon Bae, and Junmo Kim. 2017. A gift from knowledge distillation: Fast optimization, network minimization and transfer learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 4133--4141.
  56. Nico Zazworka, Michele A Shaw, Forrest Shull, and Carolyn Seaman. 2011. Investigating the impact of design debt on software quality. In Proceedings of the 2nd Workshop on Managing Technical Debt. 17--23.
  57. Matthew D Zeiler and Rob Fergus. 2014. Visualizing and understanding convolutional networks. In European Conference on Computer Vision. Springer, 818--833.
  58. Yufan Zhuang, Sahil Suneja, Veronika Thost, Giacomo Domeniconi, Alessandro Morari, and Jim Laredo. 2021. Software Vulnerability Detection via Deep Learning over Disaggregated Code Graph Representation. arXiv preprint arXiv:2109.03341 (2021).
  59. Difan Zou, Yuan Cao, Dongruo Zhou, and Quanquan Gu. 2020. Gradient descent optimizes over-parameterized deep ReLU networks. Machine Learning 109, 3 (2020), 467--492.

Published in
MSR '22: Proceedings of the 19th International Conference on Mining Software Repositories
May 2022, 815 pages
ISBN: 9781450393034
DOI: 10.1145/3524842
Copyright © 2022 ACM
Publisher: Association for Computing Machinery, New York, NY, United States
