ABSTRACT
To reduce technical debt and make code more maintainable, it is important to be able to warn programmers about code smells. State-of-the-art code smell detectors use deep learners, usually without exploring alternatives. One promising alternative is GHOST (from TSE'21), which combines hyper-parameter optimization of feedforward neural networks with a novel oversampling technique.
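At its core, the tuning side of GHOST is a search over a small hyper-parameter space. As an illustration only (not the authors' code), the following is a minimal random-search sketch; the search space, trial count, and toy `evaluate` function are all assumptions, and in practice `evaluate` would train a feedforward network and return its validation score (e.g., AUC).

```python
import random

# Illustrative hyper-parameter space for a small feedforward network.
SPACE = {
    "n_layers": [1, 2, 3],
    "n_units": [8, 16, 32, 64],
    "learning_rate": [1e-3, 1e-2, 1e-1],
}

def random_search(evaluate, n_trials=20, seed=0):
    """Sample random configurations and keep the best-scoring one."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(n_trials):
        cfg = {k: rng.choice(v) for k, v in SPACE.items()}
        score = evaluate(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

# Toy stand-in for "train a network, return validation AUC".
def toy(cfg):
    return cfg["n_layers"] * cfg["n_units"] - 100 * cfg["learning_rate"]

print(random_search(toy))
```

Real tuners (including GHOST's) use smarter search strategies than pure random sampling, but the interface is the same: a space, a budget, and an evaluation function.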
The prior TSE'21 study proposing this novel "fuzzy sampling" was somewhat limited: the method was tested on defect prediction, but nothing else. Like defect prediction, code smell detection datasets suffer from class imbalance (the very problem that motivated "fuzzy sampling"). Hence, in this work we test whether fuzzy sampling is also useful for code smell detection.
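To make the idea concrete, here is a minimal sketch of a fuzzy-sampling-style oversampler in plain Python. This is our paraphrase of the concentric-oversampling idea, not the TSE'21 authors' implementation; the function name and the `copies` and `delta` parameters are illustrative assumptions.

```python
def fuzzy_sample(X, y, minority=1, copies=3, delta=0.05):
    """Fuzzy-sampling-style oversampler (a sketch, not the exact GHOST code).

    For each minority-class row x, emit synthetic rows perturbed
    concentrically, x * (1 +/- delta*i) for i = 1..copies, so that new
    points cluster near the original minority examples.
    """
    Xs, ys = list(X), list(y)
    for x, label in zip(X, y):
        if label != minority:
            continue
        for i in range(1, copies + 1):
            for sign in (1, -1):
                Xs.append([v * (1 + sign * delta * i) for v in x])
                ys.append(minority)
    return Xs, ys

# Tiny imbalanced toy set: 4 majority rows, 1 minority row.
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 3.0], [4.0, 2.0], [0.5, 0.5]]
y = [0, 0, 0, 0, 1]
X2, y2 = fuzzy_sample(X, y)
print(sum(1 for v in y2 if v == 1))  # 1 original + 3 copies * 2 signs = 7
```

Unlike SMOTE, which interpolates between minority neighbors, this style of oversampling perturbs each minority point in place, which is cheap and needs no nearest-neighbor search.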
The results of this paper show that we can achieve better-than-state-of-the-art results on code smell detection with fuzzy oversampling. For example, for "feature envy" we achieve 99+% AUC across all our datasets, and for "misplaced class" on 8/10 datasets. While our specific results refer to code smell detection, they suggest lessons for other kinds of analytics: (a) try better preprocessing before trying complex learners; (b) include simpler learners as a baseline in software analytics; (c) try "fuzzy sampling" as one such baseline.
To support others trying to reproduce/extend/refute this work, all our code and data are available online at https://github.com/yrahul3910/code-smell-detection.
REFERENCES
- Amritanshu Agrawal, Wei Fu, Di Chen, Xipeng Shen, and Tim Menzies. 2019. How to "DODGE" Complex Software Analytics. IEEE Transactions on Software Engineering (2019).
- Amritanshu Agrawal and Tim Menzies. 2018. Is "Better Data" Better Than "Better Data Miners"? In 2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE). IEEE, 1050--1061.
- Amritanshu Agrawal, Xueqi Yang, Rishabh Agrawal, Rahul Yedida, Xipeng Shen, and Tim Menzies. 2021. Simpler Hyperparameter Optimization for Software Analytics: Why, How, When. IEEE Transactions on Software Engineering (2021).
- Uri Alon, Shaked Brody, Omer Levy, and Eran Yahav. 2018. code2seq: Generating sequences from structured representations of code. arXiv preprint arXiv:1808.01400 (2018).
- Uri Alon, Meital Zilberstein, Omer Levy, and Eran Yahav. 2019. code2vec: Learning distributed representations of code. Proceedings of the ACM on Programming Languages 3, POPL (2019), 1--29.
- Muhammad Ilyas Azeem, Fabio Palomba, Lin Shi, and Qing Wang. 2019. Machine learning techniques for code smell detection: A systematic literature review and meta-analysis. Information and Software Technology 108 (2019), 115--138.
- Kent Beck, Martin Fowler, and Grandma Beck. 1999. Bad smells in code. Refactoring: Improving the Design of Existing Code 1, 1999 (1999), 75--88.
- Nitesh V. Chawla, Kevin W. Bowyer, Lawrence O. Hall, and W. Philip Kegelmeyer. 2002. SMOTE: Synthetic Minority Over-Sampling Technique. J. Artif. Int. Res. 16, 1 (June 2002), 321--357.
- Chunyang Chen, Ting Su, Guozhu Meng, Zhenchang Xing, and Yang Liu. 2018. From UI design image to GUI skeleton: A neural machine translator to bootstrap mobile GUI implementation. In Proceedings of the 40th International Conference on Software Engineering. 665--676.
- Guobin Chen, Wongun Choi, Xiang Yu, Tony Han, and Manmohan Chandraker. 2017. Learning efficient object detection models with knowledge distillation. Advances in Neural Information Processing Systems 30 (2017).
- Jang Hyun Cho and Bharath Hariharan. 2019. On the Efficacy of Knowledge Distillation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV).
- Morakot Choetkiertikul, Hoa Khanh Dam, Truyen Tran, Trang Pham, Chaiyong Ragkhitwetsagul, and Aditya Ghose. 2021. Automatically recommending components for issue reports using deep learning. Empirical Software Engineering 26, 2 (2021), 1--39.
- Anna Choromanska, Mikael Henaff, Michael Mathieu, Gérard Ben Arous, and Yann LeCun. 2015. The loss surfaces of multilayer networks. In Artificial Intelligence and Statistics. PMLR, 192--204.
- George Cybenko. 1989. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems 2, 4 (1989), 303--314.
- Ignatios Deligiannis, Ioannis Stamelos, Lefteris Angelis, Manos Roumeliotis, and Martin Shepperd. 2004. A controlled experiment investigation of an object-oriented design heuristic for maintainability. Journal of Systems and Software 72, 2 (2004), 129--143.
- Simon S Du, Xiyu Zhai, Barnabas Poczos, and Aarti Singh. 2018. Gradient descent provably optimizes over-parameterized neural networks. arXiv preprint arXiv:1810.02054 (2018).
- Francesca Arcelli Fontana, Mika V Mäntylä, Marco Zanoni, and Alessandro Marino. 2016. Comparing and experimenting machine learning techniques for code smell detection. Empirical Software Engineering 21, 3 (2016), 1143--1191.
- Francesca Arcelli Fontana, Marco Zanoni, Alessandro Marino, and Mika V Mäntylä. 2013. Code smell detection: Towards a machine learning-based approach. In 2013 IEEE International Conference on Software Maintenance. IEEE, 396--399.
- Lukas Galke and Ansgar Scherp. 2021. Forget me not: A Gentle Reminder to Mind the Simple Multi-Layer Perceptron Baseline for Text Classification. arXiv preprint arXiv:2109.03777 (2021).
- Zhipeng Gao, Xin Xia, David Lo, John Grundy, and Thomas Zimmermann. 2021. Automating the removal of obsolete TODO comments. In Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 218--229.
- Jianping Gou, Baosheng Yu, Stephen J Maybank, and Dacheng Tao. 2021. Knowledge distillation: A survey. International Journal of Computer Vision 129, 6 (2021), 1789--1819.
- Melinda R Hess and Jeffrey D Kromrey. 2004. Robust confidence intervals for effect sizes: A comparative study of Cohen's d and Cliff's delta under non-normality and heterogeneous variances. In Annual Meeting of the American Educational Research Association. Citeseer, 1--30.
- Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. 2015. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531 (2015).
- Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation 9, 8 (1997), 1735--1780.
- Kurt Hornik, Maxwell Stinchcombe, and Halbert White. 1989. Multilayer feedforward networks are universal approximators. Neural Networks 2, 5 (1989), 359--366.
- Arthur Jacot, Franck Gabriel, and Clément Hongler. 2018. Neural tangent kernel: Convergence and generalization in neural networks. arXiv preprint arXiv:1806.07572 (2018).
- Yoon Kim and Alexander M Rush. 2016. Sequence-level knowledge distillation. arXiv preprint arXiv:1606.07947 (2016).
- Can Li, Ling Xu, Meng Yan, and Yan Lei. 2020. TagDC: A tag recommendation method for software information sites with a combination of deep learning and collaborative filtering. Journal of Systems and Software 170 (2020), 110783.
- Hui Liu, Jiahao Jin, Zhifeng Xu, Yifan Bu, Yanzhen Zou, and Lu Zhang. 2019. Deep learning based code smell detection. IEEE Transactions on Software Engineering (2019).
- Tim Menzies, Suvodeep Majumder, Nikhila Balaji, Katie Brey, and Wei Fu. 2018. 500+ times faster than deep learning: (a case study exploring faster methods for text mining StackOverflow). In 2018 IEEE/ACM 15th International Conference on Mining Software Repositories (MSR). IEEE, 554--563.
- Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013).
- Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems. 3111--3119.
- Guido Montúfar, Razvan Pascanu, Kyunghyun Cho, and Yoshua Bengio. 2014. On the number of linear regions of deep neural networks. arXiv preprint arXiv:1402.1869 (2014).
- Raimund Moser, Pekka Abrahamsson, Witold Pedrycz, Alberto Sillitti, and Giancarlo Succi. 2007. A case study on the impact of refactoring on quality and productivity in an agile team. In IFIP Central and East European Conference on Software Engineering Techniques. Springer, 252--266.
- Vinod Nair and Geoffrey E Hinton. 2010. Rectified linear units improve restricted Boltzmann machines. In ICML.
- Fabio Palomba. 2015. Textual analysis for code smell detection. In 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering, Vol. 2. IEEE, 769--771.
- Priyadarshini Panda and Kaushik Roy. 2016. Unsupervised regenerative learning of hierarchical features in spiking deep networks for object recognition. In 2016 International Joint Conference on Neural Networks (IJCNN). IEEE, 299--306.
- Wonpyo Park, Dongju Kim, Yan Lu, and Minsu Cho. 2019. Relational knowledge distillation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 3967--3976.
- Fabiano Pecorelli, Dario Di Nucci, Coen De Roover, and Andrea De Lucia. 2020. A large empirical assessment of the role of data balancing in machine-learning-based code smell detection. Journal of Systems and Software 169 (2020), 110693.
- Han Peng, Ge Li, Wenhan Wang, Yunfei Zhao, and Zhi Jin. 2021. Integrating Tree Path in Transformer for Code Representation. Advances in Neural Information Processing Systems 34 (2021).
- Mary Phuong and Christoph Lampert. 2019. Towards understanding knowledge distillation. In International Conference on Machine Learning. PMLR, 5142--5151.
- Julian Aron Prenner and Romain Robbes. 2021. Making the most of small Software Engineering datasets with modern machine learning. IEEE Transactions on Software Engineering (2021).
- Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, et al. 2019. Language models are unsupervised multitask learners. OpenAI Blog 1, 8 (2019), 9.
- David E Rumelhart, Geoffrey E Hinton, and Ronald J Williams. 1986. Learning representations by back-propagating errors. Nature 323, 6088 (1986), 533--536.
- Dilan Sahin, Marouane Kessentini, Slim Bechikh, and Kalyanmoy Deb. 2014. Code-smell detection as a bilevel problem. ACM Transactions on Software Engineering and Methodology (TOSEM) 24, 1 (2014), 1--44.
- Shibani Santurkar, Dimitris Tsipras, Andrew Ilyas, and Aleksander Mądry. 2018. How does batch normalization help optimization? In Proceedings of the 32nd International Conference on Neural Information Processing Systems. 2488--2498.
- Jan Schumacher, Nico Zazworka, Forrest Shull, Carolyn Seaman, and Michele Shaw. 2010. Building empirical support for automated code smell detection. In Proceedings of the 2010 ACM-IEEE International Symposium on Empirical Software Engineering and Measurement. 1--10.
- Heung-Il Suk, Seong-Whan Lee, Dinggang Shen, Alzheimer's Disease Neuroimaging Initiative, et al. 2014. Hierarchical feature representation and multimodal fusion with deep learning for AD/MCI diagnosis. NeuroImage 101 (2014), 569--582.
- Ewan Tempero, Craig Anslow, Jens Dietrich, Ted Han, Jing Li, Markus Lumpe, Hayden Melton, and James Noble. 2010. The Qualitas Corpus: A curated collection of Java code for empirical studies. In 2010 Asia Pacific Software Engineering Conference. IEEE, 336--345.
- Cody Allen Watson. 2020. Deep Learning in Software Engineering. Ph.D. Dissertation. College of William & Mary.
- Claes Wohlin. 2014. Guidelines for snowballing in systematic literature studies and a replication in software engineering. In Proceedings of the 18th International Conference on Evaluation and Assessment in Software Engineering. 1--10.
- Aiko Yamashita and Leon Moonen. 2013. To what extent can maintenance problems be predicted by code smell detection? An empirical study. Information and Software Technology 55, 12 (2013), 2223--2242.
- Rikiya Yamashita, Mizuho Nishio, Richard Kinh Gian Do, and Kaori Togashi. 2018. Convolutional neural networks: An overview and application in radiology. Insights into Imaging 9, 4 (2018), 611--629.
- Rahul Yedida and Tim Menzies. 2021. On the Value of Oversampling for Deep Learning in Software Defect Prediction. IEEE Transactions on Software Engineering (2021).
- Junho Yim, Donggyu Joo, Jihoon Bae, and Junmo Kim. 2017. A gift from knowledge distillation: Fast optimization, network minimization and transfer learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 4133--4141.
- Nico Zazworka, Michele A Shaw, Forrest Shull, and Carolyn Seaman. 2011. Investigating the impact of design debt on software quality. In Proceedings of the 2nd Workshop on Managing Technical Debt. 17--23.
- Matthew D Zeiler and Rob Fergus. 2014. Visualizing and understanding convolutional networks. In European Conference on Computer Vision. Springer, 818--833.
- Yufan Zhuang, Sahil Suneja, Veronika Thost, Giacomo Domeniconi, Alessandro Morari, and Jim Laredo. 2021. Software Vulnerability Detection via Deep Learning over Disaggregated Code Graph Representation. arXiv preprint arXiv:2109.03341 (2021).
- Difan Zou, Yuan Cao, Dongruo Zhou, and Quanquan Gu. 2020. Gradient descent optimizes over-parameterized deep ReLU networks. Machine Learning 109, 3 (2020), 467--492.
How to improve deep learning for software analytics: (a case study with code smell detection)