Research article (Public Access)
DOI: 10.1145/3524842.3528458

How to improve deep learning for software analytics: (a case study with code smell detection)

Published: 17 October 2022

ABSTRACT

To reduce technical debt and make code more maintainable, it is important to be able to warn programmers about code smells. State-of-the-art code smell detectors use deep learners, usually without exploring alternatives. For example, one promising alternative is GHOST (from TSE'21), which relies on a combination of hyper-parameter optimization of feedforward neural networks and a novel oversampling technique.

The prior TSE'21 study proposing this novel "fuzzy sampling" was somewhat limited, in that the method was tested on defect prediction but nothing else. Like defect prediction datasets, code smell detection datasets have a class imbalance (which is what motivated "fuzzy sampling" in the first place). Hence, in this work we test whether fuzzy sampling is useful for code smell detection.
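Although the exact algorithm is given in the TSE'21 paper and in the linked repository, the sketch below conveys the intuition behind fuzzy sampling: for each minority-class example, add perturbed copies at increasing distances, with fewer copies further out, so that the learned decision boundary is pushed away from the minority class. This is a minimal illustrative sketch, assuming numeric features scaled to [0, 1]; the radii and copy counts are hypothetical parameters, not the paper's.

```python
import numpy as np

def fuzzy_oversample(X, y, minority=1, radii=(0.01, 0.02, 0.04), copies=(4, 2, 1)):
    """Approximate 'fuzzy sampling': for each minority-class point, add
    randomly perturbed copies, with fewer copies at larger perturbation
    radii, pushing the learned decision boundary away from the minority
    examples. (Illustrative only; see the TSE'21 paper and the linked
    repository for the authors' actual algorithm.)"""
    X_min = X[y == minority]
    new_X, new_y = [X], [y]
    for r, c in zip(radii, copies):
        for _ in range(c):
            # Uniform noise within radius r around each minority point.
            noise = np.random.uniform(-r, r, size=X_min.shape)
            new_X.append(X_min + noise)
            new_y.append(np.full(len(X_min), minority))
    return np.vstack(new_X), np.concatenate(new_y)
```

With copies=(4, 2, 1), each minority-class point gains seven perturbed neighbors, so a dataset with (say) a 1:8 class ratio becomes roughly balanced.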

The results of this paper show that, with fuzzy oversampling, we can achieve better than state-of-the-art results on code smell detection. For example, for "feature envy" we achieved 99+% AUC across all our datasets, and for "misplaced class" on 8/10 datasets. While our specific results concern code smell detection, they suggest broader lessons for other kinds of analytics: (a) try better preprocessing before trying complex learners; (b) include simpler learners as a baseline in software analytics; (c) try "fuzzy sampling" as one such baseline.
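To make lessons (b) and (c) concrete, the sketch below wires a simple feedforward learner (scikit-learn's MLPClassifier, standing in here for the paper's tuned networks) together with the oversampler sketched above and scores it with AUC. The hidden-layer sizes and train/test split are illustrative assumptions, not the paper's configuration.

```python
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import MinMaxScaler

def baseline_auc(X, y):
    """Train a simple feedforward baseline on oversampled data and report
    AUC. A hypothetical harness, not the paper's exact pipeline; assumes
    binary labels {0, 1} with 1 as the minority class."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y)
    # Scale features to [0, 1] so the perturbation radii are meaningful.
    scaler = MinMaxScaler().fit(X_tr)
    X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)
    # Oversample only the training split (the fuzzy_oversample sketch above).
    X_tr, y_tr = fuzzy_oversample(X_tr, y_tr)
    clf = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500).fit(X_tr, y_tr)
    return roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
```

Note that the oversampler is applied only after the train/test split; oversampling before splitting would leak synthetic neighbors of test points into training and inflate the AUC.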

To support others trying to reproduce, extend, or refute this work, all our code and data are available online at https://github.com/yrahul3910/code-smell-detection.

References

  1. Amritanshu Agrawal, Wei Fu, Di Chen, Xipeng Shen, and Tim Menzies. 2019. How to "DODGE" Complex Software Analytics. IEEE Transactions on Software Engineering (2019).
  2. Amritanshu Agrawal and Tim Menzies. 2018. Is "Better Data" Better Than "Better Data Miners"? In 2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE). IEEE, 1050--1061.
  3. Amritanshu Agrawal, Xueqi Yang, Rishabh Agrawal, Rahul Yedida, Xipeng Shen, and Tim Menzies. 2021. Simpler Hyperparameter Optimization for Software Analytics: Why, How, When. IEEE Transactions on Software Engineering (2021).
  4. Uri Alon, Shaked Brody, Omer Levy, and Eran Yahav. 2018. code2seq: Generating sequences from structured representations of code. arXiv preprint arXiv:1808.01400 (2018).
  5. Uri Alon, Meital Zilberstein, Omer Levy, and Eran Yahav. 2019. code2vec: Learning distributed representations of code. Proceedings of the ACM on Programming Languages 3, POPL (2019), 1--29.
  6. Muhammad Ilyas Azeem, Fabio Palomba, Lin Shi, and Qing Wang. 2019. Machine learning techniques for code smell detection: A systematic literature review and meta-analysis. Information and Software Technology 108 (2019), 115--138.
  7. Kent Beck, Martin Fowler, and Grandma Beck. 1999. Bad smells in code. Refactoring: Improving the Design of Existing Code 1, 1999 (1999), 75--88.
  8. Nitesh V. Chawla, Kevin W. Bowyer, Lawrence O. Hall, and W. Philip Kegelmeyer. 2002. SMOTE: Synthetic Minority Over-sampling Technique. Journal of Artificial Intelligence Research 16, 1 (June 2002), 321--357.
  9. Chunyang Chen, Ting Su, Guozhu Meng, Zhenchang Xing, and Yang Liu. 2018. From UI design image to GUI skeleton: a neural machine translator to bootstrap mobile GUI implementation. In Proceedings of the 40th International Conference on Software Engineering. 665--676.
  10. Guobin Chen, Wongun Choi, Xiang Yu, Tony Han, and Manmohan Chandraker. 2017. Learning efficient object detection models with knowledge distillation. Advances in Neural Information Processing Systems 30 (2017).
  11. Jang Hyun Cho and Bharath Hariharan. 2019. On the Efficacy of Knowledge Distillation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV).
  12. Morakot Choetkiertikul, Hoa Khanh Dam, Truyen Tran, Trang Pham, Chaiyong Ragkhitwetsagul, and Aditya Ghose. 2021. Automatically recommending components for issue reports using deep learning. Empirical Software Engineering 26, 2 (2021), 1--39.
  13. Anna Choromanska, Mikael Henaff, Michael Mathieu, Gérard Ben Arous, and Yann LeCun. 2015. The loss surfaces of multilayer networks. In Artificial Intelligence and Statistics. PMLR, 192--204.
  14. George Cybenko. 1989. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems 2, 4 (1989), 303--314.
  15. Ignatios Deligiannis, Ioannis Stamelos, Lefteris Angelis, Manos Roumeliotis, and Martin Shepperd. 2004. A controlled experiment investigation of an object-oriented design heuristic for maintainability. Journal of Systems and Software 72, 2 (2004), 129--143.
  16. Simon S Du, Xiyu Zhai, Barnabas Poczos, and Aarti Singh. 2018. Gradient descent provably optimizes over-parameterized neural networks. arXiv preprint arXiv:1810.02054 (2018).
  17. Francesca Arcelli Fontana, Mika V Mäntylä, Marco Zanoni, and Alessandro Marino. 2016. Comparing and experimenting machine learning techniques for code smell detection. Empirical Software Engineering 21, 3 (2016), 1143--1191.
  18. Francesca Arcelli Fontana, Marco Zanoni, Alessandro Marino, and Mika V Mäntylä. 2013. Code smell detection: Towards a machine learning-based approach. In 2013 IEEE International Conference on Software Maintenance. IEEE, 396--399.
  19. Lukas Galke and Ansgar Scherp. 2021. Forget me not: A Gentle Reminder to Mind the Simple Multi-Layer Perceptron Baseline for Text Classification. arXiv preprint arXiv:2109.03777 (2021).
  20. Zhipeng Gao, Xin Xia, David Lo, John Grundy, and Thomas Zimmermann. 2021. Automating the removal of obsolete TODO comments. In Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 218--229.
  21. Jianping Gou, Baosheng Yu, Stephen J Maybank, and Dacheng Tao. 2021. Knowledge distillation: A survey. International Journal of Computer Vision 129, 6 (2021), 1789--1819.
  22. Melinda R Hess and Jeffrey D Kromrey. 2004. Robust confidence intervals for effect sizes: A comparative study of Cohen's d and Cliff's delta under non-normality and heterogeneous variances. In Annual Meeting of the American Educational Research Association. Citeseer, 1--30.
  23. Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. 2015. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531 (2015).
  24. Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation 9, 8 (1997), 1735--1780.
  25. Kurt Hornik, Maxwell Stinchcombe, and Halbert White. 1989. Multilayer feedforward networks are universal approximators. Neural Networks 2, 5 (1989), 359--366.
  26. Arthur Jacot, Franck Gabriel, and Clément Hongler. 2018. Neural tangent kernel: Convergence and generalization in neural networks. arXiv preprint arXiv:1806.07572 (2018).
  27. Yoon Kim and Alexander M Rush. 2016. Sequence-level knowledge distillation. arXiv preprint arXiv:1606.07947 (2016).
  28. Can Li, Ling Xu, Meng Yan, and Yan Lei. 2020. TagDC: A tag recommendation method for software information sites with a combination of deep learning and collaborative filtering. Journal of Systems and Software 170 (2020), 110783.
  29. Hui Liu, Jiahao Jin, Zhifeng Xu, Yifan Bu, Yanzhen Zou, and Lu Zhang. 2019. Deep learning based code smell detection. IEEE Transactions on Software Engineering (2019).
  30. Tim Menzies, Suvodeep Majumder, Nikhila Balaji, Katie Brey, and Wei Fu. 2018. 500+ times faster than deep learning: (a case study exploring faster methods for text mining StackOverflow). In 2018 IEEE/ACM 15th International Conference on Mining Software Repositories (MSR). IEEE, 554--563.
  31. Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013).
  32. Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems. 3111--3119.
  33. Guido Montúfar, Razvan Pascanu, Kyunghyun Cho, and Yoshua Bengio. 2014. On the number of linear regions of deep neural networks. arXiv preprint arXiv:1402.1869 (2014).
  34. Raimund Moser, Pekka Abrahamsson, Witold Pedrycz, Alberto Sillitti, and Giancarlo Succi. 2007. A case study on the impact of refactoring on quality and productivity in an agile team. In IFIP Central and East European Conference on Software Engineering Techniques. Springer, 252--266.
  35. Vinod Nair and Geoffrey E Hinton. 2010. Rectified linear units improve restricted Boltzmann machines. In ICML.
  36. Fabio Palomba. 2015. Textual analysis for code smell detection. In 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering, Vol. 2. IEEE, 769--771.
  37. Priyadarshini Panda and Kaushik Roy. 2016. Unsupervised regenerative learning of hierarchical features in spiking deep networks for object recognition. In 2016 International Joint Conference on Neural Networks (IJCNN). IEEE, 299--306.
  38. Wonpyo Park, Dongju Kim, Yan Lu, and Minsu Cho. 2019. Relational knowledge distillation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 3967--3976.
  39. Fabiano Pecorelli, Dario Di Nucci, Coen De Roover, and Andrea De Lucia. 2020. A large empirical assessment of the role of data balancing in machine-learning-based code smell detection. Journal of Systems and Software 169 (2020), 110693.
  40. Han Peng, Ge Li, Wenhan Wang, Yunfei Zhao, and Zhi Jin. 2021. Integrating Tree Path in Transformer for Code Representation. Advances in Neural Information Processing Systems 34 (2021).
  41. Mary Phuong and Christoph Lampert. 2019. Towards understanding knowledge distillation. In International Conference on Machine Learning. PMLR, 5142--5151.
  42. Julian Aron Prenner and Romain Robbes. 2021. Making the most of small Software Engineering datasets with modern machine learning. IEEE Transactions on Software Engineering (2021).
  43. Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, et al. 2019. Language models are unsupervised multitask learners. OpenAI Blog 1, 8 (2019), 9.
  44. David E Rumelhart, Geoffrey E Hinton, and Ronald J Williams. 1986. Learning representations by back-propagating errors. Nature 323, 6088 (1986), 533--536.
  45. Dilan Sahin, Marouane Kessentini, Slim Bechikh, and Kalyanmoy Deb. 2014. Code-smell detection as a bilevel problem. ACM Transactions on Software Engineering and Methodology (TOSEM) 24, 1 (2014), 1--44.
  46. Shibani Santurkar, Dimitris Tsipras, Andrew Ilyas, and Aleksander Mądry. 2018. How does batch normalization help optimization? In Proceedings of the 32nd International Conference on Neural Information Processing Systems. 2488--2498.
  47. Jan Schumacher, Nico Zazworka, Forrest Shull, Carolyn Seaman, and Michele Shaw. 2010. Building empirical support for automated code smell detection. In Proceedings of the 2010 ACM-IEEE International Symposium on Empirical Software Engineering and Measurement. 1--10.
  48. Heung-Il Suk, Seong-Whan Lee, Dinggang Shen, Alzheimer's Disease Neuroimaging Initiative, et al. 2014. Hierarchical feature representation and multimodal fusion with deep learning for AD/MCI diagnosis. NeuroImage 101 (2014), 569--582.
  49. Ewan Tempero, Craig Anslow, Jens Dietrich, Ted Han, Jing Li, Markus Lumpe, Hayden Melton, and James Noble. 2010. The Qualitas Corpus: A curated collection of Java code for empirical studies. In 2010 Asia Pacific Software Engineering Conference. IEEE, 336--345.
  50. Cody Allen Watson. 2020. Deep Learning in Software Engineering. Ph.D. Dissertation. College of William & Mary.
  51. Claes Wohlin. 2014. Guidelines for snowballing in systematic literature studies and a replication in software engineering. In Proceedings of the 18th International Conference on Evaluation and Assessment in Software Engineering. 1--10.
  52. Aiko Yamashita and Leon Moonen. 2013. To what extent can maintenance problems be predicted by code smell detection? An empirical study. Information and Software Technology 55, 12 (2013), 2223--2242.
  53. Rikiya Yamashita, Mizuho Nishio, Richard Kinh Gian Do, and Kaori Togashi. 2018. Convolutional neural networks: an overview and application in radiology. Insights into Imaging 9, 4 (2018), 611--629.
  54. Rahul Yedida and Tim Menzies. 2021. On the Value of Oversampling for Deep Learning in Software Defect Prediction. IEEE Transactions on Software Engineering (2021).
  55. Junho Yim, Donggyu Joo, Jihoon Bae, and Junmo Kim. 2017. A gift from knowledge distillation: Fast optimization, network minimization and transfer learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 4133--4141.
  56. Nico Zazworka, Michele A Shaw, Forrest Shull, and Carolyn Seaman. 2011. Investigating the impact of design debt on software quality. In Proceedings of the 2nd Workshop on Managing Technical Debt. 17--23.
  57. Matthew D Zeiler and Rob Fergus. 2014. Visualizing and understanding convolutional networks. In European Conference on Computer Vision. Springer, 818--833.
  58. Yufan Zhuang, Sahil Suneja, Veronika Thost, Giacomo Domeniconi, Alessandro Morari, and Jim Laredo. 2021. Software Vulnerability Detection via Deep Learning over Disaggregated Code Graph Representation. arXiv preprint arXiv:2109.03341 (2021).
  59. Difan Zou, Yuan Cao, Dongruo Zhou, and Quanquan Gu. 2020. Gradient descent optimizes over-parameterized deep ReLU networks. Machine Learning 109, 3 (2020), 467--492.

Published in
MSR '22: Proceedings of the 19th International Conference on Mining Software Repositories
May 2022, 815 pages
ISBN: 9781450393034
DOI: 10.1145/3524842
Copyright © 2022 ACM
Publisher: Association for Computing Machinery, New York, NY, United States
