Article

Improving defect prediction using temporal features and non linear models

Authors:
Abraham Bernstein, University of Zurich, Switzerland
Jayalath Ekanayake, University of Zurich, Switzerland
Martin Pinzger, University of Zurich, Switzerland
IWPSE '07: Ninth international workshop on Principles of software evolution: in conjunction with the 6th ESEC/FSE joint meeting, September 2007, Pages 11–18. https://doi.org/10.1145/1294948.1294953

Published: 03 September 2007

ABSTRACT

Predicting the defects in the next release of a large software system is very valuable to the project manager when planning her resources. In this paper we argue that temporal features (or aspects) of the data are central to prediction performance. We also argue that non-linear models, as opposed to traditional regression, are necessary to uncover some of the hidden interrelationships between the features and the defects and, in some cases, to maintain the accuracy of the prediction.

Using data obtained from the CVS and Bugzilla repositories of the Eclipse project, we extract a number of temporal features, such as the number of revisions and the number of reported issues within the last three months. We then use these data to predict both the location of defects (i.e., the classes in which defects will occur) and the number of reported bugs in the next month of the project. To that end, we use standard tree-based induction algorithms and compare them with traditional regression.
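As an illustration of what a temporal feature of this kind could look like, the following minimal sketch counts events (revisions or issue reports) per file inside a fixed window before a prediction date. It is not the authors' extraction code: the function name `temporal_counts`, the 90-day approximation of "three months", and the toy file names and dates are all assumptions made for this example.

```python
from collections import Counter
from datetime import date, timedelta


def temporal_counts(events, cutoff, window_days=90):
    """Count events per file in the half-open window [cutoff - window_days, cutoff).

    `events` is an iterable of (file_path, event_date) pairs. The 90-day
    default is an assumed stand-in for "the last three months".
    """
    start = cutoff - timedelta(days=window_days)
    counts = Counter()
    for path, day in events:
        if start <= day < cutoff:
            counts[path] += 1
    return counts


# Hypothetical toy data: (file path, date of a revision).
revisions = [
    ("Parser.java", date(2007, 1, 10)),
    ("Parser.java", date(2007, 2, 20)),
    ("Parser.java", date(2006, 9, 1)),  # older than 90 days, so ignored
    ("Lexer.java", date(2007, 3, 1)),
]

cutoff = date(2007, 3, 15)
features = temporal_counts(revisions, cutoff)
print(features["Parser.java"], features["Lexer.java"])
```

The same function could be applied to issue-report dates to obtain a second feature per file; the resulting counts would then form part of the input to a learner.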

Our non-linear models uncover the hidden relationships between features and defects and present them in an easy-to-understand form. The results also show that, with the temporal features, our model predicts whether a source file will have a defect with an accuracy of 99% (area under the ROC curve of 0.9251) and predicts the number of defects with a mean absolute error of 0.019 (Spearman's correlation of 0.96).
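For readers unfamiliar with the reported measures, here is a minimal sketch of how mean absolute error, Spearman's rank correlation, and the area under the ROC curve can be computed from predictions. The abstract does not say which tooling was used to compute them, so this is a plain-Python illustration only; the helper names and the toy values are assumptions and do not reproduce the paper's results.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def roc_auc(labels, scores):
    """Chance that a random defective file (label 1) is scored above a
    random clean file (label 0); ties count as one half."""
    ranks = average_ranks(scores)
    pos_ranks = [r for r, label in zip(ranks, labels) if label == 1]
    n_pos = len(pos_ranks)
    n_neg = len(labels) - n_pos
    return (sum(pos_ranks) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Hypothetical toy values: defects per file, actual vs. predicted.
actual = [0, 1, 0, 3]
predicted = [0.1, 0.8, 0.3, 2.5]
has_defect = [1 if a > 0 else 0 for a in actual]

print(mean_absolute_error(actual, predicted))
print(spearman(actual, predicted))
print(roc_auc(has_defect, predicted))
```

Accuracy alone can look high when defective files are rare, which is one reason to read it alongside a ranking measure such as the area under the ROC curve.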

Published in
IWPSE '07: Ninth international workshop on Principles of software evolution: in conjunction with the 6th ESEC/FSE joint meeting
September 2007
122 pages
ISBN:9781595937223
DOI:10.1145/1294948
Program Chairs:
Massimiliano Di Penta, RCOST, Università degli Studi del Sannio, Italy
Michele Lanza, University of Lugano, Switzerland
Copyright © 2007 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
decision tree learner
defect prediction
mining software repository