DOI: 10.1145/1294948.1294953
Article

Improving defect prediction using temporal features and non linear models

Published: 03 September 2007

ABSTRACT

Predicting the defects in the next release of a large software system is very valuable to a project manager planning her resources. In this paper we argue that temporal features (or aspects) of the data are central to prediction performance. We also argue that non-linear models, as opposed to traditional regression, are necessary to uncover some of the hidden interrelationships between the features and the defects, and in some cases to maintain prediction accuracy.

Using data obtained from the CVS and Bugzilla repositories of the Eclipse project, we extract a number of temporal features, such as the number of revisions and the number of reported issues within the last three months. We then use these data to predict both the location of defects (i.e., the classes in which defects will occur) and the number of bugs reported in the next month of the project. To that end, we use standard tree-based induction algorithms and compare them with traditional regression.
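To make the temporal features concrete, here is a minimal Python sketch under stated assumptions, not the paper's extraction pipeline. It assumes the CVS log and the Bugzilla export have already been reduced to hypothetical `(file, date)` pairs, approximates "the last three months" as a 90-day window ending at a cutoff date, and labels a file as defective if at least one issue is reported against it in the following 30 days. All function names and the sample data are illustrative.

```python
from collections import Counter
from datetime import date, timedelta

# Assumption: "the last three months" approximated as a 90-day window.
WINDOW = timedelta(days=90)
# Assumption: "the next month" approximated as the following 30 days.
HORIZON = timedelta(days=30)


def count_in_window(events, start, end):
    """Count (file, date) events per file with start <= date < end."""
    return Counter(f for f, d in events if start <= d < end)


def temporal_features(revisions, issues, cutoff):
    """Map each file to (revisions, issues) in the window before `cutoff`."""
    revs = count_in_window(revisions, cutoff - WINDOW, cutoff)
    bugs = count_in_window(issues, cutoff - WINDOW, cutoff)
    files = {f for f, _ in revisions} | {f for f, _ in issues}
    # Counter returns 0 for files with no events in the window.
    return {f: (revs[f], bugs[f]) for f in sorted(files)}


def defective_next_month(issues, cutoff):
    """Hypothetical label: files with >= 1 issue reported after `cutoff`."""
    return set(count_in_window(issues, cutoff, cutoff + HORIZON))


# Hypothetical records standing in for parsed CVS and Bugzilla data.
revisions = [("A.java", date(2007, 1, 10)), ("A.java", date(2007, 3, 1)),
             ("B.java", date(2006, 9, 1))]
issues = [("A.java", date(2007, 2, 15)), ("B.java", date(2007, 4, 20))]
cutoff = date(2007, 4, 1)

features = temporal_features(revisions, issues, cutoff)
labels = defective_next_month(issues, cutoff)
print(features)  # {'A.java': (2, 1), 'B.java': (0, 0)}
print(labels)    # {'B.java'}
```

With a cutoff of 1 April 2007 the window starts on 1 January 2007, so the revision from 10 January 2007 is counted while the one from September 2006 is not. The resulting feature tuples and labels are the kind of table a tree learner or a regression model would then be trained on.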

Our non-linear models uncover the hidden relationships between features and defects and present them in an easy-to-understand form. The results also show that, using the temporal features, our prediction model can predict whether a source file will have a defect with an accuracy of 99% (area under the ROC curve of 0.9251) and the number of defects with a mean absolute error of 0.019 (Spearman's correlation of 0.96).
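The reported figures mix a classification metric (area under the ROC curve) with regression metrics (mean absolute error and Spearman's rank correlation). As an illustrative sketch, not the authors' evaluation code, all three can be computed from scratch in Python: AUC through the Mann-Whitney rank-sum formulation, and Spearman's rho as the Pearson correlation of tie-averaged ranks. The function names are hypothetical.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def roc_auc(labels, scores):
    """AUC via the Mann-Whitney U statistic; labels are 0/1."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative")
    ranks = average_ranks(scores)
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted counts."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Toy example: four files, 0/1 "has defect" labels and predicted scores.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
print(mean_absolute_error([1, 2, 3], [1, 3, 5]))     # 1.0
```

In the toy example three of the four defective/clean pairs rank the defective file higher, giving an AUC of 0.75. Unlike raw accuracy, AUC does not depend on the proportion of defective files, which is one reason to report both.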


        • Published in

          ACM Conferences
          IWPSE '07: Ninth international workshop on Principles of software evolution: in conjunction with the 6th ESEC/FSE joint meeting
          September 2007
          122 pages
          ISBN:9781595937223
          DOI:10.1145/1294948

          Copyright © 2007 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

          Publisher

          Association for Computing Machinery

          New York, NY, United States



