DOI: 10.1145/1294948.1294954
Article

Learning from bug-introducing changes to prevent fault prone code

Published: 3 September 2007

ABSTRACT

A version control system such as CVS or SVN records the history of the changes made to a software project as it evolves. Some of these changes introduce bugs, which are often fixed later by other changes.

In this paper, we use a technique that identifies bug-introducing changes to train a model that predicts whether a new change is likely to introduce a bug. We represent software changes as elements of an n-dimensional vector space whose coordinates are terms extracted from source code snapshots.
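The vector-space representation of changes can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how terms are extracted or weighted, so the sketch assumes that a term is any identifier-like token in a snapshot and that each coordinate is a raw term count. The function names `extract_terms`, `build_vocabulary`, and `change_to_vector` are hypothetical.

```python
import re
from collections import Counter

# Assumption: a "term" is any identifier-like token (letters, digits, underscore,
# not starting with a digit). The abstract does not specify the tokenizer.
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def extract_terms(snapshot: str) -> Counter:
    """Count identifier-like terms in one source code snapshot."""
    return Counter(TOKEN_RE.findall(snapshot))


def build_vocabulary(snapshots: list[str]) -> list[str]:
    """Collect every term seen in any snapshot; each term becomes one dimension."""
    vocab: set[str] = set()
    for snapshot in snapshots:
        vocab.update(extract_terms(snapshot))
    return sorted(vocab)


def change_to_vector(snapshot: str, vocab: list[str]) -> list[int]:
    """Map a change to an n-dimensional vector of raw term counts, n = len(vocab).

    Terms absent from the snapshot get 0 (Counter returns 0 for missing keys);
    terms outside the vocabulary are ignored.
    """
    counts = extract_terms(snapshot)
    return [counts[term] for term in vocab]
```

For example, the snapshots `if (x == null) return;` and `x = y + 1;` yield the vocabulary `["if", "null", "return", "x", "y"]`, and the first snapshot maps to `[1, 1, 1, 1, 0]`. A different weighting scheme, such as tf-idf, would replace the raw counts in `change_to_vector`.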

An evaluation of several learning algorithms on a set of open source projects gives promising results, particularly for KNN (the K-Nearest Neighbor algorithm), which achieves a good balance between precision and recall.
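A K-Nearest Neighbor classifier over change vectors, together with the precision and recall measures used to assess it, can be sketched as follows. This is a generic KNN, not the configuration evaluated in the paper: the abstract specifies neither the similarity measure nor the value of k, so cosine similarity, a simple majority vote, and a default of `k = 3` are assumptions, and the helper names `cosine`, `knn_predict`, and `precision_recall` are hypothetical. Bug-introducing changes are treated as the positive class.

```python
import math
from collections import Counter


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_predict(train: list[tuple[list[float], bool]], query: list[float], k: int = 3) -> bool:
    """Predict True (bug-introducing) if most of the k most similar training changes are.

    Assumption: similarity is cosine similarity; ties in the vote go to False (clean).
    """
    neighbours = sorted(train, key=lambda item: cosine(item[0], query), reverse=True)[:k]
    votes = Counter(label for _, label in neighbours)
    return votes[True] > votes[False]


def precision_recall(predicted: list[bool], actual: list[bool]) -> tuple[float, float]:
    """Precision and recall for the positive (bug-introducing) class."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum((not p) and a for p, a in zip(predicted, actual))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

With an odd k and two labels the vote cannot tie; with an even k, this sketch breaks ties toward "clean". Flagging more changes as bug-introducing tends to raise recall at the cost of precision, which is the tradeoff the evaluation measures.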

    Published in

      IWPSE '07: Ninth international workshop on Principles of software evolution: in conjunction with the 6th ESEC/FSE joint meeting
      September 2007
      122 pages
      ISBN: 9781595937223
      DOI: 10.1145/1294948

      Copyright © 2007 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States
