ABSTRACT
A version control system, such as CVS or SVN, records the history of the changes made during the evolution of a software project. Some of these changes introduce bugs, which are often resolved later by other changes.
In this paper we use a technique for identifying bug-introducing changes to train a model that predicts whether or not a new change may introduce a bug. We represent software changes as elements of an n-dimensional vector space whose coordinates are terms extracted from source code snapshots.
The evaluation of various learning algorithms on a set of open source projects looks very promising, in particular for KNN (the K-Nearest Neighbor algorithm), for which a significant tradeoff between precision and recall has been obtained.
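As a rough illustration of the idea, the sketch below turns code snippets into term vectors and labels a new change by majority vote among its nearest past changes. It is a minimal sketch under stated assumptions, not the paper's implementation: the tokenizer, the raw term-frequency weighting, the cosine similarity measure, the value of k, the function names, and the toy history are all hypothetical choices made for this example.

```python
import math
import re
from collections import Counter


def term_vector(source_text):
    """Map a code snippet to a sparse term-frequency vector (term -> count).

    Assumption: terms are lowercased identifier-like tokens.
    """
    return Counter(t.lower() for t in re.findall(r"[A-Za-z_]\w*", source_text))


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse term vectors (0.0 if either is empty)."""
    dot = sum(count * v[term] for term, count in u.items() if term in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def knn_predict(training, new_vector, k=3):
    """Return True if most of the k most similar past changes introduced a bug.

    `training` is a list of (term_vector, is_bug_introducing) pairs.
    """
    ranked = sorted(
        training,
        key=lambda item: cosine_similarity(item[0], new_vector),
        reverse=True,
    )
    votes = [label for _, label in ranked[:k]]
    return sum(votes) > len(votes) / 2


# Toy, made-up change history: (changed code, did it introduce a bug?)
history = [
    (term_vector("if (ptr != NULL) free(ptr);"), False),
    (term_vector("free(ptr); ptr->next = head;"), True),
    (term_vector("buf[len] = 0; strcpy(buf, src);"), True),
    (term_vector("int count = list_size(items);"), False),
    (term_vector("strcpy(buf, src); buf[len] = 0;"), True),
]

new_change = term_vector("strcpy(dest_buf, src); free(ptr);")
print("predicted bug-introducing:", knn_predict(history, new_change, k=3))
```

In a sketch like this, a larger k smooths the vote over more past changes, and requiring a larger share of bug-introducing neighbors before flagging a change would be one way to trade recall for precision.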
- Learning from bug-introducing changes to prevent fault prone code