ABSTRACT
Predicting the time and effort required to complete software tasks has long been an active area of research. Previous studies have proposed predictive models based on either the text data or the metadata of software tasks to estimate completion time or completion effort, but the literature has paid little attention to integrating all sets of attributes to build better-performing models. We first apply previously proposed models to the datasets of two IBM commercial projects, RQM and RTC, to find the best-performing model for predicting task completion effort on each set of attributes. We then propose an approach that builds a hybrid model from selected individual predictors to achieve more accurate and more stable early predictions of task completion effort, and to ensure that the model is not bound to specific attributes and is therefore applicable to a larger number of tasks. Categorizing task completion effort values into Low and High labels based on their median, we show that our hybrid model is 3-8% more accurate in early prediction of task completion effort than the best individual predictors.
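To make the general idea concrete, the following is a minimal, hypothetical sketch of the kind of pipeline the abstract describes: effort values are binarized at the median into Low/High labels, one classifier is trained per attribute set (text vs. metadata), and a meta-classifier combines their outputs into a hybrid predictor. The data, classifier choices, and feature names below are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of median-based labeling and a stacked "hybrid" model.
# All data and model choices here are illustrative, not the study's setup.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)

# Toy stand-ins for task attributes: free-text summaries and numeric metadata.
summaries = np.array(["fix crash in login flow", "update build scripts",
                      "refactor storage layer", "add parser unit tests"] * 50)
metadata = rng.random((200, 5))              # e.g. priority, severity, ...
effort_hours = rng.exponential(10.0, 200)    # synthetic completion effort

# Binarize completion effort at the median: 0 = Low, 1 = High.
y = (effort_hours > np.median(effort_hours)).astype(int)

idx_train, idx_test = train_test_split(np.arange(len(y)), random_state=0)

# Individual predictors, each restricted to a single attribute set.
text_clf = make_pipeline(TfidfVectorizer(), MultinomialNB())
text_clf.fit(summaries[idx_train], y[idx_train])

meta_clf = RandomForestClassifier(n_estimators=100, random_state=0)
meta_clf.fit(metadata[idx_train], y[idx_train])

# Hybrid model: a meta-classifier stacked on the base models' probabilities.
stack_train = np.column_stack([
    text_clf.predict_proba(summaries[idx_train])[:, 1],
    meta_clf.predict_proba(metadata[idx_train])[:, 1],
])
hybrid = LogisticRegression().fit(stack_train, y[idx_train])

stack_test = np.column_stack([
    text_clf.predict_proba(summaries[idx_test])[:, 1],
    meta_clf.predict_proba(metadata[idx_test])[:, 1],
])
print("hybrid accuracy:", hybrid.score(stack_test, y[idx_test]))
```

Note that a proper stacked generalization would train the meta-classifier on out-of-fold predictions of the base models rather than on in-sample ones; the sketch omits that step for brevity.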