Feature selection for software effort estimation with localized neighborhood mutual information

Abstract

Feature selection is usually employed before applying case-based reasoning (CBR) to software effort estimation (SEE). Unfortunately, most feature selection methods treat CBR as a black box, so there is no guarantee that CBR is appropriate for the selected feature subset. The key to solving this problem is to measure how well the CBR assumption holds for a given feature set. In this paper, a measure called localized neighborhood mutual information (LNI) is proposed for this purpose, and a greedy method called LNI-based feature selection (LFS) is designed around it. Experiments with leave-one-out cross-validation (LOOCV) on six benchmark datasets demonstrate that: (1) CBR produces effective estimates on the LFS-selected subset compared with a randomized baseline method; and, compared with three representative feature selection methods, (2) LFS achieves the best mean absolute residual (MAR) on 3 out of 6 datasets, with a 14% average improvement, and (3) LFS achieves the best mean magnitude of relative error (MMRE) on 5 out of 6 datasets, with a 24% average improvement.
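To make the approach concrete, the sketch below illustrates the pipeline the abstract describes: a neighborhood mutual information score over a candidate feature subspace, a greedy forward search driven by that score, and LOOCV CBR estimation reported as MAR and MMRE. This is a minimal illustration under stated assumptions, not the paper's implementation: the score uses a generic delta-neighborhood formulation of neighborhood mutual information rather than the paper's localized variant (LNI), and the function names, the delta value of 0.15, and the 1-nearest-neighbour CBR estimator are all illustrative choices.

```python
import numpy as np

def neighborhood_mutual_information(X, y, delta=0.15):
    """Neighborhood-MI-style score between a feature subspace and effort.

    Generic delta-neighborhood formulation (an assumption; the paper's
    localized variant is not reproduced). X: (n, k) features and y: (n,)
    effort, both min-max normalized to [0, 1].
    """
    n = len(y)
    # Pairwise Chebyshev distances in the feature subspace and on effort.
    dx = np.max(np.abs(X[:, None, :] - X[None, :, :]), axis=2)
    dy = np.abs(y[:, None] - y[None, :])
    in_x = dx <= delta                  # delta-neighborhoods in feature space
    in_y = dy <= delta                  # delta-neighborhoods on effort
    nx = in_x.sum(axis=1)               # |N_B(x_i)|, always >= 1 (self)
    ny = in_y.sum(axis=1)               # |N_C(x_i)|
    nxy = (in_x & in_y).sum(axis=1)     # |N_B(x_i) ∩ N_C(x_i)|
    # NMI(B; C) = NH(B) + NH(C) - NH(B, C), averaged over samples.
    return float(np.mean(np.log2(n * nxy / (nx * ny))))

def lfs(X, y, delta=0.15):
    """Greedy forward selection driven by the score above (a stand-in for
    the paper's LFS): add the best feature, stop when the score no longer
    improves."""
    selected, remaining, best = [], list(range(X.shape[1])), -np.inf
    while remaining:
        score, f = max(
            (neighborhood_mutual_information(X[:, selected + [f]], y, delta), f)
            for f in remaining
        )
        if score <= best:
            break
        selected.append(f)
        remaining.remove(f)
        best = score
    return selected

def loocv_cbr(X, y, k=1):
    """Leave-one-out CBR (k-nearest-neighbour) effort estimation.

    Returns MAR = mean|y - yhat| and MMRE = mean(|y - yhat| / y),
    so y must be the raw (positive) effort values here.
    """
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i
        d = np.linalg.norm(X[mask] - X[i], axis=1)
        preds[i] = y[mask][np.argsort(d)[:k]].mean()
    residuals = np.abs(y - preds)
    return residuals.mean(), (residuals / y).mean()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_raw = rng.random((30, 8))                 # 30 projects, 8 features
    y_raw = 2.0 + 5.0 * X_raw[:, 0] + rng.random(30)
    # Normalize features and effort to [0, 1] before the neighborhood score.
    X = (X_raw - X_raw.min(0)) / (X_raw.max(0) - X_raw.min(0) + 1e-12)
    y = (y_raw - y_raw.min()) / (y_raw.max() - y_raw.min() + 1e-12)
    subset = lfs(X, y)
    mar, mmre = loocv_cbr(X[:, subset], y_raw)  # metrics on raw effort
    print("selected:", subset, "MAR: %.3f MMRE: %.3f" % (mar, mmre))
```

The stopping rule and the localization of the score are precisely where the paper's LNI/LFS would differ from this generic sketch; the structure (score, greedy search, LOOCV evaluation) is what the abstract specifies.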

Corresponding author

Correspondence to Qin Liu.

Cite this article

Liu, Q., Xiao, J. & Zhu, H. Feature selection for software effort estimation with localized neighborhood mutual information. Cluster Comput 22 (Suppl 3), 6953–6961 (2019). https://doi.org/10.1007/s10586-018-1884-x
