Skip to main content

A Two-Step Feature Space Transforming Method to Improve Credit Scoring Performance

  • Conference paper
  • First Online:
Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2019)

Abstract

The increasing amount of credit offered by financial institutions has required intelligent and efficient methodologies of credit scoring. Therefore, the use of different machine learning solutions to that task has been growing during the past recent years. Such procedures have been used in order to identify customers who are reliable or unreliable, with the intention to counterbalance financial losses due to loans offered to wrong customer profiles. Notwithstanding, such an application of machine learning suffers with several limitations when put into practice, such as unbalanced datasets and, specially, the absence of sufficient information from the features that can be useful to discriminate reliable and unreliable loans. To overcome such drawbacks, we propose in this work a Two-Step Feature Space Transforming approach, which operates by evolving feature information in a twofold operation: (i) data enhancement; and (ii) data discretization. In the first step, additional meta-features are used in order to improve data discrimination. In the second step, the goal is to reduce the diversity of features. Experiments results performed in real-world datasets with different levels of unbalancing show that such a step can improve, in a consistent way, the performance of the best machine learning algorithm for such a task. With such results we aim to open new perspectives for novel efficient credit scoring systems.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 84.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

  1. 1.

    https://www.ecb.europa.eu.

  2. 2.

    ftp://ftp.ics.uci.edu/pub/machine-learning-databases/statlog/.

  3. 3.

    http://scikit-learn.org.

References

  1. Abellán, J., Castellano, J.G.: A comparative study on base classifiers in ensemble methods for credit scoring. Expert Syst. Appl. 73, 1–10 (2017)

    Article  Google Scholar 

  2. Adhikari, R.: A neural network based linear ensemble framework for time series forecasting. Neurocomputing 157, 231–242 (2015)

    Article  Google Scholar 

  3. Ala’raj, M., Abbod, M.F.: A new hybrid ensemble credit scoring model based on classifiers consensus system approach. Expert Syst. Appl. 64, 36–55 (2016)

    Article  Google Scholar 

  4. Attenberg, J., Provost, F.J.: Inactive learning?: difficulties employing active learning in practice. SIGKDD Explor. 12(2), 36–41 (2010). https://doi.org/10.1145/1964897.1964906

    Article  Google Scholar 

  5. Benesty, J., Chen, J., Huang, Y., Cohen, I.: Pearson correlation coefficient. Noise Reduction in Speech Processing, pp. 1–4. Springer, Berlin (2009)

    Google Scholar 

  6. Bequé, A., Lessmann, S.: Extreme learning machines for credit scoring: an empirical evaluation. Expert Syst. Appl. 86, 42–53 (2017)

    Article  Google Scholar 

  7. Boratto, L., Carta, S., Fenu, G., Saia, R.: Using neural word embeddings to model user behavior and detect user segments. Knowledge-Based Syst. 108, 5–14 (2016)

    Article  Google Scholar 

  8. Boughorbel, S., Jarray, F., El-Anbari, M.: Optimal classifier for imbalanced data using Matthews correlation coefficient metric. PLoS ONE 12(6), e0177678 (2017)

    Article  Google Scholar 

  9. Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)

    Article  MATH  Google Scholar 

  10. de Castro Vieira, J.R., Barboza, F., Sobreiro, V.A., Kimura, H.: Machine learning models for credit analysis improvements: predicting low-income families’ default. Appl. Soft Comput. 83, 105640 (2019). https://doi.org/10.1016/j.asoc.2019.105640. http://www.sciencedirect.com/science/article/pii/S156849461930420X

  11. Chai, T., Draxler, R.R.: Root mean square error (RMSE) or mean absolute error (MAE)?-arguments against avoiding RMSE in the literature. Geoscientific Model Dev. 7(3), 1247–1250 (2014)

    Article  Google Scholar 

  12. Chatterjee, A., Segev, A.: Data manipulation in heterogeneous databases. ACM SIGMOD Rec. 20(4), 64–68 (1991)

    Article  Google Scholar 

  13. Chen, B., Zeng, W., Lin, Y.: Applications of artificial intelligence technologies in credit scoring: a survey of literature. In: International Conference on Natural Computation (ICNC), pp. 658–664, August 2014

    Google Scholar 

  14. Chen, N., Ribeiro, B., Chen, A.: Financial credit risk assessment: a recent review. Artif. Intell. Rev. 45(1), 1–23 (2016)

    Article  Google Scholar 

  15. Chopra, A., Bhilare, P.: Application of ensemble models in credit scoring models. Bus. Perspect. Res. 6(2), 129–141 (2018)

    Article  Google Scholar 

  16. Cleary, S., Hebb, G.: An efficient and functional model for predicting bank distress: in and out of sample evidence. J. Bank. Finance 64, 101–111 (2016)

    Article  Google Scholar 

  17. Crook, J.N., Edelman, D.B., Thomas, L.C.: Recent developments in consumer credit risk assessment. Eur. J. Oper. Res. 183(3), 1447–1465 (2007)

    Article  MathSciNet  MATH  Google Scholar 

  18. Dal Pozzolo, A., Caelen, O., Le Borgne, Y.A., Waterschoot, S., Bontempi, G.: Learned lessons in credit card fraud detection from a practitioner perspective. Expert Syst. Appl. 41(10), 4915–4928 (2014)

    Article  Google Scholar 

  19. Damrongsakmethee, T., Neagoe, V.-E.: Principal component analysis and relieff cascaded with decision tree for credit scoring. In: Silhavy, R. (ed.) CSOC 2019. AISC, vol. 985, pp. 85–95. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-19810-7_9

    Chapter  Google Scholar 

  20. Dean, J., Ghemawat, S.: Mapreduce: simplified data processing on large clusters. Commun. ACM 51(1), 107–113 (2008). https://doi.org/10.1145/1327452.1327492

    Article  Google Scholar 

  21. Economics, T.: Euro area consumer credit (2019). https://tradingeconomics.com/euro-area/consumer-credit?continent=europe

  22. Economics, T.: Euro area consumer spending (2019). https://tradingeconomics.com/euro-area/consumer-spending?continent=europe

  23. Fang, F., Chen, Y.: A new approach for credit scoring by directly maximizing the kolmogorov-smirnov statistic. Comput. Stat. Data Anal. 133, 180–194 (2019)

    Article  MathSciNet  MATH  Google Scholar 

  24. Feng, X., Xiao, Z., Zhong, B., Qiu, J., Dong, Y.: Dynamic ensemble classification for credit scoring using soft probability. Appl. Soft Comput. 65, 139–151 (2018). https://doi.org/10.1016/j.asoc.2018.01.021. http://www.sciencedirect.com/science/article/pii/S1568494618300279

  25. Fernández-Tobías, I., Tomeo, P., Cantador, I., Noia, T.D., Sciascio, E.D.: Accuracy and diversity in cross-domain recommendations for cold-start users with positive-only feedback. In: Sen, S., Geyer, W., Freyne, J., Castells, P. (eds.) Proceedings of the 10th ACM Conference on Recommender Systems, Boston, MA, USA, 15–19 September 2016, pp. 119–122. ACM (2016). https://doi.org/10.1145/2959100.2959175

  26. García, S., Ramírez-Gallego, S., Luengo, J., Benítez, J.M., Herrera, F.: Big data preprocessing: methods and prospects. Big Data Analytics 1(1), 9 (2016)

    Article  Google Scholar 

  27. Ghodselahi, A.: A hybrid support vector machine ensemble model for credit scoring. Int. J. Comput. Appl. 17(5), 1–5 (2011)

    Google Scholar 

  28. Giraud-Carrier, C., Vilalta, R., Brazdil, P.: Introduction to the special issue on meta-learning. Mach. Learn. 54(3), 187–193 (2004)

    Article  Google Scholar 

  29. Guo, S., He, H., Huang, X.: A multi-stage self-adaptive classifier ensemble model with application in credit scoring. IEEE Access 7, 78549–78559 (2019)

    Article  Google Scholar 

  30. Haixiang, G., Yijing, L., Shang, J., Mingyun, G., Yuanyue, H., Bing, G.: Learning from class-imbalanced data: review of methods and applications. Expert Syst. Appl. 73, 220–239 (2017)

    Article  Google Scholar 

  31. Hashem, I.A.T., Anuar, N.B., Gani, A., Yaqoob, I., Xia, F., Khan, S.U.: Mapreduce: review and open challenges. Scientometrics 109(1), 389–422 (2016)

    Article  Google Scholar 

  32. Hassan, M.K., Brodmann, J., Rayfield, B., Huda, M.: Modeling credit risk in credit unions using survival analysis. Int. J. Bank Mark. 36(3), 482–495 (2018)

    Article  Google Scholar 

  33. Hawkins, D.M.: The problem of overfitting. J. Chem. Inf. Comput. Sci. 44(1), 1–12 (2004)

    Article  MathSciNet  Google Scholar 

  34. He, H., Garcia, E.A.: Learning from imbalanced data. IEEE Trans. Knowl. Data Eng. 21(9), 1263–1284 (2009). https://doi.org/10.1109/TKDE.2008.239

    Article  Google Scholar 

  35. Henrique, B.M., Sobreiro, V.A., Kimura, H.: Literature review: machine learning techniques applied to financial market prediction. Expert Syst. Appl. 124, 226–251 (2019)

    Article  Google Scholar 

  36. Huang, J., Ling, C.X.: Using AUC and accuracy in evaluating learning algorithms. IEEE Trans. Knowl. Data Eng. 17(3), 299–310 (2005)

    Article  Google Scholar 

  37. Japkowicz, N., Stephen, S.: The class imbalance problem: a systematic study. Intell. Data Anal. 6(5), 429–449 (2002)

    Article  MATH  Google Scholar 

  38. Jeni, L.A., Cohn, J.F., De La Torre, F.: Facing imbalanced data-recommendations for the use of performance metrics. In: 2013 Humaine Association Conference on Affective Computing and Intelligent Interaction, pp. 245–251. IEEE (2013)

    Google Scholar 

  39. Khemais, Z., Nesrine, D., Mohamed, M., et al.: Credit scoring and default risk prediction: a comparative study between discriminant analysis & logistic regression. Int. J. Econ. Finance 8(4), 39 (2016)

    Article  Google Scholar 

  40. Khemakhem, S., Ben Said, F., Boujelbene, Y.: Credit risk assessment for unbalanced datasets based on data mining, artificial neural network and support vector machines. J. Modell. Manage. 13(4), 932–951 (2018)

    Article  Google Scholar 

  41. Laha, A.: Developing credit scoring models with SOM and fuzzy rule based k-NN classifiers. In: IEEE International Conference on Fuzzy Systems, pp. 692–698, July 2006. https://doi.org/10.1109/FUZZY.2006.1681786

  42. Lessmann, S., Baesens, B., Seow, H.V., Thomas, L.C.: Benchmarking state-of-the-art classification algorithms for credit scoring: an update of research. Eur. J. Oper. Res. 247(1), 124–136 (2015)

    Article  MATH  Google Scholar 

  43. Lika, B., Kolomvatsos, K., Hadjiefthymiades, S.: Facing the cold start problem in recommender systems. Expert Syst. Appl. 41(4), 2065–2073 (2014). https://doi.org/10.1016/j.eswa.2013.09.005

    Article  Google Scholar 

  44. Liu, C., Huang, H., Lu, S.: Research on personal credit scoring model based on artificial intelligence. In: Sugumaran, V., Xu, Z., P., S., Zhou, H. (eds.) MMIA 2019. AISC, vol. 929, pp. 466–473. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-15740-1_64

    Chapter  Google Scholar 

  45. Liu, H., Hussain, F., Tan, C.L., Dash, M.: Discretization: an enabling technique. Data Min. Knowl. Discov. 6(4), 393–423 (2002)

    Article  MathSciNet  Google Scholar 

  46. López, R.F., Ramon-Jeronimo, J.M.: Modelling credit risk with scarce default data: on the suitability of cooperative bootstrapped strategies for small low-default portfolios. JORS 65(3), 416–434 (2014). https://doi.org/10.1057/jors.2013.119

    Article  Google Scholar 

  47. López, J., Maldonado, S.: Profit-based credit scoring based on robust optimization and feature selection. Inf. Sci. 500, 190–202 (2019)

    Article  MathSciNet  Google Scholar 

  48. Luo, C., Wu, D., Wu, D.: A deep learning approach for credit scoring using credit default swaps. Eng. Appl. Artif. Intell. 65, 465–470 (2017)

    Article  Google Scholar 

  49. Luque, A., Carrasco, A., Martín, A., de las Heras, A.: The impact of class imbalance in classification performance metrics based on the binary confusion matrix. Pattern Recogn. 91, 216–231 (2019)

    Article  Google Scholar 

  50. Maldonado, S., Peters, G., Weber, R.: Credit scoring using three-way decisions with probabilistic rough sets. Inf. Sci. (2018). https://doi.org/10.1016/j.ins.2018.08.001. http://www.sciencedirect.com/science/article/pii/S0020025518306078

  51. Malekipirbazari, M., Aksakalli, V.: Risk assessment in social lending via random forests. Expert Syst. Appl. 42(10), 4621–4631 (2015)

    Article  Google Scholar 

  52. Mester, L.J., et al.: What’s the point of credit scoring? Bus. Rev. 3, 3–16 (1997)

    Google Scholar 

  53. Neagoe, V., Ciotec, A., Cucu, G.: Deep convolutional neural networks versus multilayer perceptron for financial prediction. In: International Conference on Communications (COMM), pp. 201–206, June 2018

    Google Scholar 

  54. Pasila, F.: Credit scoring modeling of Indonesian micro, small and medium enterprises using neuro-fuzzy algorithm. In: IEEE International Conference on Fuzzy Systems, pp. 1–6, June 2019. https://doi.org/10.1109/FUZZ-IEEE.2019.8858841

  55. Powers, D.: Evaluation: from precision, recall and f-factor to roc, informedness, markedness & correlation. Mach. Learn. Technol. 2, January 2008

    Google Scholar 

  56. Rapach, D.E., Wohar, M.E.: In-sample vs. out-of-sample tests of stock return predictability in the context of data mining. J. Empirical Finance 13(2), 231–247 (2006)

    Article  Google Scholar 

  57. Rodda, S., Erothi, U.S.R.: Class imbalance problem in the network intrusion detection systems. In: 2016 International Conference on Electrical, Electronics, and Optimization Techniques (ICEEOT), pp. 2685–2688. IEEE (2016)

    Google Scholar 

  58. Saia, R.: A discrete wavelet transform approach to fraud detection. In: Yan, Z., Molva, R., Mazurczyk, W., Kantola, R. (eds.) NSS 2017. LNCS, vol. 10394, pp. 464–474. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-64701-2_34

    Chapter  Google Scholar 

  59. Saia, R., Carta, S.: An entropy based algorithm for credit scoring. In: Tjoa, A.M., Xu, L.D., Raffai, M., Novak, N.M. (eds.) CONFENIS 2016. LNBIP, vol. 268, pp. 263–276. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-49944-4_20

    Chapter  Google Scholar 

  60. Saia, R., Carta, S.: Introducing a vector space model to perform a proactive credit scoring. In: Fred, A., Dietz, J., Aveiro, D., Liu, K., Bernardino, J., Filipe, J. (eds.) IC3K 2016. CCIS, vol. 914, pp. 125–148. Springer, Cham (2019). https://doi.org/10.1007/978-3-319-99701-8_6

    Chapter  Google Scholar 

  61. Saia, R., Carta, S.: A linear-dependence-based approach to design proactive credit scoring models. In: KDIR, pp. 111–120 (2016)

    Google Scholar 

  62. Saia, R., Carta, S.: Evaluating credit card transactions in the frequency domain for a proactive fraud detection approach. In: SECRYPT, pp. 335–342. SciTePress (2017)

    Google Scholar 

  63. Saia, R., Carta, S.: A fourier spectral pattern analysis to design credit scoring models. In: Proceedings of the 1st International Conference on Internet of Things and Machine Learning, p. 18. ACM (2017)

    Google Scholar 

  64. Saia, R., Carta, S., Fenu, G.: A wavelet-based data analysis to credit scoring. In: Proceedings of the 2nd International Conference on Digital Signal Processing, pp. 176–180. ACM (2018)

    Google Scholar 

  65. Saia, R., Carta, S., Recupero, D.R.: A probabilistic-driven ensemble approach to perform event classification in intrusion detection system. In: KDIR, pp. 139–146. SciTePress (2018)

    Google Scholar 

  66. Saia, R., Carta, S., Recupero, D.R., Fenu, G., Saia, M.: A discretized enriched technique to enhance machine learning performance in credit scoring. In: KDIR, pp. 202–213. ScitePress (2019)

    Google Scholar 

  67. Saia, R., et al.: A frequency-domain-based pattern mining for credit card fraud detection. In: IoTBDS, pp. 386–391 (2017)

    Google Scholar 

  68. Sewwandi, D., Perera, K., Sandaruwan, S., Lakchani, O., Nugaliyadde, A., Thelijjagoda, S.: Linguistic features based personality recognition using social media data. In: 2017 6th National Conference on Technology and Management (NCTM), pp. 63–68, January 2017. https://doi.org/10.1109/NCTM.2017.7872829

  69. Siddiqi, N.: Intelligent Credit Scoring: Building and Implementing Better Credit Risk Scorecards. John Wiley & Sons, Hoboken (2017)

    Book  Google Scholar 

  70. Sohn, S.Y., Kim, D.H., Yoon, J.H.: Technology credit scoring model with fuzzy logistic regression. Appl. Soft Comput. 43, 150–158 (2016)

    Article  Google Scholar 

  71. Son, L.H.: Dealing with the new user cold-start problem in recommender systems: a comparative review. Inf. Syst. 58, 87–104 (2016). https://doi.org/10.1016/j.is.2014.10.001

    Article  Google Scholar 

  72. Sun, X., Liu, B., Cao, J., Luo, J., Shen, X.: Who am i? personality detection based on deep learning for texts. In: IEEE International Conference on Communications (ICC), pp. 1–6, May 2018

    Google Scholar 

  73. Tamadonejad, A., Abdul-Majid, M., Abdul-Rahman, A., Jusoh, M., Tabandeh, R.: Early warning systems for banking crises? political and economic stability. Jurnal Ekonomi Malaysia 50(2), 31–38 (2016)

    Google Scholar 

  74. Thanuja, V., Venkateswarlu, B., Anjaneyulu, G.: Applications of data mining in customer relationship management. J. Comput. Math. Sci. 2(3), 399–580 (2011)

    Google Scholar 

  75. Thomas, L.C.: A survey of credit and behavioural scoring: forecasting financial risk of lending to consumers. Int. J. Forecast. 16(2), 149–172 (2000)

    Article  Google Scholar 

  76. Tian, Y., Yong, Z., Luo, J.: A new approach for reject inference in credit scoring using kernel-free fuzzy quadratic surface support vector machines. Appl. Soft Comput. 73, 96–105 (2018)

    Article  Google Scholar 

  77. Tripathi, D., Edla, D.R., Cheruku, R.: Hybrid credit scoring model using neighborhood rough set and multi-layer ensemble classification. J. Intell. Fuzzy Syst. 34(3), 1543–1549 (2018)

    Article  Google Scholar 

  78. Tripathi, D., Edla, D.R., Kuppili, V., Bablani, A., Dharavath, R.: Credit scoring model based on weighted voting and cluster based feature selection. Procedia Comput. Sci. 132, 22–31 (2018)

    Article  Google Scholar 

  79. Vedala, R., Kumar, B.R.: An application of naive bayes classification for credit scoring in e-lending platform. In: International Conference on Data Science Engineering (ICDSE), pp. 81–84, July 2012. https://doi.org/10.1109/ICDSE.2012.6282321

  80. Vilalta, R., Drissi, Y.: A perspective view and survey of meta-learning. Artif. Intell. Rev. 18(2), 77–95 (2002)

    Article  Google Scholar 

  81. Wang, C.M., Huang, Y.F.: Evolutionary-based feature selection approaches with new criteria for data mining: a case study of credit approval data. Expert Syst. Appl. 36(3), 5900–5908 (2009)

    Article  Google Scholar 

  82. Wu, X., Kumar, V.: The Top Ten Algorithms in Data Mining. CRC Press, United States (2009)

    Book  Google Scholar 

  83. Xia, Y., Liu, C., Li, Y., Liu, N.: A boosted decision tree approach using bayesian hyper-parameter optimization for credit scoring. Expert Syst. Appl. 78, 225–241 (2017)

    Article  Google Scholar 

  84. Zhang, H., He, H., Zhang, W.: Classifier selection and clustering with fuzzy assignment in ensemble model for credit scoring. Neurocomputing 316, 210–221 (2018)

    Article  Google Scholar 

  85. Zhang, X., Yang, Y., Zhou, Z.: A novel credit scoring model based on optimized random forest. In: IEEE Annual Computing and Communication Workshop and Conference (CCWC), pp. 60–65, January 2018

    Google Scholar 

  86. Zhao, Y., Shen, Y., Huang, Y.: Dmdp: a dynamic multi-source default probability prediction framework. Data Sci. Eng. 4(1), 3–13 (2019)

    Article  Google Scholar 

  87. Zhu, B., Yang, W., Wang, H., Yuan, Y.: A hybrid deep learning model for consumer credit scoring. In: International Conference on Artificial Intelligence and Big Data (ICAIBD), pp. 205–208, May 2018. https://doi.org/10.1109/ICAIBD.2018.8396195

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Roberto Saia .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Carta, S., Fenu, G., Ferreira, A., Reforgiato Recupero, D., Saia, R. (2020). A Two-Step Feature Space Transforming Method to Improve Credit Scoring Performance. In: Fred, A., Salgado, A., Aveiro, D., Dietz, J., Bernardino, J., Filipe, J. (eds) Knowledge Discovery, Knowledge Engineering and Knowledge Management. IC3K 2019. Communications in Computer and Information Science, vol 1297. Springer, Cham. https://doi.org/10.1007/978-3-030-66196-0_7

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-66196-0_7

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-66195-3

  • Online ISBN: 978-3-030-66196-0

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics