A Two-Step Feature Space Transforming Method to Improve Credit Scoring Performance

Carta, Salvatore; Fenu, Gianni; Ferreira, Anselmo; Reforgiato Recupero, Diego; Saia, Roberto

doi:10.1007/978-3-030-66196-0_7

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1297)

Included in the following conference series:

International Joint Conference on Knowledge Discovery, Knowledge Engineering, and Knowledge Management

Abstract

The growing volume of credit offered by financial institutions calls for intelligent and efficient credit scoring methodologies. As a result, machine learning solutions have been increasingly applied to this task in recent years. Such procedures are used to identify customers as reliable or unreliable, with the aim of limiting financial losses caused by loans granted to the wrong customer profiles. Nevertheless, applying machine learning in this setting suffers from several practical limitations, such as unbalanced datasets and, especially, features that do not carry enough information to discriminate between reliable and unreliable loans. To overcome these drawbacks, we propose a Two-Step Feature Space Transforming approach, which evolves the feature information in a twofold operation: (i) data enhancement; and (ii) data discretization. In the first step, additional meta-features are used to improve data discrimination. In the second step, the goal is to reduce the diversity of features. Experimental results on real-world datasets with different levels of unbalancing show that this approach can consistently improve the performance of the best machine learning algorithm for this task. With these results we aim to open new perspectives for novel, efficient credit scoring systems.
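
The two steps named in the abstract can be illustrated with a minimal Python sketch. The abstract does not say which meta-features or which discretization scheme the chapter uses, so everything specific here is an assumption made for illustration: the function names `enhance`, `fit_bin_edges`, and `discretize` are hypothetical, the meta-features are taken to be per-row minimum, maximum, mean, and standard deviation, and the discretization is taken to be equal-width binning per column. It is not the authors' implementation.

```python
import numpy as np


def enhance(X):
    """Step (i), data enhancement: append meta-features to each sample.

    Assumption: the meta-features are the per-row min, max, mean and
    standard deviation. The abstract does not specify them.
    """
    X = np.asarray(X, dtype=float)
    meta = np.column_stack(
        [X.min(axis=1), X.max(axis=1), X.mean(axis=1), X.std(axis=1)]
    )
    return np.hstack([X, meta])


def fit_bin_edges(X, n_bins):
    """Compute interior equal-width bin edges for every column of X.

    Assumption: equal-width binning; the chapter may use another scheme.
    """
    X = np.asarray(X, dtype=float)
    lows, highs = X.min(axis=0), X.max(axis=0)
    # linspace yields n_bins + 1 edges; dropping both ends keeps n_bins - 1
    # interior edges, so np.digitize returns labels in 0 .. n_bins - 1.
    return [np.linspace(lo, hi, n_bins + 1)[1:-1] for lo, hi in zip(lows, highs)]


def discretize(X, edges):
    """Step (ii), data discretization: map each value to an integer bin."""
    X = np.asarray(X, dtype=float)
    cols = [np.digitize(X[:, j], edges[j]) for j in range(X.shape[1])]
    return np.column_stack(cols).astype(int)


if __name__ == "__main__":
    # Toy data, invented for the example: 3 samples with 3 features each.
    X_train = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [10.0, 0.0, 5.0]]
    X_enh = enhance(X_train)                  # 3 original + 4 meta-features
    edges = fit_bin_edges(X_enh, n_bins=4)    # fitted on training data only
    print(discretize(X_enh, edges))
```

In this sketch the bin edges are fitted on the training data and then reused for unseen samples, so the same value always maps to the same bin. The transformed matrix would then be passed to whichever classifier is being evaluated.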

Author information

Authors and Affiliations

Department of Mathematics and Computer Science, University of Cagliari, Cagliari, Italy
Salvatore Carta, Gianni Fenu, Anselmo Ferreira, Diego Reforgiato Recupero & Roberto Saia

Corresponding author

Correspondence to Roberto Saia.

Editor information

Editors and Affiliations

Instituto de Telecomunicações, Lisbon, Portugal
Ana Fred
Federal University of Pernambuco, Recife, Brazil
Ana Salgado
University of Madeira, Funchal, Portugal
David Aveiro
Delft University of Technology, Delft, The Netherlands
Jan Dietz
Polytechnic Institute of Coimbra, Coimbra, Portugal
Jorge Bernardino
Polytechnic Institute of Setúbal, Setúbal, Portugal
Joaquim Filipe

About this paper

Cite this paper

Carta, S., Fenu, G., Ferreira, A., Reforgiato Recupero, D., Saia, R. (2020). A Two-Step Feature Space Transforming Method to Improve Credit Scoring Performance. In: Fred, A., Salgado, A., Aveiro, D., Dietz, J., Bernardino, J., Filipe, J. (eds) Knowledge Discovery, Knowledge Engineering and Knowledge Management. IC3K 2019. Communications in Computer and Information Science, vol 1297. Springer, Cham. https://doi.org/10.1007/978-3-030-66196-0_7

DOI: https://doi.org/10.1007/978-3-030-66196-0_7
Published: 14 January 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-66195-3
Online ISBN: 978-3-030-66196-0
eBook Packages: Computer Science (R0)
