Abstract
The growing volume of credit offered by financial institutions calls for intelligent and efficient credit scoring methodologies. As a result, machine learning solutions have been increasingly applied to this task in recent years. Such procedures are used to distinguish reliable from unreliable customers, with the aim of limiting the financial losses caused by loans granted to the wrong customer profiles. Nevertheless, applying machine learning in practice faces several limitations, such as unbalanced datasets and, especially, features that do not carry enough information to discriminate between reliable and unreliable loans. To overcome these drawbacks, we propose in this work a Two-Step Feature Space Transforming approach, which evolves the feature information in two steps: (i) data enhancement; and (ii) data discretization. In the first step, additional meta-features are added in order to improve data discrimination. In the second step, the goal is to reduce the diversity of feature values through discretization. Experimental results on real-world datasets with different levels of unbalance show that the approach consistently improves the performance of the best machine learning algorithm for this task. With these results, we aim to open new perspectives for novel, efficient credit scoring systems.
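The two steps named above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify which meta-features are used or how values are discretized, so the row-level statistics (minimum, maximum, mean, standard deviation), the equal-width binning, and the names `add_meta_features`, `discretize`, and `transform` are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def add_meta_features(row):
    """Step (i), data enhancement: append row-level summary statistics.

    Assumption: min, max, mean, and population standard deviation stand in
    for the meta-features, which the abstract does not enumerate.
    """
    return list(row) + [min(row), max(row), mean(row), pstdev(row)]


def discretize(rows, n_bins=10):
    """Step (ii), data discretization: map each column to integer bins.

    Assumption: equal-width bins over the observed range of each column.
    """
    binned_columns = []
    for col in zip(*rows):
        lo, hi = min(col), max(col)
        width = (hi - lo) / n_bins
        if width == 0:
            # A constant column carries no spread; place it in bin 0.
            binned_columns.append([0] * len(col))
            continue
        # Clamp so that the column maximum falls in the last bin.
        binned_columns.append(
            [min(int((v - lo) / width), n_bins - 1) for v in col]
        )
    return [list(r) for r in zip(*binned_columns)]


def transform(rows, n_bins=10):
    """Apply enhancement first, then discretization, to every sample."""
    enhanced = [add_meta_features(r) for r in rows]
    return discretize(enhanced, n_bins)


# Hypothetical numeric samples, three features each.
samples = [[0.2, 15.0, 3.0], [0.9, 40.0, 1.0], [0.5, 22.0, 7.0]]
transformed = transform(samples, n_bins=4)
```

Each transformed sample has the original features plus four meta-features, all expressed as integer bin indices. In a real evaluation the bin ranges would presumably be derived from the training data only and then reused for unseen samples; the sketch skips that distinction for brevity.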
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Carta, S., Fenu, G., Ferreira, A., Reforgiato Recupero, D., Saia, R. (2020). A Two-Step Feature Space Transforming Method to Improve Credit Scoring Performance. In: Fred, A., Salgado, A., Aveiro, D., Dietz, J., Bernardino, J., Filipe, J. (eds) Knowledge Discovery, Knowledge Engineering and Knowledge Management. IC3K 2019. Communications in Computer and Information Science, vol 1297. Springer, Cham. https://doi.org/10.1007/978-3-030-66196-0_7
DOI: https://doi.org/10.1007/978-3-030-66196-0_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-66195-3
Online ISBN: 978-3-030-66196-0
eBook Packages: Computer Science (R0)