Abstract
Intuitively, similar customers should have similar credit risk. A common way to capture this similarity is to use Euclidean distances between customer features and to predict credit default via logistic regression. Here we explore the use of topological data analysis for describing this similarity. In particular, persistent homology algorithms provide summaries of point clouds which relate to their topology. This approach has been shown to be useful in many applications but, to the best of our knowledge, applying topological data analysis to the prediction of credit risk is novel. We develop a pipeline which is based on the topological analysis of neighbourhoods of customers, with the neighbourhoods determined by a geometric network construction. We find a modest signal using three data sets from the Lending Club, and the Japan Credit Screening data set. The Cleveland oncological data set is used to validate the pipeline. The results have high variance, but they indicate that including such topological features could improve credit risk prediction when they are used as additional explanatory variables in a logistic regression.
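To make the idea of a neighbourhood-level topological feature concrete, the following is a minimal sketch and not the authors' pipeline. It assumes that a customer's neighbourhood is simply its k nearest neighbours in Euclidean distance (the paper uses a geometric network construction that is not specified in the abstract), and that the summary is the total persistence in dimension 0. It relies on the standard fact that the finite death times of connected components in a Vietoris-Rips filtration, with each edge entering at its length, are the edge lengths of a minimum spanning tree. The function names `h0_death_times` and `neighbourhood_feature` are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def h0_death_times(points):
    """Finite death times of dimension-0 persistent homology classes.

    For a Vietoris-Rips filtration in which an edge enters at its length,
    these are the edge lengths of a minimum spanning tree, computed here
    with Prim's algorithm.
    """
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known connection of each point to the tree
    best[0] = 0.0
    deaths = []
    for step in range(n):
        # Add the point that is cheapest to connect to the current tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if step > 0:  # the first point is the root and adds no edge
            deaths.append(best[u])
        for v in range(n):
            if not in_tree[v]:
                d = euclidean(points[u], points[v])
                if d < best[v]:
                    best[v] = d
    return sorted(deaths)


def neighbourhood_feature(customers, index, k):
    """Total dimension-0 persistence of a customer's k-nearest-neighbour cloud.

    Assumption: the neighbourhood is the customer plus its k nearest
    neighbours; the paper's own construction may differ.
    """
    centre = customers[index]
    others = sorted(
        (c for j, c in enumerate(customers) if j != index),
        key=lambda c: euclidean(centre, c),
    )
    neighbourhood = [centre] + others[:k]
    return sum(h0_death_times(neighbourhood))
```

Under these assumptions, calling `neighbourhood_feature(customers, i, k)` for each customer `i` would yield one extra numeric column that could be appended to the existing customer features before fitting a logistic regression. Higher-dimensional persistence would need a dedicated persistent homology library.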
Notes
- 4. Neither is now available from the Lending Club website.
- 6. University of California, Irvine.
Acknowledgements
GR was supported in part by EPSRC grants EP/T018445/1 and EP/R018472/1. TT acknowledges funding from EPSRC studentship 2275810.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Cooper, J., Mitic, P., Reinert, G., Temčinas, T. (2022). Topological Analysis of Credit Data: Preliminary Findings. In: Yin, H., Camacho, D., Tino, P. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2022. IDEAL 2022. Lecture Notes in Computer Science, vol 13756. Springer, Cham. https://doi.org/10.1007/978-3-031-21753-1_42
DOI: https://doi.org/10.1007/978-3-031-21753-1_42
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-21752-4
Online ISBN: 978-3-031-21753-1
eBook Packages: Computer Science, Computer Science (R0)