Topological Analysis of Credit Data: Preliminary Findings

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13756)

Abstract

Intuitively, similar customers should have similar credit risk. This similarity is often captured using Euclidean distances between customer features, with credit default predicted via logistic regression. Here we explore the use of topological data analysis for describing this similarity. In particular, persistent homology algorithms provide summaries of point clouds which relate to their topology. This approach has been shown to be useful in many applications but, to the best of our knowledge, applying topological data analysis to the prediction of credit risk is novel. We develop a pipeline based on the topological analysis of neighbourhoods of customers, with the neighbourhoods determined by a geometric network construction. We find a modest signal using three data sets from the Lending Club and the Japan Credit Screening data set. The Cleveland heart disease data set is used to validate the pipeline. The results have high variance, but they indicate that including such topological features could improve credit risk prediction when used as additional explanatory variables in a logistic regression.
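To make the idea of a per-customer topological feature concrete, the following is a minimal sketch and not the authors' pipeline. The abstract does not specify the geometric network construction or the persistence summary, so the sketch assumes a k-nearest-neighbour neighbourhood and uses total 0-dimensional persistence (the sum of the merge scales of connected components in a Vietoris-Rips filtration, which coincide with minimum-spanning-tree edge lengths). The function names, the value of `k`, and the toy data are all hypothetical.

```python
import math
from itertools import combinations


def h0_death_times(points):
    """Death times of H0 classes in the Vietoris-Rips filtration of `points`.

    Connected components merge exactly along minimum-spanning-tree edges,
    so Kruskal's algorithm with union-find yields the n - 1 finite deaths.
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    deaths = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # this edge merges two components: one H0 class dies
            parent[ri] = rj
            deaths.append(length)
    return deaths


def knn_neighbourhood(data, index, k):
    """The customer at `index` plus its k nearest neighbours (assumed construction)."""
    order = sorted(range(len(data)), key=lambda j: math.dist(data[index], data[j]))
    # The customer is at distance 0 from itself, so it is among the first k + 1.
    return [data[j] for j in order[: k + 1]]


def topological_feature(data, index, k=3):
    """Total H0 persistence of the customer's neighbourhood point cloud."""
    return sum(h0_death_times(knn_neighbourhood(data, index, k)))


# Made-up feature vectors: a tight cluster plus one isolated customer.
customers = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
features = [topological_feature(customers, i, k=3) for i in range(len(customers))]
```

On this toy data the four clustered customers each receive a feature of 3.0, while the isolated customer receives a larger value (about 7.66), reflecting its sparser neighbourhood. In a setting like the one the abstract describes, such a column would be appended to the existing customer features before fitting the logistic regression.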

Notes

  1. https://archive.ics.uci.edu/ml/datasets/Japanese+Credit+Screening.

  2. https://archive.ics.uci.edu/ml/datasets/heart+disease.

  3. https://www.fico.com/.

  4. Neither is now available from the Lending Club website.

  5. https://archive.ics.uci.edu/ml/datasets/Japanese+Credit+Screening.

  6. University of California, Irvine.

Acknowledgements

GR was supported in part by EPSRC grants EP/T018445/1 and EP/R018472/1. TT acknowledges funding from EPSRC studentship 2275810.

Author information

Corresponding author

Correspondence to Peter Mitic.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Cooper, J., Mitic, P., Reinert, G., Temčinas, T. (2022). Topological Analysis of Credit Data: Preliminary Findings. In: Yin, H., Camacho, D., Tino, P. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2022. IDEAL 2022. Lecture Notes in Computer Science, vol 13756. Springer, Cham. https://doi.org/10.1007/978-3-031-21753-1_42

  • DOI: https://doi.org/10.1007/978-3-031-21753-1_42

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-21752-4

  • Online ISBN: 978-3-031-21753-1

  • eBook Packages: Computer Science, Computer Science (R0)
