O.R. Applications
On the communal analysis suspicion scoring for identity crime in streaming credit applications

https://doi.org/10.1016/j.ejor.2008.02.015

Abstract

This paper describes a rapid technique, communal analysis suspicion scoring (CASS), for generating numeric suspicion scores on streaming credit applications based on their implicit links to each other, over both time and space. CASS includes pair-wise communal scoring of identifier attributes for applications, definition of categories of suspiciousness for application-pairs, the incorporation of temporal and spatial weights, and smoothed k-wise scoring of multiple linked application-pairs. Results on mining several hundred thousand real credit applications demonstrate that CASS reduces false alarm rates while maintaining reasonable hit rates. CASS scales to this large data sample and can rapidly detect early symptoms of identity crime. In addition, new insights have been gained from the relationships between applications.

Introduction

Annually, credit bureaus collect millions of enquiries from financial institutions (subscribers) relating to credit applications. In Australia, credit card and personal loan applications have increased significantly, and currently, close to half a million credit bureau enquiries are made per month (Baycorp, 2005). Each credit application contains a large number of identity attributes such as personal names, address(es), telephone number(s), driver licence number (or social security number), date-of-birth, and other personal identifiers which are potentially available to the credit bureau (if local privacy laws permit it). Therefore, from a commercial perspective, it is uneconomical to physically validate and approve each attribute in every credit application.

Application fraud, a manifestation of identity crime, is present when application forms contain plausible but synthetic identity information (identity fraud) and/or real but stolen identity information (identity theft). In developed countries, the monetary cost of application fraud and identity crime is often estimated to be in the billions of dollars, and this is strongly correlated with the large volume of widely available personal information. By performing better once-off assessments in the first stage of the credit life cycle, credit scoring processes could be improved and some transactional fraud could be prevented.

Typical commercial techniques for identifying such fraud use attribute verification rules based on reference tables, and pair-wise matching rules between credit application and credit history data. However, the success rate of rule-based approaches can be weak when faced with increasingly common fraudster-tampered applications (Oscherwitz, 2005) which have valid attributes and no credit history. Other techniques in use include known fraud matching using blacklists (lists of applications previously submitted by fraudsters) and supervised modelling/classification using labelled data. Often, these labelled data approaches alone are operationally inefficient and ineffective (Phua et al., 2005). Our work focuses on credit application data only, with no checks carried out against credit history.

Because communal analysis suspicion scoring (CASS) is simulated to run in real time, it does not take class labels into account when scoring applications; it uses class labels only to evaluate the effectiveness of its approach. Its purpose is to generate numeric suspicion scores on streaming credit applications based on their implicit links to each other, over both time and space. CASS includes pair-wise communal scoring of identifier attributes for applications, definition of categories of suspiciousness for application-pairs, the incorporation of temporal and spatial weights, and smoothed k-wise scoring of multiple linked application-pairs.
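
The scoring steps named above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and none of its formulas come from the paper: the exact-match comparison of identifier attributes, the half-life decay used as a temporal weight, and the top-k averaging used as a stand-in for smoothed k-wise scoring are all assumptions made here for illustration, as are the names `Application`, `pair_score`, and `suspicion_score`. Spatial weights and the categories of suspiciousness are left out.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Application:
    """Hypothetical record for one credit application."""
    app_id: str
    submitted: date
    identifiers: dict  # attribute name -> value, e.g. {"phone": "111"}


def pair_score(a, b, half_life_days=30.0):
    """Score one application-pair (illustrative assumption, not the paper's formula).

    The score is the fraction of identifier attributes present in both
    applications that have equal values, multiplied by a temporal weight
    that halves every `half_life_days` days between the two submissions.
    """
    shared = [name for name in a.identifiers if name in b.identifiers]
    if not shared:
        return 0.0
    matches = sum(1 for name in shared if a.identifiers[name] == b.identifiers[name])
    gap_days = abs((a.submitted - b.submitted).days)
    temporal_weight = 0.5 ** (gap_days / half_life_days)
    return temporal_weight * matches / len(shared)


def suspicion_score(new_app, window, k=3):
    """Average the k highest pair scores of `new_app` against earlier applications.

    This top-k mean is an assumed stand-in for smoothed k-wise scoring.
    """
    top = sorted((pair_score(new_app, old) for old in window), reverse=True)[:k]
    return sum(top) / len(top) if top else 0.0


if __name__ == "__main__":
    a = Application("a", date(2005, 1, 1), {"phone": "111", "address": "1 Main St"})
    b = Application("b", date(2005, 1, 1), {"phone": "111", "address": "2 Side St"})
    c = Application("c", date(2005, 1, 31), {"phone": "111", "address": "1 Main St"})
    print(suspicion_score(c, [a, b], k=2))
```

In this sketch a new application is scored only against earlier applications held in a window, which mirrors the streaming setting: no class labels are consulted, and older links contribute less because of the temporal weight.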

Section 2 surveys related literature and contrasts its findings with this paper’s contributions. Section 3 describes CASS, a rapid technique for scoring and generating links on incoming application streams on demand, which focuses on the level of each pair of linked applications (application-pair). Section 4 first lays out the retrospective and stream processing experimental results on ethically-approved data, and then visualises and discusses the unique patterns of identity crime in credit applications. Section 5 concludes the paper.

Section snippets

Related work

To the best of our knowledge (Phua et al., 2005), there is no academic research into the scoring of dynamic credit applications which accounts for their sparse-identifier, communal, temporal, and spatial aspects. However, there are other related and established application fields. Section 2.1 summarises multi-attribute pair-wise matching (for example, record linkage/de-duplication detection and an interesting effort to discover cheating amongst teachers in exams by altering their students’

Communal analysis suspicion scoring (CASS) process

In this section, CASS compresses multiple identifier attributes to a single attribute vector representation of each link/non-link (Section 3.2). The approach distinguishes between three different categories of links: black, white, and anomalous, which will result in different weights and scores for every application-pair (Section 3.3). It accounts for the temporal and spatial effects by applying weights to each linked application-pair’s communal score computation (Section 3.4). Also, this

Experiments

All experiments were performed on a single Pentium IV 3.0 GHz workstation with 2 GB of RAM, running Windows XP. CASS itself is written in Visual Basic and works together with C#.NET libraries, and the credit application data is stored in Microsoft Access. In this section, the ethically-approved data sample, privacy and confidentiality issues, and the attributes are explained (Section 4.1). The analysis of individual non-identity attributes highlights some general non-compliant behavioural

Conclusion

CASS is a new, low false alarm, rapid credit application fraud detection tool and technique which is complementary to those already in existence. It identifies a large number of individuals and keeps scores of the relations between them, detecting normal/abnormal and small/large similarities. The results presented here indicate substantial cost savings by investigating/rejecting only a few hundred of the most suspicious credit applications out of a few hundred thousand. The investigations,

Acknowledgements

The first author was financially supported by the Australian Research Council under Linkage Grant Number LP0454077 while he was a PhD candidate at Monash University. Ethics approval was granted by Monash SCERH under Project Number 2005/694ED. The real credit application data was provided by Veda Advantage. Special thanks go to the developers of yEd and to particular participants in the Credit Scoring and Credit Control (CSCC05) conference for useful comments.

References (18)

  • Baxter, R., Christen, P., Churches, T., 2003. A Comparison of fast blocking methods for record linkage. In: Proceedings...
  • Baycorp Advantage, 2005. Zero-Interest Credit Cards Cause Record Growth In Card...
  • Bilenko, M., et al., 2003. Adaptive name matching in information integration. IEEE Intelligent Systems.
  • Chapman, S., 2005. Simmetrics – Open Source Similarity Measure Library. Accessed from:...
  • Cortes, C., et al., 2003. Computational methods for dynamic graphs. Journal of Computational and Graphical Statistics.
  • Cortes, C., Pregibon, D., 1999. Information mining platforms: an infrastructure for KDD rapid deployment. In:...
  • Fawcett, T., et al., 1997. Adaptive fraud detection. Data Mining and Knowledge Discovery.
  • ID Analytics, 2004. Identity 2004: The Identity Risk Management...
  • Kleinberg, J., 1999. Authoritative sources in a hyperlinked environment. Journal of the ACM.
There are more references available in the full text version of this article.