TitAnt: online real-time transaction fraud detection in Ant Financial

Published: 01 August 2019

Abstract

With the explosive growth of e-commerce and the boom in e-payment, detecting online transaction fraud in real time has become increasingly important to the Fintech business. To tackle this problem, we introduce TitAnt, a transaction fraud detection system deployed at Ant Financial, one of the largest Fintech companies in the world. The system can predict online transaction fraud in real time, within mere milliseconds. We present the problem definition, feature extraction, detection methods, implementation, and deployment of the system, as well as its empirical effectiveness. Extensive experiments on large real-world transaction data demonstrate the effectiveness and efficiency of the proposed system.


Published in

Proceedings of the VLDB Endowment, Volume 12, Issue 12
August 2019
547 pages

Publisher: VLDB Endowment
