Abstract
With the explosive growth of e-commerce and the rapid rise of e-payment, detecting online transaction fraud in real time has become increasingly important to Fintech businesses. To tackle this problem, we introduce TitAnt, a transaction fraud detection system deployed at Ant Financial, one of the largest Fintech companies in the world. The system predicts fraud in online transactions in real time, in mere milliseconds. We present the problem definition, feature extraction, detection methods, implementation, and deployment of the system, along with its empirical effectiveness. Extensive experiments on large real-world transaction data demonstrate the effectiveness and efficiency of the proposed system.