Abstract
Internet companies face the need to handle large-scale machine learning applications on a daily basis, and distributed implementations of machine learning algorithms that can handle extra-large-scale tasks with strong performance are widely needed. Deep forest is a recently proposed deep learning framework that uses tree ensembles as its building blocks, and it has achieved highly competitive results across various domains of tasks. However, it has not been tested on extremely large-scale tasks. In this work, based on our parameter server system, we develop a distributed version of deep forest. To meet the needs of real-world tasks, we introduce several improvements to the original deep forest model, including MART (Multiple Additive Regression Trees) as base learners for efficiency and effectiveness, a cost-based method for handling the prevalent class-imbalanced data, MART-based feature selection for high-dimensional data, and different evaluation metrics for automatically determining the number of cascade levels. We test the deep forest model on an extra-large-scale task, i.e., automatic detection of cash-out fraud, with more than 100 million training samples. Experimental results show that the deep forest model achieves the best performance according to evaluation metrics from different perspectives, even with very little effort spent on parameter tuning. The model can block a large amount of fraudulent transaction money each day; even compared with the best previously deployed model, it brings a significant additional decrease in economic loss each day.
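To make the cascade structure described above concrete, the following is a minimal single-machine sketch, not the paper's distributed implementation. It uses scikit-learn's `GradientBoostingClassifier` as a stand-in for the MART base learners, `roc_auc_score` as the validation metric that decides when to stop adding cascade levels, and a per-sample weight as a simple cost-based treatment of class imbalance; all hyperparameter values (`max_levels`, `n_learners`, `n_estimators`, `pos_cost`) are illustrative assumptions, not values from the paper.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score

def fit_cascade(X_train, y_train, X_val, y_val,
                max_levels=5, n_learners=2, pos_cost=2.0):
    """Grow cascade levels until the validation metric stops improving."""
    # Cost-based imbalance handling: up-weight the rare positive class.
    w = np.where(y_train == 1, pos_cost, 1.0)
    levels, best_auc = [], -np.inf
    aug_train, aug_val = X_train, X_val
    for _ in range(max_levels):
        # Each level holds several MART-style forests (here: GBDTs).
        learners = [
            GradientBoostingClassifier(n_estimators=50, random_state=seed)
            .fit(aug_train, y_train, sample_weight=w)
            for seed in range(n_learners)
        ]
        # Augment the input with each learner's class-probability vector,
        # as in the cascade structure of deep forest.
        train_probs = np.hstack([m.predict_proba(aug_train) for m in learners])
        val_probs = np.hstack([m.predict_proba(aug_val) for m in learners])
        auc = roc_auc_score(y_val, val_probs[:, 1])
        if auc <= best_auc:
            break  # automatic level determination: stop when no gain
        best_auc, levels = auc, levels + [learners]
        aug_train = np.hstack([X_train, train_probs])
        aug_val = np.hstack([X_val, val_probs])
    return levels, best_auc
```

In the distributed setting the paper targets, each level's forests would be trained in parallel under the parameter server rather than sequentially as here.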
Index Terms
- Distributed Deep Forest and its Application to Automatic Detection of Cash-Out Fraud