Distributed Deep Forest and its Application to Automatic Detection of Cash-Out Fraud

Published: 05 September 2019

Abstract

Internet companies must handle large-scale machine learning applications on a daily basis, so distributed implementations of machine learning algorithms that can process extra-large-scale tasks with high performance are widely needed. Deep forest is a recently proposed deep learning framework that uses tree ensembles as its building blocks, and it has achieved highly competitive results across many task domains. However, it had not been tested on extremely large-scale tasks. In this work, we developed a distributed version of deep forest on top of our parameter server system. To meet the needs of real-world tasks, we introduce several improvements to the original deep forest model: MART (Multiple Additive Regression Tree) as base learners for efficiency and effectiveness, a cost-based method for handling the prevalent class-imbalanced data, MART-based feature selection for high-dimensional data, and different evaluation metrics for automatically determining the number of cascade levels. We tested the model on an extra-large-scale task, automatic detection of cash-out fraud, with more than 100 million training samples. Experimental results show that the deep forest model achieves the best performance under evaluation metrics from multiple perspectives, even with very little parameter tuning, and it blocks fraudulent transactions worth a large amount of money each day. Even compared with the best previously deployed model, deep forest brings an additional significant reduction in daily economic loss.
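The cascade design summarized above (boosted-tree base learners whose class-probability outputs are concatenated with the raw features and fed to the next level, cost-based reweighting for class imbalance, and a validation metric that decides when to stop adding levels) can be illustrated with a minimal single-machine sketch in Python. This is not the authors' distributed parameter-server implementation: scikit-learn's GradientBoostingClassifier stands in for MART, a single learner per level replaces deep forest's ensembles of forests, and the `cost_ratio` value and AUC-based stopping rule are illustrative assumptions.

```python
# Minimal single-machine sketch of a deep-forest-style cascade with boosted
# trees as base learners. Hypothetical simplification of the paper's system:
# GradientBoostingClassifier stands in for MART, one learner per level, and
# cost_ratio / the AUC stopping rule are assumed for illustration.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

def fit_cascade(X, y, max_levels=5, cost_ratio=10.0, seed=0):
    X_tr, X_va, y_tr, y_va = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed)
    # Cost-based imbalance handling: up-weight the rare (fraud) class.
    w_tr = np.where(y_tr == 1, cost_ratio, 1.0)
    levels, best_auc = [], -np.inf
    A_tr, A_va = X_tr, X_va  # features fed to the current cascade level
    for _ in range(max_levels):
        model = GradientBoostingClassifier(n_estimators=100, random_state=seed)
        model.fit(A_tr, y_tr, sample_weight=w_tr)
        p_tr = model.predict_proba(A_tr)
        p_va = model.predict_proba(A_va)
        auc = roc_auc_score(y_va, p_va[:, 1])
        if auc <= best_auc:   # cascade depth chosen automatically:
            break             # stop when the validation metric stalls
        best_auc = auc
        levels.append(model)
        # Each level's class-probability vector is concatenated with the
        # raw features to form the next level's input (the cascade step).
        A_tr = np.hstack([X_tr, p_tr])
        A_va = np.hstack([X_va, p_va])
    return levels

def predict_cascade(levels, X):
    A = X
    for model in levels:
        p = model.predict_proba(A)
        A = np.hstack([X, p])  # same feature augmentation as in training
    return p[:, 1]             # final-level fraud score
```

On an imbalanced binary dataset, `fit_cascade` typically grows only a few levels before the validation AUC plateaus; the paper's system additionally distributes MART training across a parameter server and applies MART-based feature selection to high-dimensional inputs before cascading.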


Published in

ACM Transactions on Intelligent Systems and Technology, Volume 10, Issue 5
Special Section on Advances in Causal Discovery and Inference and Regular Papers
September 2019, 314 pages
ISSN: 2157-6904
EISSN: 2157-6912
DOI: 10.1145/3360733

Copyright © 2019 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Received: 1 February 2019
• Revised: 1 April 2019
• Accepted: 1 June 2019
• Published: 5 September 2019

