Abstract
Internet companies face the need to handle large-scale machine learning applications on a daily basis, and distributed implementations of machine learning algorithms that can handle extra-large-scale tasks with strong performance are widely needed. Deep forest is a recently proposed deep learning framework that uses tree ensembles as its building blocks, and it has achieved highly competitive results across various domains of tasks. However, it has not been tested on extremely large-scale tasks. In this work, based on our parameter server system, we develop a distributed version of deep forest. To meet the needs of real-world tasks, we introduce several improvements to the original deep forest model, including MART (Multiple Additive Regression Trees) as base learners for efficiency and effectiveness, a cost-based method for handling the prevalent class-imbalanced data, MART-based feature selection for high-dimensional data, and different evaluation metrics for automatically determining the number of cascade levels. We test the deep forest model on an extra-large-scale task, i.e., automatic detection of cash-out fraud, with more than 100 million training samples. Experimental results show that the deep forest model achieves the best performance according to evaluation metrics from different perspectives, even with very little effort spent on parameter tuning. The model can block a large amount of fraudulent transaction money each day; even compared with the best previously deployed model, it brings a significant additional decrease in economic loss each day.
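To make the cascade structure described above concrete, the following is a minimal single-machine sketch, not the paper's distributed implementation. It uses scikit-learn's `GradientBoostingClassifier` as a stand-in for the MART base learners, `roc_auc_score` as the validation metric that decides when to stop adding cascade levels, and a per-sample weight as a simple cost-based treatment of class imbalance; all hyperparameter values (`max_levels`, `n_learners`, `n_estimators`, `pos_cost`) are illustrative assumptions, not values from the paper.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score

def fit_cascade(X_train, y_train, X_val, y_val,
                max_levels=5, n_learners=2, pos_cost=2.0):
    """Grow cascade levels until the validation metric stops improving."""
    # Cost-based imbalance handling: up-weight the rare positive class.
    w = np.where(y_train == 1, pos_cost, 1.0)
    levels, best_auc = [], -np.inf
    aug_train, aug_val = X_train, X_val
    for _ in range(max_levels):
        # Each level holds several MART-style forests (here: GBDTs).
        learners = [
            GradientBoostingClassifier(n_estimators=50, random_state=seed)
            .fit(aug_train, y_train, sample_weight=w)
            for seed in range(n_learners)
        ]
        # Augment the input with each learner's class-probability vector,
        # as in the cascade structure of deep forest.
        train_probs = np.hstack([m.predict_proba(aug_train) for m in learners])
        val_probs = np.hstack([m.predict_proba(aug_val) for m in learners])
        auc = roc_auc_score(y_val, val_probs[:, 1])
        if auc <= best_auc:
            break  # automatic level determination: stop when no gain
        best_auc, levels = auc, levels + [learners]
        aug_train = np.hstack([X_train, train_probs])
        aug_val = np.hstack([X_val, val_probs])
    return levels, best_auc
```

In the distributed setting the paper targets, each level's forests would be trained in parallel under the parameter server rather than sequentially as here.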
Index Terms
- Distributed Deep Forest and its Application to Automatic Detection of Cash-Out Fraud