ABSTRACT
Privacy-preserving machine learning has drawn increasing attention recently, especially as various privacy regulations come into force. In this context, Federated Learning (FL) has emerged to facilitate privacy-preserving joint modeling among multiple parties. Although many federated algorithms have been extensively studied, the literature still lacks secure and practical gradient tree boosting models (e.g., XGB). In this paper, we aim to build large-scale secure XGB under the vertically federated learning setting. We guarantee data privacy from three aspects. Specifically, (1) we employ secure multi-party computation techniques to avoid leaking intermediate information during training, (2) we store the output model in a distributed manner in order to minimize information release, and (3) we provide a novel algorithm for secure XGB prediction with the distributed model. Furthermore, by proposing secure permutation protocols, we improve the training efficiency and make the framework scale to large datasets. We conduct extensive experiments on both public and real-world datasets, and the results demonstrate that our proposed XGB models provide not only competitive accuracy but also practical performance.
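To make aspect (1) concrete, the following is a minimal sketch of additive secret sharing, one common building block of secure multi-party computation. The abstract does not state which protocol the paper uses, so this is an illustration under assumptions rather than the authors' method. The ring size `MODULUS`, the helper names (`share`, `reconstruct`, `add_shares`, `decode_signed`), and the toy integer gradients are all hypothetical.

```python
import secrets

# Assumption: shares live in the ring of integers modulo 2**64.
MODULUS = 2**64


def share(value: int) -> tuple[int, int]:
    """Split an integer into two additive shares; each share alone looks random."""
    r = secrets.randbelow(MODULUS)
    return r, (value - r) % MODULUS


def reconstruct(s0: int, s1: int) -> int:
    """Recombine two shares into the ring element they encode."""
    return (s0 + s1) % MODULUS


def add_shares(a: tuple[int, int], b: tuple[int, int]) -> tuple[int, int]:
    """Add two shared values; each party only adds the shares it holds."""
    return (a[0] + b[0]) % MODULUS, (a[1] + b[1]) % MODULUS


def decode_signed(v: int) -> int:
    """Map a ring element back to a signed integer (upper half is negative)."""
    return v - MODULUS if v >= MODULUS // 2 else v


# Toy per-sample gradients (hypothetical values, already scaled to integers).
gradients = [3, -1, 4]
shared = [share(g) for g in gradients]

# Sum the gradients share-wise, without revealing any individual gradient.
total = (0, 0)
for s in shared:
    total = add_shares(total, s)

# Only the aggregate is opened.
gradient_sum = decode_signed(reconstruct(*total))
```

In this sketch, only the aggregated sum is ever reconstructed, which mirrors the idea of keeping intermediate training values hidden. Real gradients are fractional and would need a fixed-point encoding, which is omitted here.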
Index Terms
- Large-scale Secure XGB for Vertical Federated Learning