research-article

Large-scale Secure XGB for Vertical Federated Learning

Authors:
Wenjing Fang, Ant Group, Hangzhou, China
Derun Zhao, Ant Group, Shanghai, China
Jin Tan, Ant Group, Shanghai, China
Chaochao Chen, Ant Group, Hangzhou, China
Chaofan Yu, Ant Group, Shanghai, China
Li Wang, Ant Group, Hangzhou, China
Lei Wang, Ant Group, Hangzhou, China
Jun Zhou, Ant Group, Beijing, China
Benyu Zhang, Ant Group, Sunnyvale, CA, USA

CIKM '21: Proceedings of the 30th ACM International Conference on Information & Knowledge Management, October 2021, Pages 443–452
https://doi.org/10.1145/3459637.3482361

Published: 30 October 2021

ABSTRACT

Privacy-preserving machine learning has drawn increasing attention recently, especially as various privacy regulations come into force. In this context, Federated Learning (FL) has emerged to facilitate privacy-preserving joint modeling among multiple parties. Although many federated algorithms have been extensively studied, the literature still lacks secure and practical gradient tree boosting models (e.g., XGB). In this paper, we aim to build large-scale secure XGB under the vertical federated learning setting. We guarantee data privacy from three aspects. Specifically, (1) we employ secure multi-party computation techniques to avoid leaking intermediate information during training, (2) we store the output model in a distributed manner in order to minimize information release, and (3) we provide a novel algorithm for secure XGB prediction with the distributed model. Furthermore, by proposing secure permutation protocols, we improve training efficiency and make the framework scale to large datasets. We conduct extensive experiments on both public and real-world datasets, and the results demonstrate that our proposed XGB models provide not only competitive accuracy but also practical performance.
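
The abstract names secure multi-party computation as the mechanism that hides intermediate values during training, but it does not spell out the protocols. As a rough illustration only, the sketch below shows two-party additive secret sharing, one common building block of such techniques, applied to summing gradient-like values. The ring size, the function names (share, reconstruct, add_shares), and the sample values are assumptions made for this sketch and are not taken from the paper.

```python
import secrets

# Ring size for the shares; 2**64 is an assumption made for this sketch.
MODULUS = 2 ** 64


def share(value: int) -> tuple[int, int]:
    """Split an integer into two additive shares modulo MODULUS."""
    share_a = secrets.randbelow(MODULUS)   # uniformly random mask
    share_b = (value - share_a) % MODULUS  # the remainder completes the sum
    return share_a, share_b


def reconstruct(share_a: int, share_b: int) -> int:
    """Recombine two shares and map the result back to a signed integer."""
    total = (share_a + share_b) % MODULUS
    return total - MODULUS if total >= MODULUS // 2 else total


def add_shares(local_shares: list[int]) -> int:
    """Sum one party's shares locally; addition needs no communication."""
    return sum(local_shares) % MODULUS


# Hypothetical gradient values, already encoded as fixed-point integers.
gradients = [125, -340, 560]
shares = [share(g) for g in gradients]

# Each party only ever sees its own column of shares.
party_a_sum = add_shares([a for a, _ in shares])
party_b_sum = add_shares([b for _, b in shares])

# Only the final aggregate is revealed when the two partial sums are combined.
gradient_sum = reconstruct(party_a_sum, party_b_sum)
```

Each individual share is uniformly random on its own, so neither party learns a single gradient; only the combined sum is recovered at the end. Multiplication, comparison, and the secure permutation protocols mentioned in the abstract require additional machinery that this sketch does not attempt to cover.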

Index Terms

1. Computing methodologies
  1. Machine learning
2. Security and privacy
  1. Human and societal aspects of security and privacy
    1. Privacy protections
    2. Usability in security and privacy

Published in
CIKM '21: Proceedings of the 30th ACM International Conference on Information & Knowledge Management
October 2021
4966 pages
ISBN: 9781450384469
DOI: 10.1145/3459637
General Chairs:
Gianluca Demartini, The University of Queensland, Australia
Guido Zuccon, The University of Queensland, Australia
Program Chairs:
J. Shane Culpepper, RMIT University, Australia
Zi Huang, The University of Queensland, Australia
Hanghang Tong, University of Illinois at Urbana-Champaign, USA
Copyright © 2021 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
gradient tree boosting
secret sharing
secure multi-party computation
secure permutation
Acceptance Rates
Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%