DOI: 10.1145/3459637.3482361
research-article

Large-scale Secure XGB for Vertical Federated Learning

Published: 30 October 2021

ABSTRACT

Privacy-preserving machine learning has drawn increasing attention recently, especially as various privacy regulations come into force. In this context, Federated Learning (FL) has emerged to facilitate privacy-preserving joint modeling among multiple parties. Although many federated algorithms have been extensively studied, the literature still lacks secure and practical gradient tree boosting models (e.g., XGB). In this paper, we aim to build large-scale secure XGB under the vertical federated learning setting. We guarantee data privacy from three aspects. Specifically, (1) we employ secure multi-party computation techniques to avoid leaking intermediate information during training, (2) we store the output model in a distributed manner in order to minimize information release, and (3) we provide a novel algorithm for secure XGB prediction with the distributed model. Furthermore, by proposing secure permutation protocols, we improve the training efficiency and make the framework scale to large datasets. We conduct extensive experiments on both public and real-world datasets, and the results demonstrate that our proposed XGB models provide not only competitive accuracy but also practical performance.
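
As a rough illustration of point (1), the sketch below shows one common secure multi-party computation building block, two-party additive secret sharing over a ring, applied to summing per-sample gradients. It is not taken from the paper: the abstract does not say which MPC primitives are used, and the ring size, the fixed-point scale, and all function names here are assumptions made for the example.

```python
import secrets
from typing import List, Tuple

# Assumed parameters for this sketch only; the paper's choices are not stated here.
MODULUS = 2 ** 64   # shares live in the ring Z_{2^64}
SCALE = 2 ** 16     # fixed-point scaling factor for real-valued gradients


def encode(x: float) -> int:
    """Map a real number to a fixed-point ring element."""
    return round(x * SCALE) % MODULUS


def decode(v: int) -> float:
    """Map a ring element back to a real number (upper half is negative)."""
    if v >= MODULUS // 2:
        v -= MODULUS
    return v / SCALE


def share(value: int) -> Tuple[int, int]:
    """Split a ring element into two additive shares; each share alone is uniformly random."""
    r = secrets.randbelow(MODULUS)
    return r, (value - r) % MODULUS


def reconstruct(s0: int, s1: int) -> int:
    """Recombine two additive shares."""
    return (s0 + s1) % MODULUS


# Hypothetical per-sample gradients that fall into one candidate split bucket.
gradients: List[float] = [0.25, -0.5, 0.125, 0.75]
shares = [share(encode(g)) for g in gradients]

# Each party adds up only its own shares locally; no gradient is revealed.
sum_party0 = sum(s[0] for s in shares) % MODULUS
sum_party1 = sum(s[1] for s in shares) % MODULUS

# Only the aggregate is opened (in a full protocol it could stay shared as well).
total = decode(reconstruct(sum_party0, sum_party1))
print(total)
```

Addition of shared values needs no communication, which is why gradient aggregation fits this representation well. Multiplications and comparisons, which split finding also requires, need interactive protocols that this sketch omits.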


        • Published in

          CIKM '21: Proceedings of the 30th ACM International Conference on Information & Knowledge Management
          October 2021
          4966 pages
          ISBN:9781450384469
          DOI:10.1145/3459637

          Copyright © 2021 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States


          Acceptance Rates

          Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%
