ABSTRACT
Machine Learning, and in particular Federated Machine Learning, opens new perspectives for medical research and patient care. Although Federated Machine Learning improves on centralized Machine Learning in terms of privacy, it does not provide provable privacy guarantees. Furthermore, Federated Machine Learning is expensive in terms of bandwidth consumption, as it requires participant nodes to regularly exchange large updates. This paper proposes a bandwidth-efficient, privacy-preserving Federated Learning scheme that provides theoretical privacy guarantees based on Differential Privacy. We experimentally evaluate our proposal for in-hospital mortality prediction using a real dataset containing the Electronic Health Records of about one million patients. Our results suggest that strong and provable patient-level privacy can be enforced at the expense of only a moderate loss of prediction accuracy.
Index Terms
- Privacy-preserving and bandwidth-efficient federated learning: an application to in-hospital mortality prediction