Privacy-preserving and bandwidth-efficient federated learning: an application to in-hospital mortality prediction

Authors:
Raouf Kerkouche

Univ. Grenoble Alpes, Grenoble, France

Gergely Ács

BME-HIT

Claude Castelluccia

Univ. Grenoble Alpes, Grenoble, France

Pierre Genevès

Univ. Grenoble Alpes, CNRS, Inria, Grenoble

CHIL '21: Proceedings of the Conference on Health, Inference, and Learning, April 2021, Pages 25–35. https://doi.org/10.1145/3450439.3451859

Published: 08 April 2021

ABSTRACT

Machine Learning, and in particular Federated Machine Learning, opens new perspectives for medical research and patient care. Although Federated Machine Learning improves over centralized Machine Learning in terms of privacy, it does not provide provable privacy guarantees. Furthermore, Federated Machine Learning is quite expensive in terms of bandwidth consumption, as it requires participant nodes to regularly exchange large updates. This paper proposes a bandwidth-efficient, privacy-preserving Federated Learning scheme that provides theoretical privacy guarantees based on Differential Privacy. We experimentally evaluate our proposal for in-hospital mortality prediction using a real dataset containing Electronic Health Records of about one million patients. Our results suggest that strong and provable patient-level privacy can be enforced at the expense of only a moderate loss of prediction accuracy.
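
The abstract does not spell out the mechanism, so the following is only a generic sketch of how bandwidth reduction and Differential Privacy are commonly combined in one federated aggregation round. It is not the authors' algorithm. It assumes top-k sparsification for compression, L2 clipping to bound each participant's contribution, and Gaussian noise added to the sum of updates. All names (clip_update, top_k_sparsify, private_aggregate) and parameters (clip_norm, noise_multiplier, k) are hypothetical, and no privacy accounting is performed.

```python
import math
import random


def clip_update(update, clip_norm):
    """Scale an update so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(x * x for x in update))
    if norm <= clip_norm or norm == 0.0:
        return list(update)
    scale = clip_norm / norm
    return [x * scale for x in update]


def top_k_sparsify(update, k):
    """Keep the k largest-magnitude coordinates and zero out the rest."""
    order = sorted(range(len(update)), key=lambda i: abs(update[i]), reverse=True)
    keep = set(order[:k])
    return [x if i in keep else 0.0 for i, x in enumerate(update)]


def private_aggregate(updates, clip_norm, noise_multiplier, k, rng):
    """Average sparsified, clipped updates after adding Gaussian noise to their sum.

    Assumption: the noise standard deviation is noise_multiplier * clip_norm,
    the usual calibration for the Gaussian mechanism on a clipped sum.
    """
    dim = len(updates[0])
    total = [0.0] * dim
    for update in updates:
        # Clipping comes last, so each participant contributes at most clip_norm.
        processed = clip_update(top_k_sparsify(update, k), clip_norm)
        for i, x in enumerate(processed):
            total[i] += x
    sigma = noise_multiplier * clip_norm
    return [(t + rng.gauss(0.0, sigma)) / len(updates) for t in total]


if __name__ == "__main__":
    rng = random.Random(0)
    # Two hypothetical participant updates over a three-parameter model.
    client_updates = [[3.0, 4.0, 0.1], [0.0, 0.5, 0.0]]
    print(private_aggregate(client_updates, clip_norm=1.0, noise_multiplier=1.1, k=2, rng=rng))
```

In this sketch each participant would only need to transmit its k retained coordinates, which is where the bandwidth saving comes from. A larger noise_multiplier gives stronger privacy at the cost of a noisier average.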

Index Terms

1. Computing methodologies
  1. Machine learning
2. Security and privacy

Published in
CHIL '21: Proceedings of the Conference on Health, Inference, and Learning
April 2021
309 pages
ISBN: 9781450383592
DOI: 10.1145/3450439
General Chair: Marzyeh Ghassemi (University of Toronto and Vector Institute)
Program Chairs: Tristan Naumann (Microsoft Research Redmond), Emma Pierson (Stanford University and Microsoft Research New England)
Copyright © 2021 ACM
Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
differential privacy
electronic health record
federated learning
imbalanced data
in-hospital mortality prediction
medical data
Qualifiers
- research-article
Acceptance Rates
CHIL '21 Paper Acceptance Rate: 27 of 110 submissions, 25%. Overall Acceptance Rate: 27 of 110 submissions, 25%.