DOI: 10.1145/3450439.3451859
research-article
Open Access

Privacy-preserving and bandwidth-efficient federated learning: an application to in-hospital mortality prediction

Published: 08 April 2021

ABSTRACT

Machine Learning, and in particular Federated Machine Learning, opens new perspectives for medical research and patient care. Although Federated Machine Learning improves on centralized Machine Learning in terms of privacy, it does not provide provable privacy guarantees. Furthermore, Federated Machine Learning is expensive in terms of bandwidth consumption, as it requires participant nodes to regularly exchange large updates. This paper proposes a bandwidth-efficient, privacy-preserving Federated Learning scheme that provides theoretical privacy guarantees based on Differential Privacy. We experimentally evaluate our proposal for in-hospital mortality prediction using a real dataset containing Electronic Health Records of about one million patients. Our results suggest that strong and provable patient-level privacy can be enforced at the expense of only a moderate loss of prediction accuracy.


      • Published in

        CHIL '21: Proceedings of the Conference on Health, Inference, and Learning
        April 2021
        309 pages
        ISBN:9781450383592
        DOI:10.1145/3450439

        Copyright © 2021 ACM

        Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

        Publisher

        Association for Computing Machinery

        New York, NY, United States



        Acceptance Rates

        CHIL '21 Paper Acceptance Rate: 27 of 110 submissions, 25%
        Overall Acceptance Rate: 27 of 110 submissions, 25%
