ABSTRACT
Deep learning based on artificial neural networks is a very popular approach to modeling, classifying, and recognizing complex data such as images, speech, and text. The unprecedented accuracy of deep learning methods has turned them into the foundation of new AI-based services on the Internet. Commercial companies that collect user data on a large scale have been the main beneficiaries of this trend, since the success of deep learning techniques is directly proportional to the amount of data available for training. The massive data collection required for deep learning presents obvious privacy issues. Users' personal, highly sensitive data such as photos and voice recordings is kept indefinitely by the companies that collect it. Users can neither delete it nor restrict the purposes for which it is used. Furthermore, centrally kept data is subject to legal subpoenas and extra-judicial surveillance. Many data owners--for example, medical institutions that may want to apply deep learning methods to clinical records--are prevented by privacy and confidentiality concerns from sharing the data and thus benefiting from large-scale deep learning.
In this paper, we design, implement, and evaluate a practical system that enables multiple parties to jointly learn an accurate neural-network model for a given objective without sharing their input datasets. We exploit the fact that the optimization algorithms used in modern deep learning, namely those based on stochastic gradient descent, can be parallelized and executed asynchronously. Our system lets participants train independently on their own datasets and selectively share small subsets of their models' key parameters during training. This offers an attractive point in the utility/privacy tradeoff space: participants preserve the privacy of their respective data while still benefiting from other participants' models and thus boosting their learning accuracy beyond what is achievable solely on their own inputs. We demonstrate the accuracy of our privacy-preserving deep learning on benchmark datasets.
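The mechanism described above (each participant runs SGD locally on its private data and uploads only a small, selectively chosen subset of parameter updates to a shared model) can be sketched as follows. This is a minimal toy illustration, not the paper's implementation: it assumes a linear least-squares model in place of a neural network, uses a single shared parameter server that every participant fully downloads each round, and all names (`train_collaboratively`, the sharing fraction `frac`) are illustrative.

```python
import numpy as np

def train_collaboratively(datasets, dim, rounds=300, frac=0.5, lr=0.1):
    """Each participant trains locally and uploads only a fraction of its updates.

    datasets: list of (X, y) pairs, one private dataset per participant.
    frac: fraction of parameter updates each participant shares per round.
    """
    w_global = np.zeros(dim)  # parameters held by the shared server
    k = max(1, int(frac * dim))  # number of updates uploaded per round
    for _ in range(rounds):
        for X, y in datasets:
            # Download the current shared parameters (simplification:
            # the paper's protocol also downloads only a fraction).
            w = w_global.copy()
            # One local gradient step on the participant's private data
            # (squared loss on a linear model stands in for a deep net).
            grad = X.T @ (X @ w - y) / len(y)
            delta = -lr * grad
            # Upload only the k largest-magnitude updates; the raw
            # private data (X, y) never leaves the participant.
            idx = np.argsort(np.abs(delta))[-k:]
            w_global[idx] += delta[idx]
    return w_global
```

With noiseless synthetic data drawn from a common ground-truth model, the shared parameters converge close to that model even though each participant reveals only half of its updates per round, which is the utility side of the tradeoff the abstract describes.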