Research Article
DOI: 10.1145/2810103.2813687

Privacy-Preserving Deep Learning

Published: 12 October 2015

ABSTRACT

Deep learning based on artificial neural networks is a very popular approach to modeling, classifying, and recognizing complex data such as images, speech, and text. The unprecedented accuracy of deep learning methods has turned them into the foundation of new AI-based services on the Internet. Commercial companies that collect user data on a large scale have been the main beneficiaries of this trend since the success of deep learning techniques is directly proportional to the amount of data available for training. Massive data collection required for deep learning presents obvious privacy issues. Users' personal, highly sensitive data such as photos and voice recordings is kept indefinitely by the companies that collect it. Users can neither delete it, nor restrict the purposes for which it is used. Furthermore, centrally kept data is subject to legal subpoenas and extra-judicial surveillance. Many data owners--for example, medical institutions that may want to apply deep learning methods to clinical records--are prevented by privacy and confidentiality concerns from sharing the data and thus benefitting from large-scale deep learning.

In this paper, we design, implement, and evaluate a practical system that enables multiple parties to jointly learn an accurate neural-network model for a given objective without sharing their input datasets. We exploit the fact that the optimization algorithms used in modern deep learning, namely, those based on stochastic gradient descent, can be parallelized and executed asynchronously. Our system lets participants train independently on their own datasets and selectively share small subsets of their models' key parameters during training. This offers an attractive point in the utility/privacy tradeoff space: participants preserve the privacy of their respective data while still benefitting from other participants' models and thus boosting their learning accuracy beyond what is achievable solely on their own inputs. We demonstrate the accuracy of our privacy-preserving deep learning on benchmark datasets.
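The core mechanism described above (each participant runs stochastic gradient descent on its own data and uploads only a small, selectively chosen subset of parameter updates to a shared model) can be illustrated with a short sketch. The following Python snippet is a minimal simulation under assumptions of my own: a linear model, an in-memory stand-in for the parameter server, and an illustrative upload fraction theta_u. None of the names, hyperparameters, or design details are taken from the paper's implementation.

    import numpy as np

    # Illustrative sketch only: two participants jointly fit a shared linear model
    # by exchanging a small fraction of their gradient updates through a simulated
    # parameter server (here just a NumPy array). The upload fraction theta_u and
    # every name below are assumptions for illustration, not the paper's code.

    rng = np.random.default_rng(0)
    dim = 20
    global_params = np.zeros(dim)  # parameters held by the shared server

    def local_gradient(params, X, y):
        # Gradient of mean squared error for the linear model y ~ X @ params.
        return X.T @ (X @ params - y) / len(y)

    def selective_update(grad, lr=0.1, theta_u=0.1):
        # Keep only the top theta_u fraction of gradient coordinates by magnitude;
        # everything else stays local and is never shared.
        k = max(1, int(theta_u * len(grad)))
        top = np.argsort(np.abs(grad))[-k:]
        update = np.zeros_like(grad)
        update[top] = -lr * grad[top]
        return update

    # Each participant holds a private dataset that never leaves its machine.
    true_w = rng.normal(size=dim)
    datasets = []
    for _ in range(2):
        X = rng.normal(size=(100, dim))
        datasets.append((X, X @ true_w + 0.01 * rng.normal(size=100)))

    for _ in range(200):
        for X, y in datasets:
            local = global_params.copy()            # download the current shared model
            grad = local_gradient(local, X, y)      # compute gradient on private data
            global_params += selective_update(grad) # upload only the sparse update

    print("distance to true weights:", np.linalg.norm(global_params - true_w))

The fraction of parameters shared per round is the knob in the utility/privacy tradeoff the abstract refers to: sharing fewer updates reveals less about each private dataset, at some cost in how quickly the shared model improves.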


Published in

CCS '15: Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security
October 2015, 1750 pages
ISBN: 9781450338325
DOI: 10.1145/2810103
Copyright © 2015 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

CCS '15 paper acceptance rate: 128 of 660 submissions (19%). Overall CCS acceptance rate: 1,261 of 6,999 submissions (18%).
