DOI: 10.1145/3269206.3272031
research-article

PriPeARL: A Framework for Privacy-Preserving Analytics and Reporting at LinkedIn

Published: 17 October 2018

ABSTRACT

Preserving privacy of users is a key requirement of web-scale analytics and reporting applications, and has witnessed a renewed focus in light of recent data breaches and new regulations such as GDPR. We focus on the problem of computing robust, reliable analytics in a privacy-preserving manner, while satisfying product requirements. We present PriPeARL, a framework for privacy-preserving analytics and reporting, inspired by differential privacy. We describe the overall design and architecture, and the key modeling components, focusing on the unique challenges associated with privacy, coverage, utility, and consistency. We perform an experimental study in the context of ads analytics and reporting at LinkedIn, thereby demonstrating the tradeoffs between privacy and utility needs, and the applicability of privacy-preserving mechanisms to real-world data. We also highlight the lessons learned from the production deployment of our system at LinkedIn.
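
The abstract names the goals (privacy inspired by differential privacy, plus consistency) but not the noise mechanism. The Python below is a minimal sketch of one standard way to meet both goals. It assumes the Laplace mechanism with noise scale sensitivity/epsilon, and it assumes the randomness is derived deterministically from an HMAC-SHA256 of a secret key and a query identifier, so that repeating an identical query returns the identical noisy answer. The function names, the `query_key` format, and the default sensitivity and epsilon values are hypothetical illustrations, not PriPeARL's actual code or parameters.

```python
import hashlib
import hmac
import math

# Hypothetical sketch (not PriPeARL's actual code): Laplace-mechanism noise whose
# randomness is derived from an HMAC of a secret key and a query identifier, so the
# same query always receives the same noise (consistency across repeated requests).


def _uniform_from_key(secret: bytes, query_key: str) -> float:
    """Map (secret, query_key) to a deterministic pseudorandom float in (0, 1)."""
    digest = hmac.new(secret, query_key.encode("utf-8"), hashlib.sha256).digest()
    # Keep 52 bits so that (n + 0.5) / 2**52 is exact and never reaches 0.0 or 1.0.
    n = int.from_bytes(digest[:8], "big") >> 12
    return (n + 0.5) / 2**52


def laplace_noise(secret: bytes, query_key: str,
                  sensitivity: float, epsilon: float) -> float:
    """Inverse-CDF draw from Laplace(0, b) with scale b = sensitivity / epsilon."""
    b = sensitivity / epsilon
    u = _uniform_from_key(secret, query_key) - 0.5  # u lies strictly in (-0.5, 0.5)
    return -b * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))


def noisy_count(true_count: int, secret: bytes, query_key: str,
                sensitivity: float = 1.0, epsilon: float = 1.0) -> int:
    """Release a count with deterministic Laplace noise, rounded and clamped at zero."""
    noisy = true_count + laplace_noise(secret, query_key, sensitivity, epsilon)
    # Rounding and clamping are post-processing: reported counts are non-negative ints.
    return max(0, round(noisy))


if __name__ == "__main__":
    key = b"hypothetical-secret"
    q = "campaign=123|metric=clicks|date=2018-10-17"  # hypothetical query key format
    print(noisy_count(250, key, q))
    print(noisy_count(250, key, q))  # same query, same noisy answer
```

Because the noise depends only on the secret and the query key, an analyst who re-issues the same query cannot average repeated answers to cancel the noise, while different query keys receive independent-looking noise. The sketch does not track a privacy budget across distinct queries. Rounding and clamping at zero are post-processing steps, which by themselves do not weaken a differential privacy guarantee.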

      Published in: CIKM '18: Proceedings of the 27th ACM International Conference on Information and Knowledge Management, October 2018 (2362 pages)
      ISBN: 9781450360142
      Proceedings DOI: 10.1145/3269206

      Copyright © 2018 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher: Association for Computing Machinery, New York, NY, United States

      Acceptance rates: CIKM '18 paper acceptance rate 147 of 826 submissions (18%); overall acceptance rate 1,861 of 8,427 submissions (22%)
