ABSTRACT
Preserving user privacy is a key requirement of web-scale analytics and reporting applications, and it has received renewed attention in light of recent data breaches and new regulations such as GDPR. We focus on the problem of computing robust, reliable analytics in a privacy-preserving manner while satisfying product requirements. We present PriPeARL, a framework for privacy-preserving analytics and reporting, inspired by differential privacy. We describe the overall design and architecture and the key modeling components, focusing on the unique challenges associated with privacy, coverage, utility, and consistency. We perform an experimental study in the context of ads analytics and reporting at LinkedIn, demonstrating the tradeoffs between privacy and utility needs and the applicability of privacy-preserving mechanisms to real-world data. We also highlight the lessons learned from the production deployment of our system at LinkedIn.