ABSTRACT
We study trade-offs between accuracy and privacy in the context of linear queries over histograms. This is a rich class of queries that includes contingency tables and range queries and has been the focus of a long line of work. For a given set of d linear queries over a database x ∈ ℝ^N, we seek the differentially private mechanism with the minimum mean squared error. For pure differential privacy, [5, 32] give an O(log² d) approximation to the optimal mechanism. Our first contribution is an efficient O(log² d) approximation for the case of (ε, δ)-differential privacy. Our mechanism adds carefully chosen correlated Gaussian noise to the answers. We prove its approximation guarantee relative to the hereditary discrepancy lower bound of [44], using tools from convex geometry. We next consider the sparse case, in which the number of queries exceeds the number of individuals in the database, i.e. d > n, where n = ‖x‖₁. The lower bounds used in the previous approximation algorithm no longer apply; in fact, better mechanisms are known in this setting [7, 27, 28, 31, 49]. Our second main contribution is an efficient (ε, δ)-differentially private mechanism that, for any given query set A and an upper bound n on ‖x‖₁, has mean squared error within polylog(d, N) of the optimal for A and n. This approximation is achieved by coupling the Gaussian noise addition approach with linear regression over the ℓ₁ ball. Additionally, we show a similar polylogarithmic approximation guarantee for the optimal ε-differentially private mechanism in this sparse setting. Our work also shows that for arbitrary counting queries, i.e. A with entries in {0, 1}, there is an ε-differentially private mechanism with expected error Õ(√n) per query, improving on the Õ(n^{2/3}) bound of [7] and matching the lower bound implied by [15] up to logarithmic factors.
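The two ingredients named above, Gaussian noise addition followed by least-squares regression over the ℓ₁ ball, can be illustrated with the minimal pure-Python sketch below. It is a deliberate simplification and not the paper's mechanism: it adds isotropic (uncorrelated) Gaussian noise with the textbook calibration σ = Δ₂·√(2 ln(1.25/δ))/ε (valid for 0 < ε < 1), where Δ₂ is the largest column ℓ₂ norm of A under the assumption that neighboring histograms differ by one individual. The regression is solved with the Frank-Wolfe method (see Frank and Wolfe in the references) using exact line search; that choice of solver is our assumption. All function names (`gaussian_sigma`, `noisy_answers`, `l1_ball_least_squares`, `private_answers`) are hypothetical. Because the regression only uses the noisy answers and the public bound n, it is post-processing and preserves (ε, δ)-differential privacy.

```python
import math
import random


def gaussian_sigma(l2_sensitivity, eps, delta):
    """Textbook Gaussian-mechanism noise scale (assumes 0 < eps < 1)."""
    return l2_sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / eps


def matvec(A, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def noisy_answers(A, x, eps, delta, rng):
    # Assumption: neighbors differ by adding/removing one individual (x changes
    # by a standard basis vector), so the l2 sensitivity of x -> Ax is the
    # largest column norm of A. Noise here is isotropic, not correlated.
    n_cols = len(A[0])
    sens = max(math.sqrt(sum(row[j] ** 2 for row in A)) for j in range(n_cols))
    sigma = gaussian_sigma(sens, eps, delta)
    return [y + rng.gauss(0.0, sigma) for y in matvec(A, x)]


def l1_ball_least_squares(A, y, radius, iters=200):
    """Frank-Wolfe with exact line search: min ||A z - y||^2 s.t. ||z||_1 <= radius."""
    m, n_cols = len(A), len(A[0])
    z = [0.0] * n_cols
    for _ in range(iters):
        residual = [p - q for p, q in zip(matvec(A, z), y)]
        # Gradient of ||Az - y||^2 is 2 A^T (Az - y).
        grad = [2.0 * sum(A[i][j] * residual[i] for i in range(m)) for j in range(n_cols)]
        # Minimizing <grad, v> over the l1 ball is attained at a vertex +-radius * e_j.
        j = max(range(n_cols), key=lambda k: abs(grad[k]))
        vertex = [0.0] * n_cols
        vertex[j] = -radius if grad[j] > 0 else radius
        direction = [v - zi for v, zi in zip(vertex, z)]
        a_dir = matvec(A, direction)
        denom = sum(v * v for v in a_dir)
        if denom == 0.0:
            break  # no progress possible along this direction
        # Exact line search for the quadratic objective, clipped to [0, 1]
        # so z stays a convex combination of points in the l1 ball.
        gamma = -sum(r * v for r, v in zip(residual, a_dir)) / denom
        gamma = min(1.0, max(0.0, gamma))
        z = [zi + gamma * di for zi, di in zip(z, direction)]
    return z


def private_answers(A, x, n, eps, delta, seed=0):
    """Noisy answers, then projection onto {A z : ||z||_1 <= n} (post-processing)."""
    rng = random.Random(seed)
    y_noisy = noisy_answers(A, x, eps, delta, rng)
    z = l1_ball_least_squares(A, y_noisy, radius=n)
    return matvec(A, z)
```

For example, `private_answers(A, x, n=3, eps=0.5, delta=1e-5)` returns d answers of the form A·ẑ with ‖ẑ‖₁ ≤ n. The projection is where sparsity enters: when d > n, forcing the released answers to be consistent with some histogram of total mass at most n can remove much of the added noise.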
The connection between the hereditary discrepancy and the privacy mechanism enables us to derive the first polylogarithmic approximation to the hereditary discrepancy of a matrix A.
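To make the quantity above concrete, assume the standard combinatorial definition: disc(A) = min over χ ∈ {−1, +1}^N of ‖Aχ‖∞, and herdisc(A) is the maximum of disc over all column submatrices of A. The brute-force sketch below (hypothetical helper names `disc` and `herdisc`) evaluates this definition directly. It runs in exponential time and is only meant for tiny examples, in contrast to the efficient polylogarithmic approximation claimed above.

```python
from itertools import combinations, product


def disc(A, cols):
    """min over sign vectors on `cols` of max_row |sum_j A[row][j] * sign_j|."""
    if not cols:
        return 0
    best = None
    for signs in product((-1, 1), repeat=len(cols)):
        imbalance = max(abs(sum(row[j] * s for j, s in zip(cols, signs))) for row in A)
        if best is None or imbalance < best:
            best = imbalance
    return best


def herdisc(A):
    """max over all column subsets S of disc(A restricted to S); exponential time."""
    n_cols = len(A[0])
    return max(
        disc(A, list(cols))
        for k in range(n_cols + 1)
        for cols in combinations(range(n_cols), k)
    )
```

The single-row matrix [[1, 1]] shows why the hereditary version matters: its discrepancy is 0 (signs +1, −1), but restricting to one column forces discrepancy 1, so herdisc([[1, 1]]) is 1.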
REFERENCES
- [1] N. Bansal. Constructive algorithms for discrepancy minimization. In 2010 51st Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 3--10. IEEE, 2010.
- [2] B. Barak, K. Chaudhuri, C. Dwork, S. Kale, F. McSherry, and K. Talwar. Privacy, accuracy, and consistency too: a holistic solution to contingency table release. In L. Libkin, editor, Proceedings of ACM PODS, pages 273--282. ACM, 2007.
- [3] I. Bárány and Z. Füredi. Approximation of the sphere by polytopes having few vertices. Proceedings of the American Mathematical Society, 102(3):651--659, 1988.
- [4] J. Beck and V. T. Sós. Discrepancy theory. In Handbook of Combinatorics (vol. 2), pages 1405--1446. MIT Press, Cambridge, MA, USA, 1995.
- [5] A. Bhaskara, D. Dadush, R. Krishnaswamy, and K. Talwar. Unconditional differentially private mechanisms for linear queries. In Proceedings of the 44th Symposium on Theory of Computing, STOC '12, pages 1269--1284, New York, NY, USA, 2012. ACM.
- [6] A. Blum, C. Dwork, F. McSherry, and K. Nissim. Practical privacy: the SuLQ framework. In Proceedings of the Twenty-Fourth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, pages 128--138. ACM, 2005.
- [7] A. Blum, K. Ligett, and A. Roth. A learning theory approach to non-interactive database privacy. In STOC '08: Proceedings of the 40th Annual ACM Symposium on Theory of Computing, pages 609--618, New York, NY, USA, 2008. ACM.
- [8] J. Bourgain and L. Tzafriri. Invertibility of large submatrices with applications to the geometry of Banach spaces and harmonic analysis. Israel Journal of Mathematics, 57(2):137--224, 1987.
- [9] H. Brenner and K. Nissim. Impossibility of differentially private universally optimal mechanisms. In Proceedings of the 2010 IEEE 51st Annual Symposium on Foundations of Computer Science, FOCS '10, pages 71--80, Washington, DC, USA, 2010. IEEE Computer Society.
- [10] T.-H. H. Chan, E. Shi, and D. Song. Private and continual release of statistics. In ICALP, 2010.
- [11] K. Chandrasekaran and S. Vempala. A discrepancy based approach to integer programming. CoRR, abs/1111.4649, 2011.
- [12] B. Chazelle. The Discrepancy Method: Randomness and Complexity. Cambridge University Press, 2000.
- [13] A. De. Lower bounds in differential privacy. Theory of Cryptography, pages 321--338, 2012.
- [14] B. Ding, M. Winslett, J. Han, and Z. Li. Differentially private data cubes: optimizing noise sources and consistency. In SIGMOD Conference, pages 217--228, 2011.
- [15] I. Dinur and K. Nissim. Revealing information while preserving privacy. In Proc. 22nd PODS, pages 202--210. ACM, 2003.
- [16] C. Dwork, K. Kenthapadi, F. McSherry, I. Mironov, and M. Naor. Our data, ourselves: Privacy via distributed noise generation. In Proc. 25th EUROCRYPT, pages 486--503. Springer, 2006.
- [17] C. Dwork, K. Kenthapadi, F. McSherry, I. Mironov, and M. Naor. Our data, ourselves: Privacy via distributed noise generation, 2006.
- [18] C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating noise to sensitivity in private data analysis. In TCC, 2006.
- [19] C. Dwork, F. McSherry, and K. Talwar. The price of privacy and the limits of LP decoding. In Proc. 39th STOC, pages 85--94. ACM, 2007.
- [20] C. Dwork, M. Naor, O. Reingold, G. N. Rothblum, and S. Vadhan. On the complexity of differentially private data release: efficient algorithms and hardness results. In Proceedings of the 41st Annual ACM Symposium on Theory of Computing, STOC '09, pages 381--390, New York, NY, USA, 2009. ACM.
- [21] C. Dwork, G. N. Rothblum, and S. Vadhan. Boosting and differential privacy. In Proceedings of the 2010 IEEE 51st Annual Symposium on Foundations of Computer Science, FOCS '10, pages 51--60, Washington, DC, USA, 2010. IEEE Computer Society.
- [22] C. Dwork and S. Yekhanin. New efficient attacks on statistical disclosure control mechanisms. In Proc. 28th CRYPTO, pages 469--480. Springer, 2008.
- [23] N. Fawaz, S. Muthukrishnan, and A. Nikolov. Nearly optimal private convolutions. Unpublished manuscript.
- [24] M. Frank and P. Wolfe. An algorithm for quadratic programming. Naval Research Logistics Quarterly, 3(1--2):95--110, 1956.
- [25] A. Ghosh, T. Roughgarden, and M. Sundararajan. Universally utility-maximizing privacy mechanisms. In STOC, pages 351--360, 2009.
- [26] E. Gluskin. Extremal properties of orthogonal parallelepipeds and their applications to the geometry of Banach spaces. Mathematics of the USSR-Sbornik, 64(1):85, 2007.
- [27] A. Gupta, M. Hardt, A. Roth, and J. Ullman. Privately releasing conjunctions and the statistical query barrier. In STOC, pages 803--812, 2011.
- [28] A. Gupta, A. Roth, and J. Ullman. Iterative constructions and private data release. In TCC, pages 339--356, 2012.
- [29] M. Gupte and M. Sundararajan. Universally optimal privacy mechanisms for minimax agents. In Proceedings of the Twenty-Ninth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS '10, pages 135--146, New York, NY, USA, 2010. ACM.
- [30] M. Hardt, K. Ligett, and F. McSherry. A simple and practical algorithm for differentially private data release. In NIPS, 2012. To appear.
- [31] M. Hardt and G. N. Rothblum. A multiplicative weights mechanism for privacy-preserving data analysis. In Proceedings of the 2010 IEEE 51st Annual Symposium on Foundations of Computer Science, FOCS '10, pages 61--70, Washington, DC, USA, 2010. IEEE Computer Society.
- [32] M. Hardt and K. Talwar. On the geometry of differential privacy. In Proceedings of the 42nd ACM Symposium on Theory of Computing, STOC '10, pages 705--714, New York, NY, USA, 2010. ACM.
- [33] M. Hay, V. Rastogi, G. Miklau, and D. Suciu. Boosting the accuracy of differentially private histograms through consistency. PVLDB, 3(1):1021--1032, 2010.
- [34] F. John. Extremum problems with inequalities as subsidiary conditions. In Studies and Essays presented to R. Courant on his 60th Birthday, pages 187--204, 1948.
- [35] S. Kasiviswanathan, M. Rudelson, and A. Smith. The power of linear reconstruction attacks. In SODA, 2013. To appear.
- [36] S. P. Kasiviswanathan, M. Rudelson, A. Smith, and J. Ullman. The price of privately releasing contingency tables and the spectra of random matrices with correlated rows. In Proceedings of the 42nd ACM Symposium on Theory of Computing, STOC '10, pages 775--784, New York, NY, USA, 2010. ACM.
- [37] K. G. Larsen. On range searching in the group model and combinatorial discrepancy. In FOCS, pages 542--549, 2011.
- [38] C. Li, M. Hay, V. Rastogi, G. Miklau, and A. McGregor. Optimizing linear counting queries under differential privacy. In Proceedings of the Twenty-Ninth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS '10, pages 123--134, New York, NY, USA, 2010. ACM.
- [39] C. Li and G. Miklau. An adaptive mechanism for accurate query answering under differential privacy. PVLDB, 5(6):514--525, 2012.
- [40] C. Li and G. Miklau. Measuring the achievable error of query sets under differential privacy. CoRR, abs/1202.3399, 2012.
- [41] L. Lovász, J. Spencer, and K. Vesztergombi. Discrepancy of set-systems and matrices. European Journal of Combinatorics, 7(2):151--160, 1986.
- [42] J. Matousek. Geometric Discrepancy (An Illustrated Guide). Springer, 1999.
- [43] J. Matousek. The determinant bound for discrepancy is almost tight. http://arxiv.org/abs/1101.0767, 2011.
- [44] S. Muthukrishnan and A. Nikolov. Optimal private halfspace counting via discrepancy. In Proceedings of the 44th Symposium on Theory of Computing, STOC '12, pages 1285--1292, New York, NY, USA, 2012. ACM.
- [45] A. Nikolov, K. Talwar, and L. Zhang. The geometry of differential privacy: the sparse and approximate cases. CoRR, abs/1212.0297, 2012.
- [46] K. Nissim, S. Raskhodnikova, and A. Smith. Smooth sensitivity and sampling in private data analysis. In STOC '07: Proceedings of the Thirty-Ninth Annual ACM Symposium on Theory of Computing, pages 75--84, New York, NY, USA, 2007. ACM.
- [47] G. Raskutti, M. Wainwright, and B. Yu. Minimax rates of estimation for high-dimensional linear regression over ℓq-balls. IEEE Transactions on Information Theory, 57(10):6976--6994, 2011.
- [48] V. Rastogi, S. Hong, and D. Suciu. The boundary between privacy and utility in data publishing. In VLDB, pages 531--542, 2007.
- [49] A. Roth and T. Roughgarden. Interactive privacy via the median mechanism. In Proceedings of the 42nd ACM Symposium on Theory of Computing, STOC '10, pages 765--774, New York, NY, USA, 2010. ACM.
- [50] R. Vershynin. John's decompositions: Selecting a large part. Israel Journal of Mathematics, 122(1):253--277, 2001.
- [51] X. Xiao, G. Wang, and J. Gehrke. Differential privacy via wavelet transforms. In ICDE, pages 225--236, 2010.
- [52] Y. Xiao, L. Xiong, and C. Yuan. Differentially private data release through multidimensional partitioning. In Secure Data Management, pages 150--168, 2010.
- [53] G. Yuan, Z. Zhang, M. Winslett, X. Xiao, Y. Yang, and Z. Hao. Low-rank mechanism: Optimizing batch queries under differential privacy. PVLDB, 5(11):1352--1363, 2012.
Index Terms
- The geometry of differential privacy: the sparse and approximate cases