PCPs and the Hardness of Generating Synthetic Data

Ullman, Jonathan; Vadhan, Salil

doi:10.1007/s00145-020-09363-y

Published: 31 July 2020

Volume 33, pages 2078–2112, (2020)

Journal of Cryptology

Abstract

Assuming the existence of one-way functions, we show that there is no polynomial-time differentially private algorithm \({\mathcal {A}}\) that takes a database \(D\in (\{0,1\}^d)^n\) and outputs a “synthetic database” \({\hat{D}}\) all of whose two-way marginals are approximately equal to those of \(D\). (A two-way marginal is the fraction of database rows \(x\in \{0,1\}^d\) with a given pair of values in a given pair of columns.) This answers a question of Barak et al. (PODS ’07), who gave an algorithm running in time \(\mathrm {poly}(n,2^d)\). Our proof combines a construction of hard-to-sanitize databases based on digital signatures (by Dwork et al., STOC ’09) with encodings based on the PCP theorem. We also present both negative and positive results for generating “relaxed” synthetic data, where the fraction of rows in \(D\) satisfying a predicate \(c\) is estimated by applying \(c\) to each row of \({\hat{D}}\) and aggregating the results in some way.
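To make the accuracy notion in the abstract concrete, the following is a minimal Python sketch of a two-way marginal and of the worst-case gap between the marginals of a database and those of a candidate synthetic database. The function names `two_way_marginal` and `max_marginal_error` and the toy databases are illustrative assumptions, not taken from the paper, and the sketch says nothing about differential privacy or running time; it only shows the quantity that a synthetic database is required to approximate.

```python
from itertools import combinations, product

def two_way_marginal(db, j, jp, b, bp):
    """Fraction of rows x in db with x[j] == b and x[jp] == bp."""
    return sum(1 for x in db if x[j] == b and x[jp] == bp) / len(db)

def max_marginal_error(db, synth):
    """Largest absolute gap over all column pairs and all value pairs.

    Hypothetical helper: the paper only asks that this gap be small,
    it does not prescribe this function.
    """
    d = len(db[0])
    return max(
        abs(two_way_marginal(db, j, jp, b, bp)
            - two_way_marginal(synth, j, jp, b, bp))
        for j, jp in combinations(range(d), 2)
        for b, bp in product((0, 1), repeat=2)
    )

# Toy data (made up for illustration): n = 4 rows, d = 3 columns.
D = [(1, 0, 1), (1, 1, 0), (0, 1, 1), (1, 1, 1)]
D_hat = [(1, 1, 1), (1, 0, 1), (0, 1, 0), (1, 1, 1)]

print(two_way_marginal(D, 0, 1, 1, 1))  # rows with column 0 = 1 and column 1 = 1
print(max_marginal_error(D, D_hat))     # worst-case gap over all two-way marginals
```

There are \(4\binom{d}{2}\) two-way marginals, so checking accuracy is cheap; the hardness result concerns producing \({\hat{D}}\) privately, not evaluating it.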

Notes

Technically, this “real database” may assign fractional weight to some rows.
Recall that a 2-way marginal c(D) computes the fraction of database rows satisfying a conjunction of two literals, i.e., the fraction of rows \(x_i\in \{0,1\}^d\) such that \(x_{i,j}=b\) and \(x_{i,j'}=b'\) for some columns \(j,j'\in [d]\) and values \(b,b'\in \{0,1\}\).
This result also does not explicitly state the existence of the efficient encoder, but the proof of the completeness property gives an efficient algorithm to generate the correct proof \(\pi \).
Theorem 3.6 guarantees only \((\gamma -\delta ,\gamma )\)-hard-to-approximate, but it can be verified that the corresponding PCPs have “perfect completeness” so disjunctions of 3 literals are \((1-\delta , 1)\)-hard-to-approximate.
Given two vectors \(a = (a_1, \dots , a_n)\) and \(b = (b_1, \dots , b_n)\) we say \(b \succeq a\) iff \(b_i \ge a_i\) for every \(i \in [n]\). We say a function \(f: \{0,1\}^n \rightarrow [0,1]\) is monotone if \(b \succeq a \Longrightarrow f(b) \ge f(a)\).
In the preliminaries, we define a predicate to be a \(\{0,1\}\)-valued function, but our definition naturally generalizes to \(\{-1,1\}\)-valued functions. For \(c: \{0,1\}^d\rightarrow \{-1,1\}\) and database \(D= (x_{1}, \dots , x_{n}) \in (\{0,1\}^d)^n\), we define \(c(D) = \frac{1}{n} \sum _{i=1}^{n} c(x_{i})\).
One form of the Chernoff–Hoeffding bound states that if \(X_1, \dots , X_n\) are independent random variables over [0, 1] and \(X = (1/n) \sum _{i=1}^{n} X_i\), then \(\Pr [|X - \mathop {{\mathbb {E}}}[X]| \ge t] < 2\exp (-2nt^2)\) [12].
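As a numerical sanity check on the Chernoff–Hoeffding bound in the last note, the sketch below compares \(2\exp (-2nt^2)\) with an empirical tail frequency for the average of \(n\) independent coin flips. The helper names `hoeffding_bound` and `empirical_tail`, the choice of Bernoulli variables, and the parameters are assumptions made for illustration; the note itself covers any independent variables over [0, 1].

```python
import math
import random

def hoeffding_bound(n, t):
    """Right-hand side of the bound: 2 * exp(-2 * n * t^2)."""
    return 2 * math.exp(-2 * n * t * t)

def empirical_tail(n, t, p=0.5, trials=2000, seed=0):
    """Estimate Pr[|X - E[X]| >= t], where X averages n Bernoulli(p) draws.

    Illustrative helper with a fixed seed for reproducibility.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        mean = sum(1 if rng.random() < p else 0 for _ in range(n)) / n
        # Small tolerance so deviations of exactly t are not lost to rounding.
        if abs(mean - p) >= t - 1e-12:
            hits += 1
    return hits / trials

n, t = 100, 0.1
print(hoeffding_bound(n, t))  # analytic upper bound, 2 * exp(-2)
print(empirical_tail(n, t))   # simulated frequency, expected to fall below the bound
```

The bound is not tight for fair coin flips, so the simulated frequency should sit well below it.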

References

[1] M. Alekhnovich, M. Braverman, V. Feldman, A.R. Klivans, T. Pitassi, The complexity of properly learning simple concept classes. J. Comput. Syst. Sci. 74, 16–34 (2008)
[2] L. Babai, L. Fortnow, L.A. Levin, M. Szegedy, Checking computations in polylogarithmic time, in STOC, pp. 21–31 (1991)
[3] B. Barak, K. Chaudhuri, C. Dwork, S. Kale, F. McSherry, K. Talwar, Privacy, accuracy, and consistency too: a holistic solution to contingency table release, in Proceedings of the 26th Symposium on Principles of Database Systems, pp. 273–282 (2007)
[4] B. Barak, O. Goldreich, Universal arguments and their applications. SIAM J. Comput. 38, 1661–1694 (2008)
[5] E. Ben-Sasson, O. Goldreich, P. Harsha, M. Sudan, S.P. Vadhan, Robust PCPs of proximity, shorter PCPs, and applications to coding. SIAM J. Comput. 36, 889–974 (2006)
[6] A. Blum, C. Dwork, F. McSherry, K. Nissim, Practical privacy: the SuLQ framework, in Proceedings of the 24th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (June 2005)
[7] A. Blum, K. Ligett, A. Roth, A learning theory approach to non-interactive database privacy, in Proceedings of the 40th ACM SIGACT Symposium on Theory of Computing (2008)
[8] K. Chandrasekaran, J. Thaler, J. Ullman, A. Wan, Faster private release of marginals on small databases, in ITCS (ACM, New York, 2014), pp. 387–402
[9] N. Creignou, A dichotomy theorem for maximum generalized satisfiability problems. J. Comput. Syst. Sci. 51, 511–522 (1995)
[10] I. Dinur, K. Nissim, Revealing information while preserving privacy, in Proceedings of the Twenty-Second ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pp. 202–210 (2003)
[11] I. Dinur, O. Reingold, Assignment testers: towards a combinatorial proof of the PCP theorem. SIAM J. Comput. 36, 975–1024 (2006)
[12] D.P. Dubhashi, S. Sen, Concentration of measure for randomized algorithms: techniques and applications, in Handbook of Randomized Algorithms (2001)
[13] C. Dwork, F. McSherry, K. Nissim, A. Smith, Calibrating noise to sensitivity in private data analysis, in Proceedings of the 3rd Theory of Cryptography Conference, pp. 265–284 (2006)
[14] C. Dwork, M. Naor, O. Reingold, G. Rothblum, S. Vadhan, When and how can privacy-preserving data release be done efficiently? in Proceedings of the 2009 International ACM Symposium on Theory of Computing (STOC) (2009)
[15] C. Dwork, A. Nikolov, K. Talwar, Using convex relaxations for efficiently and privately releasing marginals, in Proceedings of the Thirtieth Annual Symposium on Computational Geometry (ACM, New York, 2014), pp. 261
[16] C. Dwork, K. Nissim, Privacy-preserving datamining on vertically partitioned databases, in Proceedings of CRYPTO 2004, vol. 3152, pp. 528–544 (2004)
[17] C. Dwork, A. Roth, The algorithmic foundations of differential privacy (2014)
[18] C. Dwork, G. Rothblum, S.P. Vadhan, Boosting and differential privacy, in Proceedings of FOCS 2010 (2010)
[19] V. Feldman, Hardness of proper learning, in The Encyclopedia of Algorithms (Springer, New York, 2008)
[20] V. Feldman, Hardness of approximate two-level logic minimization and PAC learning with membership queries. J. Comput. Syst. Sci. 75(1), 13–26 (2009)
[21] O. Goldreich, Foundations of Cryptography, vol. 2 (Cambridge University Press, 2004)
[22] M. Hardt, G.N. Rothblum, R.A. Servedio, Private data release via learning thresholds, in SODA (SIAM, New York, 2012), pp. 168–187
[23] J. Håstad, Some optimal inapproximability results. J. ACM 48, 798–859 (2001)
[24] J. Justesen, On the complexity of decoding Reed-Solomon codes (corresp.). IEEE Trans. Inf. Theory 22(2), 237–238 (1976)
[25] M.J. Kearns, L.G. Valiant, Cryptographic limitations on learning Boolean formulae and finite automata. J. ACM 41, 67–95 (1994)
[26] S. Khanna, M. Sudan, L. Trevisan, D.P. Williamson, The approximability of constraint satisfaction problems. SIAM J. Comput. 30, 1863–1920 (2000)
[27] J. Kilian, A note on efficient zero-knowledge proofs and arguments (extended abstract), in STOC (1992)
[28] V. Lyubashevsky, D. Micciancio, Asymptotically efficient lattice-based digital signatures, in R. Canetti (ed.), TCC. Lecture Notes in Computer Science, vol. 4948 (Springer, Berlin, 2008), pp. 37–54
[29] S. Micali, Computationally sound proofs. SIAM J. Comput. 30, 1253–1298 (2000)
[30] M. Naor, M. Yung, Universal one-way hash functions and their cryptographic applications, in STOC, pp. 33–43 (1989)
[31] C.H. Papadimitriou, M. Yannakakis, Optimization, approximation, and complexity classes. J. Comput. Syst. Sci. 43, 425–440 (1991)
[32] L. Pitt, L.G. Valiant, Computational limitations on learning from examples. J. ACM 35, 965–984 (1988)
[33] J.P. Reiter, J. Drechsler, Releasing multiply-imputed synthetic data generated in two stages to protect confidentiality. IAB discussion paper, Institut für Arbeitsmarkt und Berufsforschung (IAB), Nürnberg (Institute for Employment Research, Nuremberg, Germany) (2007)
[34] J. Rompel, One-way functions are necessary and sufficient for secure signatures, in STOC, pp. 387–394 (1990)
[35] A. Roth, T. Roughgarden, Interactive privacy via the median mechanism, in STOC 2010 (2010)
[36] D.A. Spielman, Linear-time encodable and decodable error-correcting codes. IEEE Trans. Inf. Theory 42(6), 1723–1731 (1996)
[37] J. Thaler, J. Ullman, S.P. Vadhan, Faster algorithms for privately releasing marginals, in ICALP (1) (Springer, Berlin, 2012), pp. 810–821
[38] L.G. Valiant, A theory of the learnable. Communications of the ACM 27(11), 1134–1142 (1984)

Acknowledgements

We thank Boaz Barak, Irit Dinur, Cynthia Dwork, Vitaly Feldman, Oded Goldreich, Johan Håstad, Valentine Kabanets, Dana Moshkovitz, Anup Rao, Guy Rothblum, and Les Valiant for helpful conversations. We are also grateful to the anonymous referees for helpful comments on the presentation of this work.

Author information

Authors and Affiliations

Khoury College of Computer Science, Northeastern University, Boston, MA, USA
Jonathan Ullman
Harvard John A. Paulson School of Engineering and Applied Sciences, Cambridge, MA, USA
Salil Vadhan

Corresponding author

Correspondence to Jonathan Ullman.

Additional information

Communicated by Ran Canetti.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

A preliminary version of this work appeared in the Theory of Cryptography Conference 2011.

J. Ullman: This work was done while the author was in the Harvard John A. Paulson School of Engineering and Applied Sciences. Supported by NSF Grant CNS-0831289.

S. Vadhan: Supported by NSF Grant CNS-0831289.

About this article

Cite this article

Ullman, J., Vadhan, S. PCPs and the Hardness of Generating Synthetic Data. J Cryptol 33, 2078–2112 (2020). https://doi.org/10.1007/s00145-020-09363-y

Received: 08 August 2014
Revised: 01 July 2020
Published: 31 July 2020
Issue Date: October 2020
DOI: https://doi.org/10.1007/s00145-020-09363-y
