PCPs and the Hardness of Generating Synthetic Data

Journal of Cryptology

Abstract

Assuming the existence of one-way functions, we show that there is no polynomial-time differentially private algorithm \({\mathcal {A}}\) that takes a database \(D\in (\{0,1\}^d)^n\) and outputs a “synthetic database” \({\hat{D}}\) all of whose two-way marginals are approximately equal to those of D. (A two-way marginal is the fraction of database rows \(x\in \{0,1\}^d\) with a given pair of values in a given pair of columns.) This answers a question of Barak et al. (PODS ’07), who gave an algorithm running in time \(\mathrm {poly}(n,2^d)\). Our proof combines a construction of hard-to-sanitize databases based on digital signatures (due to Dwork et al., STOC ’09) with encodings based on the PCP theorem. We also present both negative and positive results for generating “relaxed” synthetic data, where the fraction of rows in D satisfying a predicate c is estimated by applying c to each row of \({\hat{D}}\) and aggregating the results in some way.
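To make the accuracy requirement concrete, the following Python sketch computes every two-way marginal of a database and reports the largest discrepancy between D and a candidate synthetic database \({\hat{D}}\). The function names are hypothetical, columns are 0-indexed, and only pairs of distinct columns j < j' are enumerated (an assumption of the sketch). It checks accuracy only; it says nothing about privacy, and its running time is simply proportional to the 4·C(d, 2) queries times the number of rows.

```python
from itertools import combinations, product
from typing import Sequence

Row = Sequence[int]  # a database row x in {0,1}^d


def two_way_marginal(db: Sequence[Row], j: int, jp: int, b: int, bp: int) -> float:
    """Fraction of rows x with x[j] == b and x[jp] == bp (hypothetical helper)."""
    return sum(1 for x in db if x[j] == b and x[jp] == bp) / len(db)


def max_marginal_error(db: Sequence[Row], synth: Sequence[Row]) -> float:
    """Largest |c(D) - c(D_hat)| over all two-way marginals c.

    Assumption: columns are 0-indexed and only distinct pairs j < j' are used.
    D and D_hat may have different numbers of rows, since each marginal is a fraction.
    """
    d = len(db[0])
    worst = 0.0
    for j, jp in combinations(range(d), 2):
        for b, bp in product((0, 1), repeat=2):
            err = abs(two_way_marginal(db, j, jp, b, bp)
                      - two_way_marginal(synth, j, jp, b, bp))
            worst = max(worst, err)
    return worst
```

For example, with D = ((0,0), (1,1)) and \({\hat{D}}\) = ((0,0), (0,0)), the marginal for the values (0,0) is 1/2 in D but 1 in \({\hat{D}}\), so `max_marginal_error` returns 0.5.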

Notes

  1. Technically, this “real database” may assign fractional weight to some rows.

  2. Recall that a 2-way marginal c(D) computes the fraction of database rows satisfying a conjunction of two literals, i.e., the fraction of rows \(x_i\in \{0,1\}^d\) such that \(x_{i,j}=b\) and \(x_{i,j'}=b'\) for some columns \(j,j'\in [d]\) and values \(b,b'\in \{0,1\}\).

  3. This result also does not explicitly state the existence of the efficient encoder, but the proof of the completeness property gives an efficient algorithm to generate the correct proof \(\pi \).

  4. Theorem 3.6 guarantees only \((\gamma -\delta ,\gamma )\)-hardness of approximation, but it can be verified that the corresponding PCPs have “perfect completeness”, so disjunctions of 3 literals are \((1-\delta , 1)\)-hard-to-approximate.

  5. Given two vectors \(a = (a_1, \dots , a_n)\) and \(b = (b_1, \dots , b_n)\) we say \(b \succeq a\) iff \(b_i \ge a_i\) for every \(i \in [n]\). We say a function \(f: \{0,1\}^n \rightarrow [0,1]\) is monotone if \(b \succeq a \Longrightarrow f(b) \ge f(a)\).

  6. In the preliminaries, we define a predicate to be a \(\{0,1\}\)-valued function, but our definition naturally generalizes to \(\{-1,1\}\)-valued functions. For \(c: \{0,1\}^d\rightarrow \{-1,1\}\) and database \(D= (x_{1}, \dots , x_{n}) \in (\{0,1\}^d)^n\), we define \(c(D) = \frac{1}{n} \sum _{i=1}^{n} c(x_{i})\).

  7. One form of the Chernoff–Hoeffding bound states that if \(X_1, \dots , X_n\) are independent random variables over [0, 1] and \(X = (1/n) \sum _{i=1}^{n} X_i\), then \(\Pr [|X - \mathop {{\mathbb {E}}}[X]| \ge t] < 2\exp (-2nt^2)\) [12].
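The monotonicity notion of Note 5 can be checked by brute force for small n. The Python sketch below is a hypothetical helper, exponential in n and meant only as an illustration. It relies on the fact that it suffices to compare inputs differing in a single coordinate (a bit flipped from 0 to 1), since whenever \(b \succeq a\) one can reach b from a by a chain of such single-bit increases, and monotonicity along each step composes by transitivity.

```python
from itertools import product
from typing import Callable, Tuple

Bits = Tuple[int, ...]


def is_monotone(f: Callable[[Bits], float], n: int) -> bool:
    """Brute-force check that b >= a (coordinatewise) implies f(b) >= f(a).

    Only pairs (a, b) differing in one coordinate, with a_i = 0 and b_i = 1,
    are compared; this suffices by transitivity along single-bit chains.
    """
    for a in product((0, 1), repeat=n):
        for i in range(n):
            if a[i] == 0:
                b = a[:i] + (1,) + a[i + 1:]  # raise coordinate i from 0 to 1
                if f(b) < f(a):
                    return False
    return True
```

For instance, the fraction of ones, `lambda x: sum(x) / len(x)`, and majority on 3 bits are monotone, while parity, `lambda x: sum(x) % 2`, is not.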
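As a small illustration of the \(\{-1,1\}\)-valued convention in Note 6, the Python sketch below evaluates \(c(D) = \frac{1}{n} \sum _{i=1}^{n} c(x_{i})\) for a hypothetical example predicate \(c(x) = (-1)^{x_1 \oplus x_2}\) (columns are 0-indexed in the code). Writing p for the fraction of rows with \(c(x_i) = 1\), we have c(D) = p − (1 − p) = 2p − 1, so the \(\{0,1\}\)-style fraction is recovered as p = (1 + c(D))/2.

```python
from typing import Callable, Sequence

Row = Sequence[int]


def predicate_average(c: Callable[[Row], int], db: Sequence[Row]) -> float:
    """c(D) = (1/n) * sum_i c(x_i) for a {-1, 1}-valued predicate c."""
    values = [c(x) for x in db]
    if any(v not in (-1, 1) for v in values):
        raise ValueError("predicate must be {-1, 1}-valued")
    return sum(values) / len(values)


def parity_01(x: Row) -> int:
    """Hypothetical example predicate: (-1)^(x[0] XOR x[1])."""
    return -1 if (x[0] ^ x[1]) else 1
```

On D = ((0,0), (1,1), (0,1)) the predicate takes values 1, 1, −1, so c(D) = 1/3 and the fraction of rows with c(x) = 1 is (1 + 1/3)/2 = 2/3.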
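The bound in Note 7 can be sanity-checked numerically. The Python sketch below assumes, purely for illustration, that each \(X_i\) is uniform on [0, 1) (so \(\mathbb{E}[X] = 1/2\)), estimates \(\Pr [|X - \mathbb{E}[X]| \ge t]\) by simulation with a fixed seed, and compares the estimate with \(2\exp (-2nt^2)\). The function names are hypothetical, and a simulation of this kind illustrates the bound rather than proving it.

```python
import math
import random


def hoeffding_bound(n: int, t: float) -> float:
    """Right-hand side 2 * exp(-2 n t^2) of the Chernoff-Hoeffding bound."""
    return 2 * math.exp(-2 * n * t * t)


def empirical_deviation_prob(n: int, t: float, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of Pr[|X - E[X]| >= t].

    Assumption for illustration: X is the mean of n i.i.d. Uniform[0, 1) draws,
    so E[X] = 1/2.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = sum(rng.random() for _ in range(n)) / n
        if abs(x - 0.5) >= t:
            hits += 1
    return hits / trials
```

With n = 100 and t = 0.1 the bound equals \(2e^{-2} \approx 0.27\), while the simulated frequency for uniform variables is much smaller; this is expected, since the bound must hold for every distribution over [0, 1].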

References

  1. M. Alekhnovich, M. Braverman, V. Feldman, A.R. Klivans, T. Pitassi, The complexity of properly learning simple concept classes. J. Comput. Syst. Sci. 74, 16–34 (2008)

  2. L. Babai, L. Fortnow, L.A. Levin, M. Szegedy, Checking computations in polylogarithmic time. in STOC (1991), pp. 21–31

  3. B. Barak, K. Chaudhuri, C. Dwork, S. Kale, F. McSherry, K. Talwar, Privacy, accuracy, and consistency too: A holistic solution to contingency table release. in Proceedings of the 26th Symposium on Principles of Database Systems (2007), pp. 273–282

  4. B. Barak, O. Goldreich, Universal arguments and their applications. SIAM J. Comput. 38, 1661–1694 (2008)

  5. E. Ben-Sasson, O. Goldreich, P. Harsha, M. Sudan, S.P. Vadhan, Robust PCPs of proximity, shorter PCPs, and applications to coding. SIAM J. Comput. 36, 889–974 (2006)

  6. A. Blum, C. Dwork, F. McSherry, K. Nissim, Practical privacy: The SuLQ framework. in Proceedings of the 24th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (June 2005)

  7. A. Blum, K. Ligett, A. Roth, A learning theory approach to non-interactive database privacy. in Proceedings of the 40th ACM SIGACT Symposium on Theory of Computing (2008)

  8. K. Chandrasekaran, J. Thaler, J. Ullman, A. Wan, Faster private release of marginals on small databases. in ITCS (ACM, New York, 2014), pp. 387–402

  9. N. Creignou, A dichotomy theorem for maximum generalized satisfiability problems. J. Comput. Syst. Sci. 51, 511–522 (1995)

  10. I. Dinur, K. Nissim, Revealing information while preserving privacy. in Proceedings of the Twenty-Second ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (2003), pp. 202–210

  11. I. Dinur, O. Reingold, Assignment testers: Towards a combinatorial proof of the PCP theorem. SIAM J. Comput. 36, 975–1024 (2006)

  12. D.P. Dubhashi, S. Sen, Concentration of measure for randomized algorithms: techniques and applications. in Handbook of Randomized Algorithms (2001)

  13. C. Dwork, F. McSherry, K. Nissim, A. Smith, Calibrating noise to sensitivity in private data analysis. in Proceedings of the 3rd Theory of Cryptography Conference (2006), pp. 265–284

  14. C. Dwork, M. Naor, O. Reingold, G. Rothblum, S. Vadhan, When and how can privacy-preserving data release be done efficiently? in Proceedings of the 2009 International ACM Symposium on Theory of Computing (STOC) (2009)

  15. C. Dwork, A. Nikolov, K. Talwar, Using convex relaxations for efficiently and privately releasing marginals. in Proceedings of the Thirtieth Annual Symposium on Computational Geometry (ACM, New York, 2014), pp. 261

  16. C. Dwork, K. Nissim, Privacy-preserving datamining on vertically partitioned databases. in Proceedings of CRYPTO 2004, vol. 3152 (2004), pp. 528–544

  17. C. Dwork, A. Roth, The algorithmic foundations of differential privacy (2014)

  18. C. Dwork, G. Rothblum, S.P. Vadhan, Boosting and differential privacy. in Proceedings of FOCS 2010 (2010)

  19. V. Feldman, Hardness of proper learning. in The Encyclopedia of Algorithms (Springer, New York, 2008)

  20. V. Feldman, Hardness of approximate two-level logic minimization and PAC learning with membership queries. J. Comput. Syst. Sci. 75(1), 13–26 (2009)

  21. O. Goldreich, Foundations of Cryptography, vol. 2 (Cambridge University Press, 2004)

  22. M. Hardt, G.N. Rothblum, R.A. Servedio, Private data release via learning thresholds. in SODA (SIAM, New York, 2012), pp. 168–187

  23. J. Håstad, Some optimal inapproximability results. J. ACM 48, 798–859 (2001)

  24. J. Justesen, On the complexity of decoding Reed–Solomon codes (corresp.). IEEE Trans. Inf. Theory 22(2), 237–238 (1976)

  25. M.J. Kearns, L.G. Valiant, Cryptographic limitations on learning Boolean formulae and finite automata. J. ACM 41, 67–95 (1994)

  26. S. Khanna, M. Sudan, L. Trevisan, D.P. Williamson, The approximability of constraint satisfaction problems. SIAM J. Comput. 30, 1863–1920 (2000)

  27. J. Kilian, A note on efficient zero-knowledge proofs and arguments (extended abstract). in STOC (1992)

  28. V. Lyubashevsky, D. Micciancio, Asymptotically efficient lattice-based digital signatures. In R. Canetti, editor, TCC, volume 4948 of Lecture Notes in Computer Science (Springer, Berlin, 2008), pp. 37–54

  29. S. Micali, Computationally sound proofs. SIAM J. Comput. 30, 1253–1298 (2000)

  30. M. Naor, M. Yung, Universal one-way hash functions and their cryptographic applications. in STOC (1989), pp. 33–43

  31. C.H. Papadimitriou, M. Yannakakis, Optimization, approximation, and complexity classes. J. Comput. Syst. Sci. 43, 425–440 (1991)

  32. L. Pitt, L.G. Valiant, Computational limitations on learning from examples. J. ACM 35, 965–984 (1988)

  33. J.P. Reiter, J. Drechsler, Releasing multiply-imputed synthetic data generated in two stages to protect confidentiality. IAB Discussion Paper, Institut für Arbeitsmarkt und Berufsforschung (IAB), Nürnberg (Institute for Employment Research, Nuremberg, Germany) (2007)

  34. J. Rompel, One-way functions are necessary and sufficient for secure signatures. in STOC (1990), pp. 387–394

  35. A. Roth, T. Roughgarden, Interactive privacy via the median mechanism. in STOC 2010 (2010)

  36. D.A. Spielman, Linear-time encodable and decodable error-correcting codes. IEEE Trans. Inf. Theory 42(6), 1723–1731 (1996)

  37. J. Thaler, J. Ullman, S.P. Vadhan, Faster algorithms for privately releasing marginals. in ICALP (1) (Springer, Berlin, 2012), pp. 810–821

  38. L.G. Valiant, A theory of the learnable. Commun. ACM 27(11), 1134–1142 (1984)

Acknowledgements

We thank Boaz Barak, Irit Dinur, Cynthia Dwork, Vitaly Feldman, Oded Goldreich, Johan Håstad, Valentine Kabanets, Dana Moshkovitz, Anup Rao, Guy Rothblum, and Les Valiant for helpful conversations. We are also grateful to the anonymous referees for helpful comments on the presentation of this work.

Author information

Correspondence to Jonathan Ullman.

Additional information

Communicated by Ran Canetti.

A preliminary version of this work appeared in the Theory of Cryptography Conference 2011.

J. Ullman: This work was done while the author was at the Harvard John A. Paulson School of Engineering and Applied Sciences. Supported by NSF Grant CNS-0831289.

S. Vadhan: Supported by NSF Grant CNS-0831289.

About this article

Cite this article

Ullman, J., Vadhan, S. PCPs and the Hardness of Generating Synthetic Data. J Cryptol 33, 2078–2112 (2020). https://doi.org/10.1007/s00145-020-09363-y

  • DOI: https://doi.org/10.1007/s00145-020-09363-y
