DOI: 10.1145/1989323.1989347
research-article

Differentially private data cubes: optimizing noise sources and consistency

Published: 12 June 2011

ABSTRACT

Data cubes play an essential role in data analysis and decision support. In a data cube, data from a fact table is aggregated on subsets of the table's dimensions, forming a collection of smaller tables called cuboids. When the fact table includes sensitive data such as salary or diagnosis, publishing even a subset of its cuboids may compromise individuals' privacy. In this paper, we address this problem using differential privacy (DP), which provides provable privacy guarantees for individuals by adding noise to query answers. We choose an initial subset of cuboids to compute directly from the fact table, injecting DP noise as usual; and then compute the remaining cuboids from the initial set. Given a fixed privacy guarantee, we show that it is NP-hard to choose the initial set of cuboids so that the maximal noise over all published cuboids is minimized, or so that the number of cuboids with noise below a given threshold (precise cuboids) is maximized. We provide an efficient procedure with running time polynomial in the number of cuboids to select the initial set of cuboids, such that the maximal noise in all published cuboids will be within a factor (ln|L| + 1)^2 of the optimal, where |L| is the number of cuboids to be published, or the number of precise cuboids will be within a factor (1 - 1/e) of the optimal. We also show how to enforce consistency in the published cuboids while simultaneously improving their utility (reducing error). In an empirical evaluation on real and synthetic data, we report the amounts of error of different publishing algorithms, and show that our approaches outperform baselines significantly.
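
The "compute an initial set directly, then derive the rest" idea can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes a count measure, a single initial cuboid (the full-dimensional one), and a sensitivity of 1 per cell count, and the names `laplace`, `noisy_base_cuboid`, and `roll_up` are hypothetical helpers introduced here for illustration only.

```python
import random
from collections import Counter
from itertools import product


def laplace(scale, rng):
    """Draw one Laplace(0, scale) sample.

    The difference of two i.i.d. exponentials with mean `scale`
    follows a Laplace distribution with location 0 and that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_base_cuboid(rows, domains, epsilon, rng):
    """Count every cell of the full-dimensional cuboid and add noise.

    Assumption (not from the source): each individual contributes one
    row, so the count vector has sensitivity 1 and scale = 1 / epsilon.
    """
    true_counts = Counter(rows)
    scale = 1.0 / epsilon
    return {
        cell: true_counts[cell] + laplace(scale, rng)
        for cell in product(*domains)
    }


def roll_up(cuboid, keep):
    """Derive a coarser cuboid by summing noisy cells over dropped dimensions."""
    out = {}
    for cell, value in cuboid.items():
        key = tuple(cell[i] for i in keep)
        out[key] = out.get(key, 0.0) + value
    return out


# Toy fact table with two dimensions (gender, state); values are made up.
domains = [("F", "M"), ("CA", "NY")]
rows = [("F", "NY"), ("M", "CA"), ("F", "NY"), ("M", "NY")]

rng = random.Random(2011)
base = noisy_base_cuboid(rows, domains, epsilon=1.0, rng=rng)
by_gender = roll_up(base, keep=(0,))   # derived, never touches the raw rows
print(by_gender)
```

Because `by_gender` is computed only from the already-noisy base cells, it consumes no additional privacy budget and is automatically consistent with the base cuboid. The cost is that each derived cell sums the independent noise of every base cell it aggregates, so its noise variance grows with the number of summed cells. That trade-off is what makes the choice of the initial set matter.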

Published in

SIGMOD '11: Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
June 2011
1364 pages
ISBN: 9781450306614
DOI: 10.1145/1989323

Copyright © 2011 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Acceptance Rates

Overall Acceptance Rate: 785 of 4,003 submissions, 20%