research-article

Differentially private data cubes: optimizing noise sources and consistency

Authors:
Bolin Ding (University of Illinois at Urbana-Champaign, Urbana, IL, USA)
Marianne Winslett (Advanced Digital Sciences Center & University of Illinois at Urbana-Champaign, Singapore, Singapore)
Jiawei Han (University of Illinois at Urbana-Champaign, Urbana, IL, USA)
Zhenhui Li (University of Illinois at Urbana-Champaign, Urbana, IL, USA)

SIGMOD '11: Proceedings of the 2011 ACM SIGMOD International Conference on Management of data, June 2011, Pages 217–228
https://doi.org/10.1145/1989323.1989347

Published: 12 June 2011

ABSTRACT

Data cubes play an essential role in data analysis and decision support. In a data cube, data from a fact table is aggregated on subsets of the table's dimensions, forming a collection of smaller tables called cuboids. When the fact table includes sensitive data such as salary or diagnosis, publishing even a subset of its cuboids may compromise individuals' privacy. In this paper, we address this problem using differential privacy (DP), which provides provable privacy guarantees for individuals by adding noise to query answers. We choose an initial subset of cuboids to compute directly from the fact table, injecting DP noise as usual; and then compute the remaining cuboids from the initial set. Given a fixed privacy guarantee, we show that it is NP-hard to choose the initial set of cuboids so that the maximal noise over all published cuboids is minimized, or so that the number of cuboids with noise below a given threshold (precise cuboids) is maximized. We provide an efficient procedure with running time polynomial in the number of cuboids to select the initial set of cuboids, such that the maximal noise in all published cuboids will be within a factor (ln|L| + 1)^2 of the optimal, where |L| is the number of cuboids to be published, or the number of precise cuboids will be within a factor (1 - 1/e) of the optimal. We also show how to enforce consistency in the published cuboids while simultaneously improving their utility (reducing error). In an empirical evaluation on real and synthetic data, we report the amounts of error of different publishing algorithms, and show that our approaches outperform baselines significantly.
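The publishing pipeline described in the abstract (compute a few initial cuboids with noise, then derive the rest from them) can be illustrated with a minimal sketch. It is not the paper's selection algorithm: it only shows Laplace noise on one initial count cuboid and a roll-up to a coarser one. The function names, the toy fact table, and the neighbouring-table convention (one row added or removed, so that `num_initial` count cuboids together have L1 sensitivity `num_initial`) are assumptions made for this example.

```python
import itertools
import random
from collections import Counter, defaultdict

def cuboid(rows, dims):
    """Exact count cuboid: group rows by the chosen dimension names."""
    return Counter(tuple(row[d] for d in dims) for row in rows)

def laplace(scale, rng):
    """Laplace(0, scale) sample, drawn as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_cuboid(rows, dims, domains, epsilon, num_initial, rng):
    """Count cuboid with Laplace noise on every cell, including empty cells.

    Assumption: neighbouring tables differ by one added or removed row, so
    num_initial count cuboids have total L1 sensitivity num_initial.
    """
    exact = cuboid(rows, dims)
    scale = num_initial / epsilon
    cells = itertools.product(*(domains[d] for d in dims))
    return {cell: exact.get(cell, 0) + laplace(scale, rng) for cell in cells}

def roll_up(noisy, dims, keep):
    """Derive a coarser cuboid by summing noisy cells (pure post-processing)."""
    idx = [dims.index(d) for d in keep]
    out = defaultdict(float)
    for cell, value in noisy.items():
        out[tuple(cell[i] for i in idx)] += value
    return dict(out)

# Hypothetical toy fact table with two dimensions.
rows = [
    {"sex": "F", "age": "young"},
    {"sex": "F", "age": "old"},
    {"sex": "M", "age": "old"},
]
domains = {"sex": ["F", "M"], "age": ["young", "old"]}

rng = random.Random(0)
base = noisy_cuboid(rows, ["sex", "age"], domains, epsilon=1.0, num_initial=1, rng=rng)
by_sex = roll_up(base, ["sex", "age"], ["sex"])
```

Each derived cell in `by_sex` sums two independent noisy cells, so its noise variance is twice that of a base cell. Computing more cuboids directly would avoid this accumulation but would raise `num_initial` and hence the noise scale of every initial cuboid; this is the trade-off behind choosing the initial set.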

Published in
SIGMOD '11: Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
June 2011, 1364 pages
ISBN: 9781450306614
DOI: 10.1145/1989323
General Chair: Timos Sellis (IMIS/RC Athena)
Program Chair: Renée J. Miller (University of Toronto)
Publications Chairs: Anastasios Kementsietsidis (IBM T.J. Watson Research Center), Yannis Velegrakis (University of Trento)
Copyright © 2011 ACM
Publisher: Association for Computing Machinery, New York, NY, United States
Author Tags: OLAP, data cube, differential privacy, private data analysis
Overall Acceptance Rate: 785 of 4,003 submissions, 20%