ABSTRACT
The technique of k-anonymization has been proposed in the literature as an alternative way to release public information while ensuring both data privacy and data integrity. We prove that two general versions of optimal k-anonymization of relations are NP-hard, including the suppression version, which amounts to choosing a minimum number of entries to delete from the relation. We also present a polynomial-time algorithm for optimal k-anonymity that, when k is constant, achieves an approximation ratio independent of the size of the database. In particular, it is an O(k log k)-approximation where the constant in the big-O is no more than 4. However, the runtime of the algorithm is exponential in k. A slightly more clever algorithm removes this exponential dependence on k, but is an O(k log m)-approximation, where m is the degree of the relation (its number of attributes). We believe this algorithm could potentially be quite fast in practice.
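To make the suppression version concrete, the Python sketch below assumes that a relation is a list of equal-length tuples and that suppressing an entry means replacing it with `*`. Given a partition of the rows into groups of at least k rows, suppressing every column on which a group disagrees makes each row identical to the rest of its group. The cost is |group| × (number of disagreeing columns) deleted entries per group. The helper `greedy_groups` is a hypothetical greedy heuristic of our own. It only considers groups of k to 2k - 1 rows, which loses nothing, because splitting a group of 2k or more rows into two groups of at least k rows never increases the cost. Enumerating those candidate groups makes its running time exponential in k. It is not the paper's algorithm and carries no approximation guarantee. All function names are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations
from typing import List, Sequence, Tuple

# Hypothetical helper names; a sketch of the suppression model, not the paper's algorithm.
Row = Tuple[str, ...]
STAR = "*"  # marker for a suppressed (deleted) entry


def diff_cols(rows: Sequence[Row]) -> List[int]:
    """Columns on which the given (non-empty) group of rows does not all agree."""
    return [j for j in range(len(rows[0])) if len({r[j] for r in rows}) > 1]


def suppress(table: Sequence[Row], groups: Sequence[Sequence[int]]) -> Tuple[List[Row], int]:
    """Suppress, within each group, every column where the group's rows disagree.

    Returns the anonymized table and the number of suppressed entries,
    i.e. the sum over groups of |group| * |diff_cols(group)|.
    """
    out = list(table)
    cost = 0
    for g in groups:
        cols = set(diff_cols([table[i] for i in g]))
        cost += len(g) * len(cols)
        for i in g:
            out[i] = tuple(STAR if j in cols else v for j, v in enumerate(table[i]))
    return out, cost


def is_k_anonymous(table: Sequence[Row], k: int) -> bool:
    """True if every row is identical to at least k - 1 other rows."""
    counts = Counter(table)
    return all(counts[r] >= k for r in table)


def greedy_groups(table: Sequence[Row], k: int) -> List[List[int]]:
    """Hypothetical greedy heuristic: repeatedly take the candidate group of
    k..2k-1 unassigned rows with the fewest disagreeing columns.
    Exponential in k because it enumerates candidate groups; no guarantee."""
    if len(table) < k:
        raise ValueError("need at least k rows")
    left = list(range(len(table)))
    groups: List[List[int]] = []
    while left:
        u = len(left)
        if u < 2 * k:
            # Fewer than 2k rows remain, so they must form a single group.
            groups.append(left)
            break
        best = None
        # Cap the size at u - k so that at least k rows remain afterwards.
        for size in range(k, min(2 * k - 1, u - k) + 1):
            for s in combinations(left, size):
                d = len(diff_cols([table[i] for i in s]))
                if best is None or d < best[0]:
                    best = (d, s)
        chosen = set(best[1])
        groups.append(sorted(chosen))
        left = [i for i in left if i not in chosen]
    return groups


# Usage on a toy relation (age, sex, diagnosis).
rows: List[Row] = [
    ("30", "F", "flu"),
    ("30", "F", "cold"),
    ("41", "M", "flu"),
    ("41", "M", "flu"),
]
groups = greedy_groups(rows, 2)
anonymized, cost = suppress(rows, groups)
print(groups, cost)  # [[2, 3], [0, 1]] 2
```

On the toy relation, the heuristic first groups rows 2 and 3, which are already identical, and then groups rows 0 and 1, suppressing only their disagreeing diagnosis column. That costs two deleted entries and yields a 2-anonymous table.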