ABSTRACT
Today there is a strong interest in publishing set-valued data in a privacy-preserving manner. Such data associate individuals with sets of values (e.g., preferences, shopping items, symptoms, query logs). In addition, an individual can be associated with a sensitive label (e.g., marital status, religious or political conviction). Anonymizing such data means ensuring that an adversary cannot (1) identify an individual's record, or (2) infer a sensitive label, if one exists. Existing research on this problem either perturbs the data, publishes them in disjoint groups disassociated from their sensitive labels, or generalizes their values by assuming the availability of a generalization hierarchy. In this paper, we propose a novel alternative. Our publication method also puts data in a generalized form, but it neither requires that published records form disjoint groups nor assumes a hierarchy. Instead, it employs generalized bitmaps and recasts data values in a nonreciprocal manner; formally, the bipartite graph from original to anonymized records does not have to be composed of disjoint complete subgraphs. We configure our schemes to provide popular privacy guarantees while resisting attacks proposed in recent research, and demonstrate experimentally that we gain a clear utility advantage over the previous state of the art.
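The distinction between reciprocal (group-based) and nonreciprocal recoding can be illustrated with a small Python sketch. It is only an illustration under stated assumptions, not the paper's algorithm: the helper names (`to_bitmap`, `generalize`, `is_reciprocal`), the toy records, and the use of `*` for a generalized bit are all hypothetical. Each original record is assigned a set of matches, and its published form is the bitmap generalized over those matches. The assignment is reciprocal exactly when every record has the same match set as each of its matches, that is, when the graph splits into disjoint complete subgraphs.

```python
from typing import Dict, List, Set


def to_bitmap(record: Set[str], universe: List[str]) -> str:
    """Encode a set-valued record as a bit string over a fixed item universe."""
    return "".join("1" if item in record else "0" for item in universe)


def generalize(bitmaps: List[str]) -> str:
    """Keep a bit where all bitmaps agree; otherwise emit '*' (assumed notation)."""
    return "".join(col[0] if len(set(col)) == 1 else "*" for col in zip(*bitmaps))


def is_reciprocal(matches: Dict[int, Set[int]]) -> bool:
    """True when the assignment is a union of disjoint complete subgraphs,
    i.e. every record has exactly the same match set as each of its matches."""
    return all(matches[j] == matches[i] for i in matches for j in matches[i])


# Hypothetical toy data: three records over a four-item universe.
universe = ["a", "b", "c", "d"]
records = [{"a", "b"}, {"a", "c"}, {"a", "d"}]
bitmaps = [to_bitmap(r, universe) for r in records]

# Nonreciprocal "ring" assignment: each record is recast using itself and its
# successor, so every record has two matches but no disjoint groups form.
ring = {0: {0, 1}, 1: {1, 2}, 2: {2, 0}}

# Group-based assignment over four record ids, shown only for contrast.
grouped = {0: {0, 1}, 1: {0, 1}, 2: {2, 3}, 3: {2, 3}}

published = [generalize([bitmaps[j] for j in sorted(ring[i])]) for i in sorted(ring)]
print(published)
print(is_reciprocal(ring), is_reciprocal(grouped))
```

In the ring assignment every record still has two candidate matches, yet the published bitmaps differ from one another, which a disjoint-group scheme could not produce for the same three records.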
Index Terms
- Anonymizing set-valued data by nonreciprocal recoding