Abstract
In this work, we focus on protection against identity disclosure in the publication of sparse multidimensional data. Existing multidimensional anonymization techniques (a) protect the privacy of users either by altering the set of quasi-identifiers of the original data (e.g., by generalization or suppression) or by adding noise (e.g., using differential privacy), and/or (b) assume a clear distinction between sensitive and non-sensitive information and sever the possible linkage between the two. In many real-world applications these techniques are not applicable. Consider, for instance, web search query logs. Anonymization methods based on suppression or generalization would remove the most valuable information in the dataset: the original query terms. Moreover, web search query logs contain millions of query terms that cannot be categorized as sensitive or non-sensitive, since a term may be sensitive for one user and non-sensitive for another. Motivated by this observation, we propose an anonymization technique termed disassociation, which preserves the original terms but hides the fact that two or more different terms appear in the same record. We protect the users' privacy by disassociating record terms that participate in identifying combinations. This way, the adversary cannot, with high probability, associate a record with a rare combination of terms. To the best of our knowledge, our proposal is the first to employ such a technique to provide protection against identity disclosure. We propose an anonymization algorithm based on our approach and evaluate its performance on real and synthetic datasets, comparing it against other state-of-the-art methods based on generalization and differential privacy.
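The idea of disassociating terms can be illustrated with a toy sketch. The code below is not the algorithm proposed in the paper; it is a simplified illustration under several assumptions: an "identifying combination" is taken to be a pair of terms that co-occurs in fewer than `k` records, only pairs are checked, and terms are assigned to chunks greedily. The names `rare_pairs`, `disassociate`, and the threshold `k` are hypothetical. Every original term is kept, but terms that form a rare pair are published in separate, independently shuffled chunks, so the pair can no longer be tied to one record.

```python
from collections import Counter
from itertools import combinations
import random


def rare_pairs(records, k):
    """Return term pairs that co-occur in at least one but fewer than k records.

    Assumption: such pairs stand in for the "identifying combinations"
    mentioned in the abstract.
    """
    counts = Counter()
    for rec in records:
        counts.update(combinations(sorted(rec), 2))
    return {pair for pair, c in counts.items() if c < k}


def disassociate(records, k, seed=0):
    """Toy disassociation: split terms into chunks with no rare pair inside.

    Returns a list of chunks. Each chunk is a shuffled list of sub-records,
    i.e. the projection of every record onto that chunk's terms.
    """
    records = [set(r) for r in records]
    rare = rare_pairs(records, k)
    support = Counter(t for rec in records for t in rec)

    # Greedy assignment (an assumption of this sketch): most frequent terms
    # first, each term goes to the first chunk where it forms no rare pair.
    chunks = []
    for term in sorted(support, key=lambda t: (-support[t], t)):
        for chunk in chunks:
            if all(tuple(sorted((term, o))) not in rare for o in chunk):
                chunk.add(term)
                break
        else:
            chunks.append({term})

    # Publish each chunk separately and shuffle it, so sub-records from
    # different chunks cannot be re-joined by position.
    rng = random.Random(seed)
    published = []
    for chunk in chunks:
        subrecords = [sorted(rec & chunk) for rec in records if rec & chunk]
        rng.shuffle(subrecords)
        published.append(subrecords)
    return published


if __name__ == "__main__":
    # Hypothetical query-log records, one set of terms per user.
    log = [
        {"madonna", "flu"},
        {"madonna", "flu"},
        {"madonna", "ikea"},
        {"flu", "ikea"},
        {"madonna", "flu", "ikea"},
    ]
    for i, chunk in enumerate(disassociate(log, k=3)):
        print(f"chunk {i}: {chunk}")
```

In this sketch, per-term frequencies and frequent co-occurrences are preserved exactly, while rare co-occurrences are hidden. A real scheme would also have to handle larger combinations and terms that are rare on their own.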