research-article

Privacy preservation by disassociation

Published: 01 June 2012

Abstract

In this work, we focus on protection against identity disclosure in the publication of sparse multidimensional data. Existing multidimensional anonymization techniques (a) protect the privacy of users either by altering the set of quasi-identifiers of the original data (e.g., by generalization or suppression) or by adding noise (e.g., using differential privacy) and/or (b) assume a clear distinction between sensitive and non-sensitive information and sever the possible linkage. In many real-world applications these techniques are not applicable. For instance, consider web search query logs. Anonymization methods based on suppression or generalization would remove the most valuable information in the dataset: the original query terms. Additionally, web search query logs contain millions of query terms that cannot be categorized as sensitive or non-sensitive, since a term may be sensitive for one user and non-sensitive for another. Motivated by this observation, we propose an anonymization technique termed disassociation that preserves the original terms but hides the fact that two or more different terms appear in the same record. We protect the users' privacy by disassociating record terms that participate in identifying combinations. This way, the adversary cannot, with high probability, associate a record with a rare combination of terms. To the best of our knowledge, our proposal is the first to employ such a technique to provide protection against identity disclosure. We propose an anonymization algorithm based on our approach and evaluate its performance on real and synthetic datasets, comparing it against other state-of-the-art methods based on generalization and differential privacy.
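
To make the idea of disassociating identifying combinations concrete, the following Python sketch may help. It is a toy illustration only and not the algorithm proposed in the paper. It assumes that a combination of terms counts as "identifying" when it appears in fewer than `k` records, and that only combinations of at most `m` terms are checked; the parameters `k` and `m`, the function names, and the greedy grouping strategy are all assumptions introduced here for illustration.

```python
from collections import Counter
from itertools import combinations
import random


def rare_combinations(records, k, m):
    """Return term combinations of size 1..m found in fewer than k records.

    Assumption: "rare" means "supported by fewer than k records".
    """
    counts = Counter()
    for rec in records:
        terms = sorted(rec)  # canonical order so each combination has one key
        for size in range(1, m + 1):
            counts.update(combinations(terms, size))
    return {combo for combo, n in counts.items() if n < k}


def project(records, terms):
    """Project each record onto `terms`, dropping empty projections."""
    return [rec & terms for rec in records if rec & terms]


def disassociate(records, k, m, seed=0):
    """Toy greedy split of the term set into chunks (hypothetical helper).

    Each chunk is published as a shuffled list of sub-records, so the link
    between sub-records of different chunks is hidden. Terms that fit no
    chunk are returned as a plain set with no record association.
    """
    support = Counter(t for rec in records for t in rec)
    ordered = sorted(support, key=lambda t: (-support[t], t))
    chunk_terms = []   # one set of terms per chunk
    leftover = set()   # terms too rare to appear in any chunk
    for term in ordered:
        for terms in chunk_terms:
            candidate = terms | {term}
            # Accept the term only if the projection has no rare combination.
            if not rare_combinations(project(records, candidate), k, m):
                terms.add(term)
                break
        else:
            if support[term] >= k:
                chunk_terms.append({term})
            else:
                leftover.add(term)
    rng = random.Random(seed)
    chunks = []
    for terms in chunk_terms:
        rows = project(records, terms)
        rng.shuffle(rows)  # break the positional link across chunks
        chunks.append(rows)
    return chunks, leftover


# Made-up query-log records: each record is the set of terms of one user.
records = [
    {"madonna", "flu"},
    {"madonna", "flu"},
    {"madonna", "ikea"},
    {"madonna", "ikea"},
    {"flu", "ikea"},
    {"madonna", "rash"},
]
chunks, leftover = disassociate(records, k=2, m=2)
```

In this made-up example every original term survives, either inside a chunk or in `leftover`, but the rare pair of "flu" and "ikea" no longer appears together in any published sub-record. The sketch makes no formal privacy claim; it only shows the mechanics of separating terms instead of generalizing or suppressing them.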

Published in

Proceedings of the VLDB Endowment, Volume 5, Issue 10
June 2012
180 pages

Publisher

VLDB Endowment
