Privacy-Preserving Publishing of Hierarchical Data

Authors:
Ismet Ozalp, Sabanci University, Istanbul, Turkey
Mehmet Emre Gursoy, University of California Los Angeles
Mehmet Ercan Nergiz, Acadsoft Research, Gaziantep, Turkey
Yucel Saygin, Sabanci University, Istanbul, Turkey

ACM Transactions on Privacy and Security, Volume 19, Issue 3, Article No. 7, pp. 1–29. https://doi.org/10.1145/2976738

Published: 15 September 2016

Abstract

Many applications today rely on storage and management of semi-structured information, for example, XML databases and document-oriented databases. These data often have to be shared with untrusted third parties, which makes individuals’ privacy a fundamental problem. In this article, we propose anonymization techniques for privacy-preserving publishing of hierarchical data. We show that the problem of anonymizing hierarchical data poses unique challenges that cannot be readily solved by existing mechanisms. We extend two standards for privacy protection in tabular data (k-anonymity and ℓ-diversity) and apply them to hierarchical data. We present utility-aware algorithms that enforce these definitions of privacy using generalizations and suppressions of data values. To evaluate our algorithms and their heuristics, we experiment on synthetic and real datasets obtained from two universities. Our experiments show that we significantly outperform related methods that provide comparable privacy guarantees.
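
For readers unfamiliar with the two tabular definitions the abstract builds on, the following is a minimal sketch of checking k-anonymity and distinct ℓ-diversity on flat records after generalization. It illustrates only the standard tabular notions, not the hierarchical definitions or algorithms proposed in the article. The toy records, the field names, and the helpers `generalize`, `equivalence_classes`, `is_k_anonymous`, and `is_l_diverse` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy records (not from the article's datasets):
# "age" and "zip" are quasi-identifiers, "grade" is the sensitive attribute.
RECORDS = [
    {"age": 21, "zip": "34010", "grade": "A"},
    {"age": 23, "zip": "34012", "grade": "B"},
    {"age": 27, "zip": "34015", "grade": "A"},
    {"age": 34, "zip": "34956", "grade": "C"},
    {"age": 36, "zip": "34957", "grade": "C"},
]


def generalize(record):
    """Generalize quasi-identifiers: age to a decade range, zip to a 3-digit prefix."""
    low = record["age"] // 10 * 10
    return (f"{low}-{low + 9}", record["zip"][:3] + "**")


def equivalence_classes(records):
    """Group sensitive values by their generalized quasi-identifier tuple."""
    groups = defaultdict(list)
    for record in records:
        groups[generalize(record)].append(record["grade"])
    return groups


def is_k_anonymous(records, k):
    """Every equivalence class must contain at least k records."""
    return all(len(values) >= k for values in equivalence_classes(records).values())


def is_l_diverse(records, l):
    """Every equivalence class must contain at least l distinct sensitive values."""
    return all(len(set(values)) >= l for values in equivalence_classes(records).values())


if __name__ == "__main__":
    print("2-anonymous:", is_k_anonymous(RECORDS, 2))
    print("2-diverse:", is_l_diverse(RECORDS, 2))
```

On this toy data the generalized table is 2-anonymous but not 2-diverse, because one group holds a single sensitive value. That gap is what ℓ-diversity adds over k-anonymity in the tabular setting.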

Index Terms

1. Information systems
  1. Data management systems
    1. Database administration
2. Social and professional topics
  1. Computing / technology policy

Published in
ACM Transactions on Privacy and Security Volume 19, Issue 3
December 2016
88 pages
ISSN: 2471-2566
EISSN: 2471-2574
DOI: 10.1145/2997655
Editor: David Basin, ETH Zurich, Switzerland
Copyright © 2016 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 15 September 2016
- Accepted: 1 July 2016
- Revised: 1 April 2016
- Received: 1 August 2015
Author Tags
k-anonymity
Data privacy
XML
anonymity
complex data
data publishing
hierarchical data
Qualifiers
- research-article
- Research
- Refereed
Article Metrics
- Total Citations: 9
- Total Downloads: 564
- Downloads (Last 12 months): 42
- Downloads (Last 6 weeks): 10