Abstract
Many applications today store and manage semi-structured information, for example in XML databases and document-oriented databases. Such data often have to be shared with untrusted third parties, which makes protecting individuals’ privacy a fundamental problem. In this article, we propose anonymization techniques for privacy-preserving publishing of hierarchical data. We show that anonymizing hierarchical data poses unique challenges that existing mechanisms cannot readily solve. We extend two standard privacy definitions for tabular data, k-anonymity and ℓ-diversity, and apply them to hierarchical data. We present utility-aware algorithms that enforce these privacy definitions by generalizing and suppressing data values. To evaluate our algorithms and their heuristics, we experiment on synthetic and real datasets obtained from two universities. Our experiments show that our algorithms significantly outperform related methods that provide comparable privacy guarantees.
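As background for the two tabular definitions named in the abstract, the following is a minimal sketch of how k-anonymity and the simplest ("distinct") variant of ℓ-diversity can be checked on a flat table, and how generalizing a value can help satisfy them. It is not the hierarchical extension or the algorithms proposed in the article. The record layout, the attribute names (`age`, `zip`, `grade`), the sample values, and the helper `generalize_age` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def equivalence_classes(records, quasi_identifiers):
    """Group records that share the same quasi-identifier values."""
    groups = defaultdict(list)
    for record in records:
        key = tuple(record[q] for q in quasi_identifiers)
        groups[key].append(record)
    return groups


def is_k_anonymous(records, quasi_identifiers, k):
    """Every equivalence class must contain at least k records."""
    classes = equivalence_classes(records, quasi_identifiers)
    return all(len(group) >= k for group in classes.values())


def is_l_diverse(records, quasi_identifiers, sensitive, l):
    """Distinct l-diversity: every equivalence class must contain
    at least l different values of the sensitive attribute."""
    classes = equivalence_classes(records, quasi_identifiers)
    return all(
        len({record[sensitive] for record in group}) >= l
        for group in classes.values()
    )


def generalize_age(record, width=10):
    """Hypothetical generalization: replace an exact age by a range."""
    low = (record["age"] // width) * width
    return {**record, "age": f"{low}-{low + width - 1}"}


# Hypothetical sample table; "grade" plays the role of the sensitive attribute.
records = [
    {"age": 21, "zip": "47906", "grade": "A"},
    {"age": 24, "zip": "47906", "grade": "B"},
    {"age": 35, "zip": "47907", "grade": "C"},
    {"age": 38, "zip": "47907", "grade": "C"},
]
generalized = [generalize_age(r) for r in records]

qi = ["age", "zip"]
print(is_k_anonymous(records, qi, 2))              # raw table
print(is_k_anonymous(generalized, qi, 2))          # after generalizing age
print(is_l_diverse(generalized, qi, "grade", 2))   # sensitive-value diversity
```

In this toy table, generalizing ages into ten-year ranges merges records into classes of size two, yet the second class still holds a single sensitive value. That gap is why ℓ-diversity is usually treated as a separate, stronger requirement than k-anonymity.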