Privacy-Preserving Publishing of Hierarchical Data

Published: 15 September 2016

Abstract

Many applications today rely on storage and management of semi-structured information, for example, XML databases and document-oriented databases. These data often have to be shared with untrusted third parties, which makes individuals’ privacy a fundamental problem. In this article, we propose anonymization techniques for privacy-preserving publishing of hierarchical data. We show that the problem of anonymizing hierarchical data poses unique challenges that cannot be readily solved by existing mechanisms. We extend two standards for privacy protection in tabular data (k-anonymity and ℓ-diversity) and apply them to hierarchical data. We present utility-aware algorithms that enforce these definitions of privacy using generalizations and suppressions of data values. To evaluate our algorithms and their heuristics, we experiment on synthetic and real datasets obtained from two universities. Our experiments show that we significantly outperform related methods that provide comparable privacy guarantees.
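
As a point of reference for the two tabular standards named above, the following is a minimal sketch of how k-anonymity and (distinct) ℓ-diversity are commonly checked on a flat table, with generalization of a numeric attribute into ranges. It illustrates only the standard tabular definitions, not the hierarchical extensions or the algorithms proposed in the article. The function names, the attribute names (`age`, `dept`, `grade`), and the toy records are all hypothetical.

```python
from collections import defaultdict


def equivalence_classes(records, quasi_identifiers):
    """Group records that share the same quasi-identifier values."""
    groups = defaultdict(list)
    for rec in records:
        key = tuple(rec[q] for q in quasi_identifiers)
        groups[key].append(rec)
    return groups


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every equivalence class contains at least k records."""
    groups = equivalence_classes(records, quasi_identifiers)
    return all(len(group) >= k for group in groups.values())


def is_l_diverse(records, quasi_identifiers, sensitive, l):
    """Distinct l-diversity: every class has at least l distinct sensitive values."""
    groups = equivalence_classes(records, quasi_identifiers)
    return all(
        len({rec[sensitive] for rec in group}) >= l for group in groups.values()
    )


def generalize_age(age, width=10):
    """Generalize an exact age into a fixed-width range such as '20-29'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


# Hypothetical toy table: 'age' and 'dept' are quasi-identifiers,
# 'grade' is the sensitive attribute.
records = [
    {"age": 21, "dept": "CS", "grade": "A"},
    {"age": 24, "dept": "CS", "grade": "B"},
    {"age": 33, "dept": "EE", "grade": "A"},
    {"age": 38, "dept": "EE", "grade": "A"},
]

# Replace each exact age with its range; other attributes are unchanged.
generalized = [{**rec, "age": generalize_age(rec["age"])} for rec in records]

print(is_k_anonymous(records, ["age", "dept"], 2))
print(is_k_anonymous(generalized, ["age", "dept"], 2))
print(is_l_diverse(generalized, ["age", "dept"], "grade", 2))
```

In this toy table, generalizing ages makes the data 2-anonymous, yet the second class still holds a single grade value, so it is not 2-diverse. This gap between the two definitions is why both are treated as separate privacy standards.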

Published in

ACM Transactions on Privacy and Security, Volume 19, Issue 3
December 2016
88 pages
ISSN: 2471-2566
EISSN: 2471-2574
DOI: 10.1145/2997655

Copyright © 2016 ACM

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Published: 15 September 2016
• Accepted: 1 July 2016
• Revised: 1 April 2016
• Received: 1 August 2015

Qualifiers

• research-article
• Research
• Refereed