Preserving data privacy in outsourcing data aggregation services

Published: 01 August 2007

Abstract

Advances in distributed service-oriented computing and Internet technology have created a strong technology push for outsourcing and information sharing. Organizations increasingly need to share data across organizational boundaries, both within a country and with countries that may have weaker privacy and security standards. Ideally, we wish to share certain statistical data and extract knowledge from private databases without revealing any information about an individual database beyond the permitted aggregate result. In this article, we describe two scenarios for outsourcing data aggregation services and present a set of decentralized peer-to-peer protocols that support data sharing across multiple private databases while minimizing data disclosure among individual parties. Our basic protocols include a set of novel probabilistic computation mechanisms for important primitive data aggregation operations across multiple private databases, such as max, min, and top-k selection. We provide an analytical study of the basic protocols in terms of precision, efficiency, and privacy characteristics. Our advanced protocols implement an efficient algorithm for performing kNN classification across multiple private databases. We provide a set of experiments that evaluate the proposed protocols in terms of their correctness, efficiency, and privacy characteristics.
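
The abstract does not spell out the probabilistic mechanism, so what follows is only an illustrative sketch of how a randomized max computation over a ring of parties might limit disclosure. It is not the article's actual protocol. The function name `probabilistic_max`, the parameters `lower_bound`, `p0`, `decay`, and `rounds`, and the ring simulation in a single process are all assumptions made for this example. The idea sketched here is that a party holding a value larger than the running result sometimes forwards a random intermediate value instead of its own, so the next party cannot tell whether the number it receives is a real data value.

```python
import random


def probabilistic_max(local_values, lower_bound=0.0, p0=1.0, decay=0.5,
                      rounds=10, rng=None):
    """Simulate a randomized ring pass that approximates max(local_values).

    Hypothetical sketch: every name and parameter here is an assumption,
    not taken from the article. Each entry of local_values stands for one
    party's private value; all values are assumed to be >= lower_bound.
    """
    rng = rng or random.Random()
    g = lower_bound  # running "global" value passed around the ring
    for r in range(1, rounds + 1):
        # Probability of hiding a real value shrinks with each round.
        p_random = p0 * decay ** (r - 1)
        for v in local_values:  # one pass around the ring
            if v <= g:
                continue  # nothing to contribute; forward g unchanged
            if rng.random() < p_random:
                # Forward a random value between g and v, so the
                # successor cannot tell that it is not real data.
                g = rng.uniform(g, v)
            else:
                # Reveal the true local value.
                g = v
    return g


if __name__ == "__main__":
    values = [30, 10, 40, 20]
    print(probabilistic_max(values, rng=random.Random(7)))
```

In this sketch the running value never decreases and never exceeds the true maximum, so the output is always a lower bound on it. With a finite number of rounds the result can fall short of the true maximum, which is the precision versus privacy trade-off such a scheme would have to quantify. Under the same assumptions, a min could be obtained by running the sketch on negated values.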

    • Published in

      ACM Transactions on Internet Technology, Volume 7, Issue 3
      Special Issue on the Internet and Outsourcing
      August 2007
      97 pages
      ISSN: 1533-5399
      EISSN: 1557-6051
      DOI: 10.1145/1275505

      Copyright © 2007 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States
