
Privacy-preserving decision trees over vertically partitioned data

Authors:
Jaideep Vaidya, Rutgers University, Newark, NJ
Chris Clifton, Purdue University, West Lafayette, IN
Murat Kantarcioglu, University of Texas at Dallas, Richardson, TX
A. Scott Patterson, Johns Hopkins University, Baltimore, MD

ACM Transactions on Knowledge Discovery from Data, Volume 2, Issue 3, Article No. 14, pp. 1–27. https://doi.org/10.1145/1409620.1409624

Published: 27 October 2008

Abstract

Privacy and security concerns can prevent sharing of data, derailing data-mining projects. Distributed knowledge discovery, if done correctly, can alleviate this problem. We introduce a generalized privacy-preserving variant of the ID3 algorithm for vertically partitioned data distributed over two or more parties. Along with a proof of security, we discuss what would be necessary to make the protocols completely secure. We also provide experimental results, giving a first demonstration of the practical complexity of secure multiparty computation-based data mining.
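ID3 chooses each split from class counts alone, so the central question for vertically partitioned data is how to count transactions whose attribute values are held by different parties. The minimal sketch below assumes two hypothetical parties that index the same rows (`PARTY_A`, `PARTY_B`, and their toy data are illustrative, not taken from the paper). It computes such a count as the size of the intersection of each party's matching row IDs, and equivalently as a dot product of 0/1 indicator vectors. Everything here is computed in the clear; a privacy-preserving variant would need a secure protocol in place of `joint_count` so that no party learns the others' row sets.

```python
import math

# Hypothetical toy data, vertically partitioned: both parties index the same
# five transactions (rows 0..4) but hold different attributes.
# Illustrative only: every party's data is visible here, nothing is private.
PARTY_A = {"outlook": ["sunny", "sunny", "overcast", "rainy", "rainy"]}
PARTY_B = {
    "windy": [False, True, False, False, True],
    "play": ["no", "no", "yes", "yes", "no"],
}
N_ROWS = 5


def matching_ids(columns, constraints):
    """IDs of rows that satisfy all of one party's (attribute, value) constraints."""
    return {
        i
        for i in range(N_ROWS)
        if all(columns[attr][i] == value for attr, value in constraints)
    }


def joint_count(per_party):
    """Rows satisfying every party's constraints: the size of a set intersection."""
    id_sets = [matching_ids(cols, cons) for cols, cons in per_party]
    return len(set.intersection(*id_sets))


def joint_count_dot(per_party):
    """The same count, written as a multi-vector dot product of 0/1 indicators."""
    vectors = []
    for cols, cons in per_party:
        ids = matching_ids(cols, cons)
        vectors.append([1 if i in ids else 0 for i in range(N_ROWS)])
    # Sum over rows of the product of all parties' bits at that row.
    return sum(math.prod(bits) for bits in zip(*vectors))


if __name__ == "__main__":
    query = [(PARTY_A, [("outlook", "sunny")]), (PARTY_B, [("play", "no")])]
    print(joint_count(query), joint_count_dot(query))  # prints: 2 2
```

Because the count is a sum of per-row products, the same functions extend to three or more parties simply by adding entries to the list passed in.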

Reviews

Reviewer: Richard CHBEIR

Iterative Dichotomiser 3 (ID3) is a classification algorithm that uses a fixed set of examples to build a decision tree. This paper presents an interesting variant of the ID3 algorithm that can classify vertically partitioned data while preserving the privacy of the participating sites and parties. This variant is practical and useful in various scenarios and applications. In the first section, the authors introduce their work and present a motivating scenario related to cancer treatment. In the second section, they show how to create the ID3 tree, explaining the required concepts and algorithms. Section 3 explains how the tree can be used and is illustrated with an example on a weather dataset. Section 4 presents the proofs of security for the proposed algorithms, and Section 5 addresses computational complexity. In Section 6, the authors present their implementation of the algorithm, along with a set of experiments showing the performance of their approach. In Section 7, the authors address the problem of making the ID3 protocols fully secure, providing a theoretical study of a secure multiparty dot product protocol complemented by an experimental study. Section 8 presents current approaches related to this work. Although the paper is very interesting, it remains highly technical, and data mining skills (particularly knowledge of the main classification algorithms) are required to understand the concepts. A comparison with classical algorithms would have made the paper easier to understand. In addition, Sections 6 and 7 should have been merged and restructured.

Online Computing Reviews Service
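The review notes that the paper illustrates the tree on a weather dataset. As a plain, non-private sketch of the standard ID3 split criterion, the block below computes information gain from class counts, assuming the classic 14-instance nominal weather data found in data mining textbooks; the counts and the names `WEATHER_COUNTS`, `entropy`, and `information_gain` are illustrative and may not match the paper's own example. In the vertically partitioned setting, these per-value class counts are exactly the quantities the parties would have to compute jointly.

```python
import math

# Class counts (yes, no) per attribute value for the classic 14-row nominal
# weather data. Assumed for illustration; not taken from the paper.
WEATHER_COUNTS = {
    "outlook": {"sunny": (2, 3), "overcast": (4, 0), "rainy": (3, 2)},
    "temperature": {"hot": (2, 2), "mild": (4, 2), "cool": (3, 1)},
    "humidity": {"high": (3, 4), "normal": (6, 1)},
    "windy": {"false": (6, 2), "true": (3, 3)},
}


def entropy(counts):
    """Shannon entropy (bits) of a class distribution given as raw counts."""
    total = sum(counts)
    # Zero counts contribute nothing (0 * log 0 is taken as 0).
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(split):
    """ID3 gain: parent entropy minus the size-weighted entropy of the children."""
    # Parent class counts are the element-wise sums over all attribute values.
    parent = [sum(col) for col in zip(*split.values())]
    n = sum(parent)
    remainder = sum(sum(child) / n * entropy(child) for child in split.values())
    return entropy(parent) - remainder


if __name__ == "__main__":
    gains = {attr: information_gain(split) for attr, split in WEATHER_COUNTS.items()}
    for attr, gain in sorted(gains.items(), key=lambda kv: -kv[1]):
        print(f"{attr:12s} {gain:.3f}")
    # ID3 splits the root on the attribute with the largest gain.
    print("root split:", max(gains, key=gains.get))
```

Run as a script, it should rank outlook first at roughly 0.247 bits, followed by humidity, windy, and temperature, which is the textbook result for this data.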

Published in

ACM Transactions on Knowledge Discovery from Data Volume 2, Issue 3
October 2008
124 pages
ISSN:1556-4681
EISSN:1556-472X
DOI:10.1145/1409620

Copyright © 2008 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 27 October 2008
- Accepted: 1 August 2008
- Revised: 1 May 2008
- Received: 1 September 2007
Published in TKDD, Volume 2, Issue 3

Author Tags
- Decision tree classification
- privacy
Qualifiers
- research-article
- Research
- Refereed
Article Metrics
- 100 Total Citations
- 1,578 Total Downloads
- Downloads (Last 12 months): 46
- Downloads (Last 6 weeks): 1