Abstract
Privacy and security concerns can prevent sharing of data, derailing data-mining projects. Distributed knowledge discovery, if done correctly, can alleviate this problem. We introduce a generalized privacy-preserving variant of the ID3 algorithm for vertically partitioned data distributed over two or more parties. Along with a proof of security, we discuss what would be necessary to make the protocols completely secure. We also provide experimental results, giving a first demonstration of the practical complexity of secure multiparty computation-based data mining.
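To make the setting concrete, the sketch below shows, in plain Python and with no privacy protection at all, the quantity an ID3-style learner needs when attributes and class labels are held by different parties: the information gain of each attribute, which depends on counts of rows that share an attribute value at one party and a class value at another. It is not the protocol introduced here. The party names, the toy data, and the helper functions are all hypothetical, and a privacy-preserving variant would have to obtain the cross-party counts without revealing the underlying row sets.

```python
import math
from collections import defaultdict


def entropy(counts):
    """Shannon entropy (in bits) of a list of class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def rows_by_value(column):
    """Map each value in a column to the set of row ids that hold it."""
    groups = defaultdict(set)
    for row_id, value in enumerate(column):
        groups[value].add(row_id)
    return groups


def information_gain(attr_column, class_column):
    """ID3 information gain of one attribute with respect to the class."""
    class_rows = rows_by_value(class_column)
    n = len(class_column)
    base = entropy([len(rows) for rows in class_rows.values()])
    remainder = 0.0
    for rows in rows_by_value(attr_column).values():
        # Cross-party count: rows with this attribute value AND each class.
        # Here it is a plain set intersection; this is the step a private
        # protocol would need to replace (assumption, not the paper's method).
        counts = [len(rows & c_rows) for c_rows in class_rows.values()]
        remainder += len(rows) / n * entropy(counts)
    return base - remainder


# Hypothetical toy data: the same four entities, split by attribute.
party_a = {"outlook": ["sunny", "sunny", "rain", "rain"]}
party_b = {"windy": [True, False, True, False]}
labels = ["no", "no", "yes", "yes"]  # class attribute, held by one party

gains = {
    name: information_gain(column, labels)
    for name, column in {**party_a, **party_b}.items()
}
best = max(gains, key=gains.get)  # attribute ID3 would split on first
```

In this toy data the attribute `outlook` separates the classes perfectly and `windy` carries no information, so an ID3-style learner would split on `outlook` first.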