Privacy-preserving SVM classification

Vaidya, Jaideep; Yu, Hwanjo; Jiang, Xiaoqian

doi:10.1007/s10115-007-0073-7

Regular Paper
Published: 24 March 2007

Knowledge and Information Systems, volume 14, pages 161–178 (2008)

Abstract

Traditional Data Mining and Knowledge Discovery algorithms assume free access to data, either at a centralized location or in federated form. Increasingly, privacy and security concerns restrict this access, thus derailing data mining projects. What is required is distributed knowledge discovery that is sensitive to this problem. The key is to obtain valid results, while providing guarantees on the nondisclosure of data. Support vector machine classification is one of the most widely used classification methodologies in data mining and machine learning. It is based on solid theoretical foundations and has wide practical application. This paper proposes a privacy-preserving solution for support vector machine (SVM) classification, PP-SVM for short. Our solution constructs the global SVM classification model from data distributed at multiple parties, without disclosing the data of each party to others. Solutions are sketched out for data that is vertically, horizontally, or even arbitrarily partitioned. We quantify the security and efficiency of the proposed method, and highlight future challenges.
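
To make the vertically partitioned setting concrete, here is a minimal Python sketch. It is not the protocol from the paper. It assumes a linear kernel, integer-valued attributes (for example after fixed-point scaling), and semi-honest parties that do not collude. Under those assumptions the global Gram matrix is the sum of the per-party Gram matrices, so the parties only need a way to add matrices without revealing their own term; the ring-based masked sum below is one standard way to do that. The names `local_gram` and `masked_ring_sum`, the modulus, and the sample data are illustrative choices, not taken from the source.

```python
import random

MODULUS = 2**61 - 1  # assumed bound; must exceed twice the largest |Gram entry|


def local_gram(rows):
    """Linear-kernel Gram matrix computed from one party's attribute slice."""
    return [[sum(a * b for a, b in zip(r, s)) for s in rows] for r in rows]


def masked_ring_sum(matrices, modulus=MODULUS):
    """Add one square integer matrix per party while hiding partial sums.

    The initiating party starts from a secret random mask, every party in
    turn adds its own matrix to the running total it receives, and the
    initiator finally subtracts the mask. All arithmetic is modulo
    `modulus`, so the totals passed between parties look uniformly random.
    """
    rng = random.SystemRandom()
    n = len(matrices[0])
    mask = [[rng.randrange(modulus) for _ in range(n)] for _ in range(n)]
    running = mask
    for m in matrices:  # one hop around the ring per party
        running = [[(running[i][j] + m[i][j]) % modulus for j in range(n)]
                   for i in range(n)]
    total = [[(running[i][j] - mask[i][j]) % modulus for j in range(n)]
             for i in range(n)]
    # Map residues back to signed integers (Gram entries can be negative).
    return [[v - modulus if v > modulus // 2 else v for v in row]
            for row in total]


# Three hypothetical parties hold different attributes of the same 4 records.
party_a = [[1, 2], [0, -1], [3, 1], [2, 2]]
party_b = [[4], [1], [-2], [0]]
party_c = [[0, 1], [5, 0], [1, 1], [-3, 2]]

global_gram = masked_ring_sum(
    [local_gram(p) for p in (party_a, party_b, party_c)]
)

# Sanity check against the Gram matrix of the pooled (non-private) data.
pooled = [a + b + c for a, b, c in zip(party_a, party_b, party_c)]
assert global_gram == local_gram(pooled)
```

The resulting `global_gram` could then be handed to any SVM solver that accepts a precomputed kernel. In this sketch the party that trains the model still sees the full Gram matrix; whether that is acceptable depends on the threat model.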

Author information

Authors and Affiliations

Management Science and Information Systems Department, Rutgers University, Newark, NJ, 07102, USA
Jaideep Vaidya
Department of Computer Science, University of Iowa, Iowa City, IA, USA
Hwanjo Yu & Xiaoqian Jiang

Corresponding author

Correspondence to Jaideep Vaidya.

Additional information

Jaideep Vaidya received the Bachelor’s degree in Computer Engineering from the University of Mumbai. He received the Master’s and the Ph.D. degrees in Computer Science from Purdue University. He is an Assistant Professor in the Management Science and Information Systems Department at Rutgers University. His research interests include data mining and analysis, information security, and privacy. He has received best paper awards for papers in ICDE and SIGKDD. He is a Member of the IEEE Computer Society and the ACM.

Hwanjo Yu received the Ph.D. degree in Computer Science in 2004 from the University of Illinois at Urbana-Champaign. He is an Assistant Professor in the Department of Computer Science at the University of Iowa. His research interests include data mining, machine learning, databases, and information systems. He is an Associate Editor of Neurocomputing and served on the NSF Panel in 2006. He has served on the program committees of the 2005 ACM SAC Data Mining track, the 2005 and 2006 IEEE ICDM, the 2006 ACM CIKM, and the 2006 SIAM Data Mining conference.

Xiaoqian Jiang received the B.S. degree in Computer Science from Shanghai Maritime University, Shanghai, 2003. He received the M.C.S. degree in Computer Science from the University of Iowa, Iowa City, 2005. Currently, he is pursuing a Ph.D. degree from the School of Computer Science, Carnegie Mellon University. His research interests are computer vision, machine learning, data mining, and privacy protection technologies.

About this article

Cite this article

Vaidya, J., Yu, H. & Jiang, X. Privacy-preserving SVM classification. Knowl Inf Syst 14, 161–178 (2008). https://doi.org/10.1007/s10115-007-0073-7

Received: 12 April 2006
Revised: 14 November 2006
Accepted: 26 January 2007
Published: 24 March 2007
Issue Date: February 2008
DOI: https://doi.org/10.1007/s10115-007-0073-7
