Privacy-preserving SVM classification

  • Regular Paper
  • Knowledge and Information Systems

Abstract

Traditional data mining and knowledge discovery algorithms assume free access to data, either at a centralized location or in federated form. Increasingly, privacy and security concerns restrict this access and derail data mining projects. What is required is distributed knowledge discovery that is sensitive to this problem. The key is to obtain valid results while guaranteeing that the underlying data are not disclosed. Support vector machine (SVM) classification is one of the most widely used classification methodologies in data mining and machine learning. It rests on solid theoretical foundations and has wide practical application. This paper proposes a privacy-preserving solution for SVM classification, PP-SVM for short. Our solution constructs the global SVM classification model from data distributed across multiple parties, without disclosing any party's data to the others. Solutions are sketched for data that is vertically, horizontally, or even arbitrarily partitioned. We quantify the security and efficiency of the proposed method and highlight future challenges.
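To make the vertically partitioned setting concrete, the following Python sketch may help. It is not taken from the paper. It relies only on a standard fact: when each party holds a disjoint subset of the attributes for the same records, the linear-kernel Gram matrix of the full data equals the sum of the parties' local Gram matrices. The summation step uses a textbook ring-style masked sum as a stand-in for a secure addition protocol; it assumes semi-honest, non-colluding parties and non-negative integer data. The names `MODULUS`, `local_gram`, and `masked_sum` are hypothetical.

```python
import random

# Hypothetical ring size; assumed large enough to hold every Gram entry.
MODULUS = 2**61 - 1


def local_gram(rows):
    """Linear-kernel Gram matrix over one party's own attribute columns."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in rows] for r1 in rows]


def masked_sum(matrices, rng):
    """Ring-style masked sum (illustrative, not the paper's protocol).

    The initiating party starts from a random mask, each party adds its
    matrix modulo MODULUS, and the initiator removes the mask at the end,
    so intermediate running totals look uniformly random.
    """
    n = len(matrices[0])
    mask = [[rng.randrange(MODULUS) for _ in range(n)] for _ in range(n)]
    running = mask
    for m in matrices:
        running = [[(running[i][j] + m[i][j]) % MODULUS for j in range(n)]
                   for i in range(n)]
    return [[(running[i][j] - mask[i][j]) % MODULUS for j in range(n)]
            for i in range(n)]


# Toy data: 3 records with 4 non-negative integer attributes, split vertically.
full = [[1, 2, 3, 4], [0, 1, 0, 2], [5, 1, 2, 0]]
party_a = [row[:2] for row in full]  # party A holds attributes 0 and 1
party_b = [row[2:] for row in full]  # party B holds attributes 2 and 3

global_gram = masked_sum([local_gram(party_a), local_gram(party_b)],
                         random.Random(0))
```

Here `global_gram` matches `local_gram(full)`, so an SVM trained on the kernel matrix alone would never need the raw attribute values of either party. Real-valued or negative data would need a fixed-point encoding before the modular arithmetic applies.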

Author information

Corresponding author

Correspondence to Jaideep Vaidya.

Additional information

Jaideep Vaidya received the Bachelor’s degree in Computer Engineering from the University of Mumbai. He received the Master’s and the Ph.D. degrees in Computer Science from Purdue University. He is an Assistant Professor in the Management Science and Information Systems Department at Rutgers University. His research interests include data mining and analysis, information security, and privacy. He has received best paper awards for papers in ICDE and SIGKDD. He is a Member of the IEEE Computer Society and the ACM.

Hwanjo Yu received the Ph.D. degree in Computer Science in 2004 from the University of Illinois at Urbana-Champaign. He is an Assistant Professor in the Department of Computer Science at the University of Iowa. His research interests include data mining, machine learning, and database and information systems. He is an Associate Editor of Neurocomputing and served on the NSF Panel in 2006. He has served on the program committees of the 2005 ACM SAC Data Mining track, the 2005 and 2006 IEEE ICDM, the 2006 ACM CIKM, and the 2006 SIAM Data Mining conference.

Xiaoqian Jiang received the B.S. degree in Computer Science from Shanghai Maritime University, Shanghai, in 2003. He received the M.C.S. degree in Computer Science from the University of Iowa, Iowa City, in 2005. Currently, he is pursuing a Ph.D. degree at the School of Computer Science, Carnegie Mellon University. His research interests are computer vision, machine learning, data mining, and privacy protection technologies.

About this article

Cite this article

Vaidya, J., Yu, H. & Jiang, X. Privacy-preserving SVM classification. Knowl Inf Syst 14, 161–178 (2008). https://doi.org/10.1007/s10115-007-0073-7

  • DOI: https://doi.org/10.1007/s10115-007-0073-7
