DOI: 10.1145/2948992.2949027
short-paper

A Novel Differential Privacy Approach that Enhances Classification Accuracy

Published: 20 July 2016

ABSTRACT

In the recent past, there has been a tremendous increase in large repositories of data, such as healthcare data, consumer data from retailers, and airline passenger data. These data are continually being shared with interested parties, either anonymously for research purposes, or openly by financial or insurance companies for decision-making purposes. Even when data is shared anonymously, there is still the possibility of de-anonymizing it. Privacy Preserving Data Publishing (PPDP) is a way to share data securely while protecting against identity disclosure of an individual. Generalization of attributes is a data anonymization technique in which an attribute value is replaced with a more general value. Differential privacy is a technique that ensures the highest level of privacy for a record owner while providing actual information about the data set. This research develops a framework that generalizes the attributes of a data set so that it satisfies differential privacy principles and can be published securely for sharing. The proposed algorithm is a non-interactive method for publishing an anonymized data set, and a decision tree classifier built on its output showed better results than existing classification work on anonymized data sets. In this paper, differential privacy refers to ϵ-differential privacy.
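The two ingredients named in the abstract, attribute generalization and ϵ-differential privacy, can be illustrated with a small Python sketch. This is not the paper's algorithm, which the abstract does not specify. It only shows one common way the ideas combine: exact ages are replaced by coarser intervals, and the count of records in every (interval, class) cell is released with Laplace noise of scale 1/ϵ. The function names, the bucket width of 20, the toy records, and the add-or-remove-one-record notion of neighboring data sets (which gives the counts a sensitivity of 1) are all assumptions made for illustration.

```python
import random
from collections import Counter
from itertools import product


def generalize_age(age, width=20, max_age=100):
    """Replace an exact age with a coarser interval label, e.g. 37 -> '20-39'."""
    age = min(max(age, 0), max_age - 1)  # clamp into the fixed public domain
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_generalized_counts(records, epsilon, classes,
                             width=20, max_age=100, seed=None):
    """Return a noisy count for every (age interval, class) cell.

    Assumption: neighboring data sets differ by adding or removing one
    record, so each record affects exactly one cell and the histogram has
    L1 sensitivity 1. Laplace noise with scale 1/epsilon then suffices.
    """
    rng = random.Random(seed)
    true_counts = Counter(
        (generalize_age(age, width, max_age), label) for age, label in records
    )
    # Enumerate cells from the public domain, not from the data, so that
    # the set of released cells does not itself reveal anything.
    buckets = [f"{low}-{low + width - 1}" for low in range(0, max_age, width)]
    return {
        cell: true_counts[cell] + laplace_noise(1 / epsilon, rng)
        for cell in product(buckets, classes)
    }


if __name__ == "__main__":
    toy_records = [(37, "yes"), (25, "no"), (61, "yes")]  # hypothetical data
    released = noisy_generalized_counts(toy_records, epsilon=1.0,
                                        classes=["yes", "no"], seed=0)
    for cell, count in sorted(released.items()):
        print(cell, round(count, 2))
```

A smaller ϵ gives a larger noise scale and therefore stronger privacy at the cost of less accurate counts. A classifier such as a decision tree could then be trained on the released noisy table rather than on the raw records.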

        Published in

          C3S2E '16: Proceedings of the Ninth International C* Conference on Computer Science & Software Engineering
          July 2016
          152 pages
          ISBN: 9781450340755
          DOI: 10.1145/2948992

          Copyright © 2016 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States



          Qualifiers

          • short-paper
          • Research
          • Refereed limited

          Acceptance Rates

          Overall Acceptance Rate: 12 of 42 submissions, 29%
