ABSTRACT
In the recent past, there has been a tremendous increase in large repositories of data, for example healthcare data, consumer data from retailers, and airline passenger data. These data are continually shared with interested parties, either anonymously for research purposes or openly by financial or insurance companies for decision-making purposes. Even when data are shared anonymously, it may still be possible to de-anonymize them. Privacy Preserving Data Publishing (PPDP) makes it possible to share data while protecting individuals against identity disclosure. Generalization is a data anonymization technique in which an attribute value is replaced with a more general value (for example, an exact age replaced by an age range). Differential privacy is a technique that provides a strong, formal privacy guarantee for each record owner while still allowing useful information about the data set to be released. This research develops a framework that generalizes the attributes of a data set so that the published data satisfy differential privacy principles and can be shared securely. The proposed algorithm is a non-interactive method for publishing an anonymized data set, and the decision tree classifier showed better results than other existing classification approaches on anonymized data sets. In this paper, differential privacy refers to ϵ-differential privacy.
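The abstract combines two ideas: attribute generalization and ϵ-differential privacy in a non-interactive release. The sketch below shows one generic way these can fit together. It generalizes a numeric attribute into fixed intervals, then publishes a Laplace-noised count for every (interval, class) cell. It is not the authors' algorithm, which the abstract does not specify. The toy records, the 20-year interval width, the public age domain [0, 100), and all function names are hypothetical. The sketch assumes each individual contributes exactly one record. Under that assumption, adding or removing one person changes a single cell count by 1, so the histogram has sensitivity 1 and Laplace noise with scale 1/ϵ gives ϵ-differential privacy. Python's `random` module is used only for illustration and is not a secure noise source.

```python
import random
from collections import Counter

# Hypothetical toy records (age, class_label); not data from the paper.
RECORDS = [(23, "yes"), (27, "no"), (35, "yes"), (41, "no"),
           (44, "yes"), (52, "no"), (58, "no"), (63, "yes")]


def generalize_age(age, width=20):
    """Generalize an exact age to the interval [lo, lo + width) containing it."""
    lo = (age // width) * width
    return f"[{lo},{lo + width})"


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def publish_noisy_counts(records, epsilon, rng, width=20, max_age=100,
                         labels=("yes", "no")):
    """Release a Laplace-noised count for every (age interval, label) cell.

    Assumes ages lie in the public domain [0, max_age) and each person
    contributes one record, so the histogram has L1 sensitivity 1.
    """
    # Enumerate cells from the public domain, not from the data, so the
    # set of published cells reveals nothing about any individual.
    intervals = [f"[{lo},{lo + width})" for lo in range(0, max_age, width)]
    true_counts = Counter((generalize_age(age, width), label)
                          for age, label in records)
    scale = 1.0 / epsilon  # Laplace scale = sensitivity / epsilon
    noisy = {}
    for interval in intervals:
        for label in labels:
            value = true_counts[(interval, label)] + laplace_noise(scale, rng)
            # Rounding and clamping at 0 are post-processing: still epsilon-DP.
            noisy[(interval, label)] = max(0, round(value))
    return noisy


if __name__ == "__main__":
    # random.Random is NOT a secure source of noise; illustration only.
    rng = random.Random(0)
    released = publish_noisy_counts(RECORDS, epsilon=1.0, rng=rng)
    for cell in sorted(released):
        print(cell, released[cell])
```

The table is computed once and the raw records are not queried again, which matches the non-interactive setting. By the post-processing property, any model trained only on the released counts stays ϵ-differentially private. A decision tree is one example of such a model.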
A Novel Differential Privacy Approach that Enhances Classification Accuracy