short-paper

A Novel Differential Privacy Approach that Enhances Classification Accuracy

Authors:
A. N. K. Zaman, University of Guelph, Guelph ON Canada
Charlie Obimbo, University of Guelph, Guelph ON Canada
Rozita A. Dara, University of Guelph, Guelph ON Canada

C3S2E '16: Proceedings of the Ninth International C* Conference on Computer Science & Software Engineering, July 2016, Pages 79–84
https://doi.org/10.1145/2948992.2949027

Published: 20 July 2016

ABSTRACT

In the recent past, there has been a tremendous increase in large repositories of data, such as healthcare data, consumer data from retailers, and airline passenger data. These data are continually being shared with interested parties, either anonymously for research purposes, or openly by financial or insurance companies for decision-making purposes. Even when data are shared anonymously, there is still the possibility of de-anonymizing them. Privacy Preserving Data Publishing (PPDP) is a way to share data securely while protecting individuals against identity disclosure. Generalization of attributes is a data anonymization technique in which an attribute value is replaced with a more general one. Differential privacy is a technique that ensures the highest level of privacy for a record owner while providing actual information about the data set. This research develops a framework that generalizes the attributes of a data set so that it satisfies differential privacy principles and can be published securely for sharing. The proposed algorithm is a non-interactive method for publishing an anonymized data set, and the decision tree classifier showed better results than existing classification work on anonymized data sets. In this paper, differential privacy refers to ϵ-differential privacy.
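
The two ingredients named in the abstract, attribute generalization and ϵ-differential privacy, can be illustrated with a small sketch. This is not the authors' algorithm, which the abstract does not spell out. It is a generic, textbook-style example that generalizes an age attribute into ranges and then releases per-cell counts with Laplace noise of scale 1/ϵ. The function names, the bucket width, the age domain of 0 to 99, and the class labels are all hypothetical choices made for illustration. The sketch assumes neighboring data sets differ by adding or removing one record, so each count has sensitivity 1.

```python
import random
from collections import Counter

# All constants below are illustrative assumptions, not values from the paper.
AGE_WIDTH = 10
AGE_BUCKETS = [f"{lo}-{lo + AGE_WIDTH - 1}" for lo in range(0, 100, AGE_WIDTH)]
LABELS = ["<=50K", ">50K"]


def generalize_age(age):
    """Replace an exact age with a coarser range such as '30-39'."""
    lo = (age // AGE_WIDTH) * AGE_WIDTH
    return f"{lo}-{lo + AGE_WIDTH - 1}"


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_histogram(records, epsilon, rng):
    """Count records per (age range, class) cell and add Laplace(1/epsilon) noise.

    Every cell of the fixed domain is released, including empty ones, so the
    set of published cells does not depend on the data. Ages are assumed to
    lie in 0..99; records outside that range are not counted.
    """
    counts = Counter((generalize_age(age), label) for age, label in records)
    scale = 1.0 / epsilon  # sensitivity 1 divided by the privacy budget
    return {
        (bucket, label): counts[(bucket, label)] + laplace_noise(scale, rng)
        for bucket in AGE_BUCKETS
        for label in LABELS
    }


if __name__ == "__main__":
    rng = random.Random(0)
    records = [(23, "<=50K"), (37, ">50K"), (38, "<=50K"), (52, ">50K")]
    for cell, value in noisy_histogram(records, epsilon=1.0, rng=rng).items():
        print(cell, round(value, 2))
```

A smaller ϵ gives larger noise and stronger privacy, at the cost of less accurate counts. A classifier such as a decision tree could then be trained on data derived from the noisy, generalized counts, which is the kind of utility measurement the abstract refers to.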



Published in
C3S2E '16: Proceedings of the Ninth International C* Conference on Computer Science & Software Engineering
July 2016
152 pages
ISBN: 9781450340755
DOI: 10.1145/2948992
Editor: Evan Desai (ConfSys.org)
General Chair: Bipin C. Desai (Concordia University, Canada)
Program Chairs: Ana Alameida (ISEP), Jorge Bernardino (ISEC)
Copyright © 2016 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Differential privacy
data anonymization
data classification
data privacy
privacy preserving data publishing
Qualifiers
- short-paper
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 12 of 42 submissions, 29%