Abstract
In this work we show that feature selection can be used to preserve the privacy of individuals without compromising the accuracy of data classification. Furthermore, when feature selection is combined with anonymization techniques, we are able to publish privacy-preserving datasets. We use several UCI datasets to empirically support our claim. The obtained results show that these privacy-preserving datasets provide classification accuracy comparable to, and in some cases superior to, that of the original datasets. We generalize the results with a paired t-test applied at different levels of anonymization.
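As a rough illustration of the kind of paired comparison mentioned above, the following minimal sketch computes a paired t statistic over per-dataset accuracies. The function name `paired_t_statistic` and the accuracy values are hypothetical placeholders chosen for illustration; they are not the authors' code or results, and the abstract does not specify how the test was set up.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(baseline, treated):
    """Return (t, degrees_of_freedom) for a paired t-test.

    baseline and treated are equal-length sequences in which position i
    refers to the same dataset measured under two conditions.
    """
    if len(baseline) != len(treated):
        raise ValueError("paired samples must have the same length")
    n = len(baseline)
    if n < 2:
        raise ValueError("need at least two pairs")
    # Per-pair differences (treated minus baseline)
    diffs = [b - a for a, b in zip(baseline, treated)]
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical placeholder accuracies, NOT results from the paper.
original_acc = [0.80, 0.75, 0.90, 0.85]
anonymized_acc = [0.82, 0.74, 0.91, 0.88]

t, df = paired_t_statistic(original_acc, anonymized_acc)
print(f"t = {t:.3f} with {df} degrees of freedom")
```

The resulting statistic would then be compared against a t distribution with `df` degrees of freedom; a small absolute value is consistent with no systematic accuracy difference between the two conditions.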
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Jafer, Y., Matwin, S., Sokolova, M. (2014). Task Oriented Privacy Preserving Data Publishing Using Feature Selection. In: Sokolova, M., van Beek, P. (eds) Advances in Artificial Intelligence. Canadian AI 2014. Lecture Notes in Computer Science, vol 8436. Springer, Cham. https://doi.org/10.1007/978-3-319-06483-3_13
DOI: https://doi.org/10.1007/978-3-319-06483-3_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-06482-6
Online ISBN: 978-3-319-06483-3
eBook Packages: Computer Science (R0)