ABSTRACT
The increasing ability of current hardware technology to track and collect large amounts of data has led to an interest in data mining algorithms that preserve user privacy. A recently proposed technique addresses privacy preservation by perturbing the data and then reconstructing distributions at an aggregate level in order to perform the mining. This method retains privacy while still giving access to the information implicit in the original attributes. The distribution reconstruction process naturally leads to some loss of information, which is acceptable in many practical situations. This paper discusses an Expectation Maximization (EM) algorithm for distribution reconstruction that loses less information than the currently available method. Specifically, we prove that the EM algorithm converges to the maximum likelihood estimate of the original distribution based on the perturbed data. We show that when a large amount of data is available, the EM algorithm provides robust estimates of the original distribution. We also propose metrics for quantifying and measuring the privacy provided by privacy-preserving data mining algorithms. This paper thus provides the foundations for measuring the effectiveness of such algorithms. Our privacy metrics illustrate some interesting results on the relative effectiveness of different perturbing distributions.
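The abstract does not state the update rule, so the following is only a minimal sketch of a standard EM iteration for this kind of reconstruction, not the paper's own algorithm. It assumes each released value is an original value plus independent noise drawn from a known density, and that the original values are discretized into bins represented by their midpoints. The names `gaussian_pdf`, `em_reconstruct`, `midpoints`, and the demo parameters are hypothetical and chosen for illustration.

```python
import math
import random


def gaussian_pdf(sigma):
    """Return the density of zero-mean Gaussian noise with std dev sigma (assumed noise model)."""
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return lambda y: norm * math.exp(-0.5 * (y / sigma) ** 2)


def em_reconstruct(z, noise_pdf, midpoints, n_iter=200, tol=1e-9):
    """Estimate bin probabilities of the original values from perturbed values z.

    Assumes z_i = x_i + y_i, where y_i has the known density noise_pdf and
    each x_i is approximated by the midpoint of the bin it falls in.
    """
    k = len(midpoints)
    theta = [1.0 / k] * k  # start from a uniform guess
    # lik[i][j]: noise density evaluated at z_i - midpoint_j
    lik = [[noise_pdf(zi - m) for m in midpoints] for zi in z]
    for _ in range(n_iter):
        counts = [0.0] * k
        for row in lik:
            # E-step: posterior probability of each bin given z_i
            weighted = [l * t for l, t in zip(row, theta)]
            total = sum(weighted)
            if total == 0.0:
                continue  # z_i is not explained by any bin; skip it
            for j in range(k):
                counts[j] += weighted[j] / total
        # M-step: new bin probabilities are the normalized expected counts
        norm = sum(counts)
        new_theta = [c / norm for c in counts]
        shift = max(abs(a - b) for a, b in zip(new_theta, theta))
        theta = new_theta
        if shift < tol:
            break
    return theta


# Illustrative data: two occupied bins, perturbed with Gaussian noise.
random.seed(0)
midpoints = [0.125, 0.375, 0.625, 0.875]
true_theta = [0.3, 0.0, 0.0, 0.7]
x = random.choices(midpoints, weights=true_theta, k=2000)
z = [xi + random.gauss(0.0, 0.2) for xi in x]
theta_hat = em_reconstruct(z, gaussian_pdf(0.2), midpoints)
```

Only the perturbed values `z` and the noise density enter the estimator, which mirrors the setting described above: the miner sees aggregate structure without seeing any original record. Finer bins reduce the midpoint approximation error at the cost of needing more data per bin.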
Index Terms
On the design and quantification of privacy preserving data mining algorithms