DOI: 10.1145/375551.375602

On the design and quantification of privacy preserving data mining algorithms

Published: 01 May 2001

ABSTRACT

The increasing ability to track and collect large amounts of data with current hardware technology has led to an interest in the development of data mining algorithms that preserve user privacy. A recently proposed technique addresses the issue of privacy preservation by perturbing the data and reconstructing distributions at an aggregate level in order to perform the mining. This method is able to retain privacy while accessing the information implicit in the original attributes. The distribution reconstruction process naturally leads to some loss of information, which is acceptable in many practical situations. This paper discusses an Expectation Maximization (EM) algorithm for distribution reconstruction that is more effective than the currently available method in terms of the level of information loss. Specifically, we prove that the EM algorithm converges to the maximum likelihood estimate of the original distribution based on the perturbed data. We show that when a large amount of data is available, the EM algorithm provides robust estimates of the original distribution. We propose metrics for the quantification and measurement of privacy-preserving data mining algorithms. Thus, this paper provides the foundations for measuring the effectiveness of privacy-preserving data mining algorithms. Our privacy metrics illustrate some interesting results on the relative effectiveness of different perturbing distributions.
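To make the idea of reconstructing a distribution from perturbed data concrete, the following is a minimal sketch of a standard EM iteration for this setting. It is not taken from the paper: the discretization of the attribute range into bins, the use of bin midpoints to evaluate the noise density, the Gaussian perturbation, and the names `gaussian_pdf` and `em_reconstruct` are all assumptions made for illustration.

```python
import math


def gaussian_pdf(y, sigma):
    """Density of zero-mean Gaussian noise with standard deviation sigma."""
    return math.exp(-0.5 * (y / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def em_reconstruct(perturbed, midpoints, noise_pdf, iterations=100):
    """Estimate bin probabilities of the original values from perturbed values.

    Assumptions (illustrative, not from the paper):
      - each perturbed value is z = x + y, with y drawn from a known noise density
      - the original attribute is approximated by bins, each represented
        by its midpoint
    """
    k = len(midpoints)
    theta = [1.0 / k] * k  # start from a uniform guess over the bins

    # Likelihood of each observation under each bin; fixed across iterations.
    lik = [[noise_pdf(z - m) for m in midpoints] for z in perturbed]

    for _ in range(iterations):
        new_theta = [0.0] * k
        for row in lik:
            # E-step: posterior probability that this observation came from bin i.
            weights = [t * l for t, l in zip(theta, row)]
            total = sum(weights)
            if total == 0.0:
                continue  # observation is impossible under every bin; skip it
            for i, w in enumerate(weights):
                new_theta[i] += w / total
        # M-step: the new bin probabilities are the averaged posteriors.
        norm = sum(new_theta)
        theta = [v / norm for v in new_theta]
    return theta
```

Calling `em_reconstruct(zs, midpoints, lambda y: gaussian_pdf(y, sigma))` on values `zs` that were perturbed with Gaussian noise of standard deviation `sigma` returns one estimated probability per bin. Only the perturbed values and the noise density are used, so the original values never need to be disclosed.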

Published in

PODS '01: Proceedings of the twentieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
May 2001
301 pages
ISBN: 1581133618
DOI: 10.1145/375551

Copyright © 2001 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery
New York, NY, United States

Acceptance Rates

PODS '01 paper acceptance rate: 26 of 99 submissions, 26%
Overall acceptance rate: 642 of 2,707 submissions, 24%
