Security-control methods for statistical databases: a comparative study

Published: 01 December 1989

Abstract

This paper considers the problem of providing security to statistical databases against disclosure of confidential information. Security-control methods suggested in the literature are classified into four general approaches: conceptual, query restriction, data perturbation, and output perturbation.

Criteria for evaluating the performance of the various security-control methods are identified. Security-control methods that are based on each of the four approaches are discussed, together with their performance with respect to the identified evaluation criteria. A detailed comparative analysis of the most promising methods for protecting dynamic-online statistical databases is also presented.

To date no single security-control method prevents both exact and partial disclosures. There are, however, a few perturbation-based methods that prevent exact disclosure and enable the database administrator to exercise "statistical disclosure control." Some of these methods, however, introduce bias into query responses or suffer from the 0/1 query-set-size problem (i.e., partial disclosure is possible in the case of a null query set or a query set of size 1).
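
To make the 0/1 query-set-size problem concrete, the following is a minimal Python sketch; it is not taken from the paper. It assumes a generic output perturbation that adds zero-mean Gaussian noise to each SUM response, not any specific method from the survey. The table `RECORDS`, the function `perturbed_sum`, and the noise level `noise_sd` are hypothetical names and values chosen only for illustration.

```python
import random

# Hypothetical toy table: (name, department, salary). All values are made up.
RECORDS = [
    ("alice", "sales", 52_000),
    ("bob", "sales", 48_000),
    ("carol", "sales", 61_000),
    ("dave", "legal", 95_000),
]


def perturbed_sum(predicate, noise_sd=1_000.0, rng=random):
    """Return SUM(salary) over the query set plus zero-mean Gaussian noise.

    This is an assumed, generic form of output perturbation: the stored
    data stay exact and only the reported answer is disturbed.
    """
    query_set = [salary for (name, dept, salary) in RECORDS if predicate(name, dept)]
    return sum(query_set) + rng.gauss(0.0, noise_sd)


if __name__ == "__main__":
    # Query set of size 3: the answer says little about any one person.
    print(perturbed_sum(lambda name, dept: dept == "sales"))
    # Query set of size 1: the answer is a close estimate of one salary.
    print(perturbed_sum(lambda name, dept: dept == "legal"))
    # Empty query set: the answer is pure noise around zero, which
    # reveals that no record satisfies the predicate.
    print(perturbed_sum(lambda name, dept: dept == "finance"))
```

In this sketch, no response is exact, so exact disclosure is avoided. A query set of size 1 still yields a close estimate of a single confidential value, and an empty query set yields an answer near zero that tells the user no record matches. Both are partial disclosures in the sense described above.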

We recommend directing future research efforts toward developing new methods that prevent exact disclosure and provide statistical-disclosure control while not suffering from the bias problem or the 0/1 query-set-size problem. Furthermore, efforts directed toward developing a bias-correction mechanism and solving the general problem of small query-set size would help salvage a few of the current perturbation-based methods.

References

  1. ABUL-ELA, A.-L., GREENBERG, B. G., AND HORVITZ, D. G. 1967. A multi-proportions randomized response model. J. Am. Stat. Assoc. 62, 319 (Sept.), 990-1008.
  2. ACHUGBUE, J. O., AND CHIN, F. Y. 1979. The effectiveness of output modification by rounding for protection of statistical databases. INFOR 17, 3 (Aug.), 209-218.
  3. BECK, L. L. 1980. A security mechanism for statistical databases. ACM Trans. Database Syst. 5, 3 (Sept.), 316-338.
  4. CHIN, F. Y. 1978. Security in statistical databases for queries with small counts. ACM Trans. Database Syst. 3, 1, 92-104.
  5. CHIN, F. Y., KOSSOWSKI, P., AND LOH, S. C. 1984. Efficient inference control for range sum queries. Theor. Comput. Sci. 32, 77-86.
  6. CHIN, F. Y., AND ÖZSOYOĞLU, G. 1982. Auditing and inference control in statistical databases. IEEE Trans. Softw. Eng. SE-8, 6 (Apr.), 574-582.
  7. CHIN, F. Y., AND ÖZSOYOĞLU, G. 1981. Statistical database design. ACM Trans. Database Syst. 6, 1 (Mar.), 113-139.
  8. CHIN, F. Y., AND ÖZSOYOĞLU, G. 1979. Security in partitioned dynamic statistical databases. In Proceedings of the IEEE COMPSAC, pp. 594-601.
  9. COX, L. H. 1980. Suppression methodology and statistical disclosure control. J. Am. Stat. Assoc. 75, 370 (June), 377-385.
  10. DALENIUS, T. 1981. A simple procedure for controlled rounding. Statistik Tidskrift 3, 202-208.
  11. DALENIUS, T. 1977. Towards a methodology for statistical disclosure control. Statistik Tidskrift 15, 429-444.
  12. DALENIUS, T. 1974. The invasion of privacy problem and statistics production. An overview. Statistik Tidskrift 12, 213-225.
  13. DENNING, D. E. 1985. Commutative filters for reducing inference threats in multilevel database systems. In Proceedings of the 1985 Symposium on Security and Privacy, IEEE Computer Society, pp. 134-146.
  14. DENNING, D. E. 1984. Cryptographic check-sums for multilevel database security. In Proceedings of the 1984 Symposium on Security and Privacy, IEEE Computer Society, pp. 52-61.
  15. DENNING, D. E. 1983. A security model for the statistical database problem. In Proceedings of the 2nd International Workshop on Management, pp. 1-16.
  16. DENNING, D. E. 1982. Cryptography and Data Security. Addison-Wesley, Reading, Mass.
  17. DENNING, D. E. 1981. Restricting queries that might lead to compromise. In Proceedings of IEEE Symposium on Security and Privacy (Apr.), pp. 33-40.
  18. DENNING, D. E. 1980. Secure statistical databases with random sample queries. ACM Trans. Database Syst. 5, 3 (Sept.), 291-315.
  19. DENNING, D. E., AND SCHLORER, J. 1983. Inference control for statistical databases. Computer 16, 7 (July), 69-82.
  20. DENNING, D. E., AND SCHLORER, J. 1980. A fast procedure for finding a tracker in a statistical database. ACM Trans. Database Syst. 5, 1 (Mar.), 88-102.
  21. DENNING, D. E., SCHLORER, J., AND WEHRLE, E. 1982. Memoryless inference controls for statistical databases. Computer Science Dept., Purdue Univ.
  22. DENNING, D. E., DENNING, P. J., AND SCHWARTZ, M. D. 1979. The tracker: A threat to statistical database security. ACM Trans. Database Syst. 4, 1 (Mar.), 76-96.
  23. DOBKIN, D., JONES, A. K., AND LIPTON, R. J. 1979. Secure databases: Protection against user influence. ACM Trans. Database Syst. 4, 1 (Mar.), 97-106.
  24. FELLEGI, I. P. 1972. On the question of statistical confidentiality. J. Am. Stat. Assoc. 67, 337 (Mar.), 7-18.
  25. FELLEGI, I. P., AND PHILLIPS, J. L. 1974. Statistical confidentiality: Some theory and applications to data dissemination. Ann. Ec. Soc. Meas. 3, 2 (Apr.), 399-409.
  26. FRIEDMAN, A. D., AND HOFFMAN, L. J. 1980. Towards a fail-safe approach to secure databases. In Proceedings of IEEE Symposium on Security and Privacy (Apr.).
  27. GHOSH, S. P. 1986. Statistical relational tables for statistical database management. IEEE Trans. Softw. Eng. SE-12, 12, 1106-1116.
  28. GHOSH, S. P. 1985. An application of statistical databases in manufacturing testing. IEEE Trans. Softw. Eng. SE-11, 7, 591-596.
  29. GHOSH, S. P. 1984. An application of statistical databases in manufacturing testing. In Proceedings of IEEE COMPDEC Conference.
  30. GREENBERG, B. G., ABERNATHY, J. R., AND HORVITZ, D. G. 1969a. Application of randomized response technique in obtaining quantitative data. In Proceedings of Social Statistics Section, American Statistical Association (Aug.), 40-43.
  31. GREENBERG, B. G., ABUL-ELA, A.-L., SIMMONS, W. R., AND HORVITZ, D. G. 1969b. The unrelated question randomized response model: Theoretical framework. J. Am. Stat. Assoc. 64, 326 (June), 520-539.
  32. HAQ, M. I. UL. 1977. On safeguarding statistical disclosure by giving approximate answers to queries. In Proceedings of International Computer Symposium (North-Holland), pp. 491-495.
  33. HAQ, M. I. UL. 1975. Insuring individual's privacy from statistical database users. In Proceedings of National Computer Conference (Montvale, N.J.), vol. 44. AFIPS Press, Arlington, Va., pp. 941-946.
  34. HOFFMAN, L. J. 1977. Modern Methods for Computer Security and Privacy. Prentice-Hall, Englewood Cliffs, N.J.
  35. HOFFMAN, L. J., AND MILLER, W. F. 1970. Getting a personal dossier from a statistical data bank. Datamation 16, 5 (May), 74-75.
  36. JONGE, W. DE 1983. Compromising statistical databases: Responding to queries about means. ACM Trans. Database Syst. 8, 1 (Mar.), 60-80.
  37. KAM, J. B., AND ULLMAN, J. D. 1977. A model of statistical databases and their security. ACM Trans. Database Syst. 2, 1, 1-10.
  38. LEFONS, D., SILVESTRI, A., AND TANGORRA, F. 1983. An analytic approach to statistical databases. In Proceedings of 9th Conference on Very Large Databases (Florence, Italy), pp. 260-273.
  39. LEISS, E. 1982. Randomizing, a practical method for protecting statistical databases against compromise. In Proceedings of 8th Conference on Very Large Databases, pp. 189-196.
  40. LIEW, C. K., CHOI, W. J., AND LIEW, C. J. 1985. A data distortion by probability distribution. ACM Trans. Database Syst. 10, 3, 395-411.
  41. MATLOFF, N. E. 1986. Another look at the use of noise addition for database security. In Proceedings of IEEE Symposium on Security and Privacy, pp. 173-180.
  42. MCLEISH, M. 1983. An information theoretic approach to statistical databases and their security: A preliminary report. In Proceedings of the 2nd International Workshop on Statistical Database Management, pp. 355-359.
  43. MILLER, A. R. 1971. The Assault on Privacy: Computers, Data Banks and Dossiers. University of Michigan Press, Ann Arbor, Mich.
  44. MORGENSTERN, M. 1987. Security and Inference in Multi-level Database and Knowledge-Base Systems. In Proceedings of ACM Special Interest Group on Management of Data, pp. 357-373.
  45. ÖZSOYOĞLU, G., AND CHIN, F. Y. 1982. Enhancing the security of statistical databases with a question-answering system and a kernel design. IEEE Trans. Softw. Eng. SE-8, 3, 223-234.
  46. ÖZSOYOĞLU, G., AND CHUNG, J. 1986. Information loss in the lattice model of summary tables due to cell suppression. In Proceedings of IEEE Symposium on Security and Privacy, pp. 75-83.
  47. ÖZSOYOĞLU, G., AND ÖZSOYOĞLU, M. 1981. Update handling techniques in statistical databases. In Proceedings of the 1st LBL Workshop on Statistical Database Management (Berkeley, Calif., Dec.), pp. 249-284.
  48. ÖZSOYOĞLU, G., AND SU, T. A. 1985. Rounding and inference control in conceptual models for statistical databases. In Proceedings of IEEE Symposium on Security and Privacy, pp. 160-173.
  49. PALLEY, M. A. 1986. Security of statistical databases: Compromise through attribute correlational modeling. In Proceedings of IEEE Conference on Data Engineering, pp. 67-74.
  50. PALLEY, M. A., AND SIMONOFF, J. S. 1987. The use of regression methodology for compromise of confidential information in statistical databases. ACM Trans. Database Syst. 12, 4 (Dec.), 593-608.
  51. REISS, S. P. 1980. Practical data-swapping: The first steps. In Proceedings of IEEE Symposium on Security and Privacy, pp. 36-44.
  52. REISS, S. P. 1984. Practical data swapping: The first steps. ACM Trans. Database Syst. 9, 1 (Mar.), 20-37.
  53. ROWE, N. 1984. Diophantine inference from statistical aggregates on few-valued attributes. In Proceedings of IEEE Conference on Data Engineering, pp. 107-110.
  54. SANDE, G. 1983. Automated cell suppression to preserve confidentiality of business statistics. In Proceedings of the 2nd International Workshop on Statistical Database Management, pp. 346-353.
  55. SCHLORER, J. 1983. Information loss in partitioned statistical databases. Comput. J. 26, 3, 218-223.
  56. SCHLORER, J. 1981. Security of statistical databases: multidimensional transformation. ACM Trans. Database Syst. 6, 1 (Mar.), 95-112.
  57. SCHLORER, J. 1980. Disclosure from statistical databases: Quantitative aspects of trackers. ACM Trans. Database Syst. 5, 4 (Dec.), 467-492.
  58. SCHLORER, J. 1976. Confidentiality of statistical records: A threat monitoring scheme of on-line dialogue. Methods Inform. Med. 15, 1, 36-42.
  59. SCHLORER, J. 1975. Identification and retrieval of personal records from a statistical data bank. Methods Inform. Med. 14, 1, 7-13.
  60. SCHWARTZ, M. D., DENNING, D. E., AND DENNING, P. J. 1979. Linear queries in statistical databases. ACM Trans. Database Syst. 4, 2, 156-167.
  61. SU, T., AND ÖZSOYOĞLU, G. 1987. Data dependencies and inference control in multilevel relational database systems. In Proceedings of the 1987 Symposium on Security and Privacy, IEEE Computer Society, pp. 202-211.
  62. TENDICK, P., AND MATLOFF, N. S. 1987. Recent results on the noise addition method for database security. Presented at the Joint ASA/IMS Statistical Meetings, San Francisco.
  63. TRAUB, J. F., YEMINI, Y., AND WOZNIAKOWSKI, H. 1984. The statistical security of a statistical database. ACM Trans. Database Syst. 9, 4 (Dec.), 672-679.
  64. TRUEBLOOD, R. P. 1984. Security issues in knowledge systems. In Proceedings of 1st International Workshop on Expert Database Systems, vol. 2, pp. 834-840.
  65. TURN, R., AND SHAPIRO, N. Z. 1978. Privacy and security in databank systems: Measure of effectiveness, costs, and protector-intruder interactions. Computers and Security, C. T. Dinardo, Ed. AFIPS Press, Arlington, Va., pp. 49-57.
  66. WARNER, S. L. 1971. The linear randomized response model. J. Am. Stat. Assoc. 66, 336 (Dec.), 884-888.
  67. WARNER, S. L. 1965. Randomized response: A survey technique for eliminating evasive answer bias. J. Am. Stat. Assoc. 60, 309 (Mar.), 63-69.
  68. YU, C. T., AND CHIN, F. Y. 1977. A study on the protection of statistical databases. In Proceedings of ACM SIGMOD International Conference on Management of Data (Aug.), pp. 169-181.

Reviews

Mary McLeish

A statistical database (SDB) is any traditional database system in which queries are restricted to statistical aggregates (such as sample mean and count); an example is the US Census Bureau database. It is often required that the system be secure from users' attempts to infer confidential information about an individual from the aggregate query responses. Considerable work has been carried out over the last 15 years to discover sufficient conditions on the queries to keep these databases secure. This has been found to be a very difficult task. With the increasing use of large database systems and knowledge bases for expert systems, the issue has become even more relevant in recent years. This careful survey of work on statistical database security examines the different approaches that have been used: conceptual, query restriction, data perturbation, and output perturbation.

The authors discuss the main conceptual models due to Chin, Özsoyoğlu, Denning, and Schlorer. Different ways to restrict queries are surveyed and compared. The query-set size may be restricted, as may the overlap between successive queries. The methodology can depend on the data storage technique. The query-set-size control releases a statistic only if certain restrictions are made to the size of the query set. The paper mentions work in this area by Denning, Schlorer, Schwartz, Hoffman, Miller, Jonge, and Traub and notes the problems caused by the application of fast trackers. The work by Dobkin and others on query-set overlap restrictions shows the difficulties that arise in making this solution practical. Auditing involves keeping track of all queries made by a user over time and checking for possible compromise whenever a new query is issued. This paper discusses methods proposed by Chin, Özsoyoğlu, and McLeish. Data partitioning clusters individual entities of the population into a number of mutually exclusive subsets. Security control methods that use this technique, which are due to Yu, Chin, Schlorer, and Özsoyoğlu, are mentioned. (McLeish's more general results on dynamic partitioned models are not mentioned.) Sande and Cox's work on the use of cell suppression methods for the Canadian census indicates that this method is computationally complex.

Data and output perturbation methods provide a very different solution to the security problem: none of the reported answers will be exact. The bias problem (Matloff), data swapping (Reiss), probability-distribution methods (Liew et al.), an analytical method (Lefons et al.), and fixed-data perturbation methods (Traub and Warner) are all reviewed. Output perturbation methods involve using random-sample queries (Denning), and a method due to Beck introduces a varying perturbation to the data. The paper presents a variety of rounding techniques: systematic rounding, random rounding, and controlled rounding.

The effectiveness of all these methods depends, of course, on the strictness of the security required. Sometimes an exact compromise (determining the exact value of a protected attribute) is the only situation in which disclosure is of concern; at other times, revealing a close estimate of the value (statistical disclosure) is dangerous. The authors discuss the approaches to security control described above in light of the different measures of disclosure. A comparative analysis of the methods contrasts random-sample queries with Beck's varying-output-perturbation and some fixed-data-perturbation methods. Precision and security criteria for COUNT and SUM queries are presented along with consistency, cost, and robustness criteria.

The paper brings some newer threats to the attention of SDB researchers. Logical inference is a new problem arising from logic programming environments. Earlier work concentrated on sum and count queries, but regression analyses present a whole new set of confidentiality problems.

The last section makes the point that no single method is adequate for security control. The authors make suggestions for future work to try to overcome the major practical problems of many of the current methods. The paper presents and synthesizes a great deal of published material in this surprisingly large research area. Particular methods are discussed in some detail. The bibliography is not as extensive as in some survey papers.
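
As an illustration of why trackers are a problem for query-set-size control, the following Python sketch is offered; it is not part of the review or the paper. It assumes the common form of the control in which a SUM query is answered only if the query set has between k and n - k records, and it uses an individual tracker of the form `C1 AND NOT C2` for a target identified by `C1 AND C2`. The table, the threshold `K`, and all function names are hypothetical.

```python
K = 2  # assumed minimum query-set size; sets larger than n - K are refused too

# Hypothetical toy table: (sex, department, salary). All values are made up.
RECORDS = [
    ("F", "legal", 95_000),
    ("F", "sales", 52_000),
    ("F", "sales", 61_000),
    ("F", "hr", 47_000),
    ("M", "legal", 88_000),
    ("M", "legal", 91_000),
    ("M", "sales", 48_000),
    ("M", "sales", 55_000),
    ("M", "hr", 45_000),
    ("M", "hr", 50_000),
]


class Refused(Exception):
    """Raised when the query-set-size control rejects a query."""


def query_sum(predicate):
    """Answer SUM(salary) only if K <= |query set| <= n - K."""
    query_set = [r for r in RECORDS if predicate(r)]
    n = len(RECORDS)
    if not (K <= len(query_set) <= n - K):
        raise Refused(f"query set of size {len(query_set)} is not allowed")
    return sum(r[2] for r in query_set)


def c1(r):
    return r[0] == "F"


def c2(r):
    return r[1] == "legal"


def target(r):
    # Identifies exactly one record, so a direct query is refused.
    return c1(r) and c2(r)


def tracker(r):
    # Individual tracker: C1 AND NOT C2, large enough to be answered.
    return c1(r) and not c2(r)


def infer_target_salary():
    """Combine two permitted queries to recover the refused answer."""
    return query_sum(c1) - query_sum(tracker)


if __name__ == "__main__":
    try:
        query_sum(target)
    except Refused as exc:
        print("direct query refused:", exc)
    print("inferred salary:", infer_target_salary())
```

Each query in the sketch passes the size check on its own, yet their difference equals the single confidential value. This is the kind of compromise that size restrictions alone do not prevent.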


  • Published in

    ACM Computing Surveys, Volume 21, Issue 4
    Dec. 1989
    107 pages
    ISSN: 0360-0300
    EISSN: 1557-7341
    DOI: 10.1145/76894

    Copyright © 1989 ACM

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    • Published: 1 December 1989
