Abstract
This paper considers the problem of providing security to statistical databases against disclosure of confidential information. Security-control methods suggested in the literature are classified into four general approaches: conceptual, query restriction, data perturbation, and output perturbation.
Criteria for evaluating the performance of the various security-control methods are identified. Security-control methods that are based on each of the four approaches are discussed, together with their performance with respect to the identified evaluation criteria. A detailed comparative analysis of the most promising methods for protecting dynamic-online statistical databases is also presented.
To date, no single security-control method prevents both exact and partial disclosures. There are, however, a few perturbation-based methods that prevent exact disclosure and enable the database administrator to exercise "statistical disclosure control." Some of these methods, however, introduce bias into query responses or suffer from the 0/1 query-set-size problem (i.e., partial disclosure is possible in the case of a null query set or a query set of size 1).
We recommend directing future research efforts toward developing new methods that prevent exact disclosure and provide statistical-disclosure control while avoiding both the bias problem and the 0/1 query-set-size problem. Furthermore, efforts directed toward developing a bias-correction mechanism and solving the general problem of small query-set size would help salvage a few of the current perturbation-based methods.
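As a rough illustration of the 0/1 query-set-size problem mentioned in the abstract, the following minimal sketch adds zero-mean noise to the output of a SUM query and then applies a simple minimum query-set-size check. It is not a method from the paper: the table, the field names, the noise level `NOISE_SD`, and the helpers `noisy_sum` and `restricted_noisy_sum` are all hypothetical and chosen only for this example.

```python
import random

# Hypothetical toy table; every name and value here is invented for illustration.
RECORDS = [
    {"name": "A", "dept": "X", "salary": 50_000},
    {"name": "B", "dept": "X", "salary": 62_000},
    {"name": "C", "dept": "Y", "salary": 91_000},
]

NOISE_SD = 1_000  # assumed noise level, not taken from the paper


def noisy_sum(records, predicate, field, rng, noise_sd=NOISE_SD):
    """Answer a SUM query with zero-mean Gaussian noise added to the output.

    Returns the perturbed sum and the size of the query set.
    """
    query_set = [r for r in records if predicate(r)]
    true_sum = sum(r[field] for r in query_set)
    return true_sum + rng.gauss(0.0, noise_sd), len(query_set)


def restricted_noisy_sum(records, predicate, field, rng, min_size=2):
    """Same query, but refuse to answer when the query set is smaller than min_size."""
    answer, size = noisy_sum(records, predicate, field, rng)
    return answer if size >= min_size else None


rng = random.Random(0)

# Query set of size 1: the noisy answer is still a close estimate of C's salary,
# so exact disclosure is prevented but partial disclosure is not.
answer_single, size_single = noisy_sum(
    RECORDS, lambda r: r["dept"] == "Y", "salary", rng
)
print(size_single, round(answer_single))

# With a minimum query-set size, the size-1 query is refused and a larger one is answered.
refused = restricted_noisy_sum(RECORDS, lambda r: r["dept"] == "Y", "salary", rng)
allowed = restricted_noisy_sum(RECORDS, lambda r: r["dept"] == "X", "salary", rng)
print(refused, allowed is not None)
```

In this sketch the perturbed answer for the single-record query stays within a few thousand of the true salary, which is the kind of partial disclosure the abstract describes. The size check is only one possible mitigation and is shown here as an assumption, not as a complete defense.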