Abstract
In this paper we discuss methodology for the safe release of business microdata. In particular, we extend the model-based protection procedure of Franconi and Stander (2002, The Statistician 51: 1–11) by allowing the model to take account of the spatial structure underlying the geographical information in the microdata. We discuss the use of the Gibbs sampler for performing the computations required by this spatial approach. We provide an empirical comparison of the non-spatial and spatial disclosure limitation methods based on the Italian sample from the Community Innovation Survey. We quantify the level of protection achieved for the released microdata and the error induced when various inferences are performed. We find that, although the spatial method often induces higher inferential errors, it almost always provides more protection. Moreover, the aggregated areas from the spatial procedure can be somewhat more spatially smooth, and hence possibly more meaningful, than those from the non-spatial approach. We discuss possible applications of these model-based protection procedures to more spatially extensive data sets.
References
Alexander N., Moyeed R., and Stander J. 2000. Spatial modelling of individual level parasite counts using the negative binomial distribution. Biostatistics 1: 453–463.
Bernardinelli L., Clayton D., and Montomoli C. 1995. Bayesian estimates of disease maps: How important are priors? Statistics in Medicine 14: 2411–2431.
Bernardinelli L. and Montomoli C. 1992. Empirical Bayes versus fully Bayesian analysis of geographical variation in disease risk. Statistics in Medicine 11: 983–1007.
Bernardinelli L., Pascutto C., Best N.G., and Gilks W.R. 1997. Disease mapping with errors in covariates. Statistics in Medicine 16: 741–752.
Besag J. 1974. Spatial interaction and the statistical analysis of lattice systems. Journal of the Royal Statistical Society, Series B 36: 192–236.
Besag J. 1989. Towards Bayesian image analysis. Journal of Applied Statistics 16: 395–407.
Besag J. and Mollié A. 1989. Bayesian mapping of mortality rates. Bulletin of the International Statistical Institute 53(1): 127–128.
Besag J., York J., and Mollié A. 1991. Bayesian image restoration, with two applications in spatial statistics (with discussion). Annals of the Institute of Statistical Mathematics 43: 1–59.
Brand R. 2002. Microdata protection through noise addition. In: Domingo-Ferrer J. (Ed.), Inference Control in Statistical Databases: From Theory to Practice. Lecture Notes in Computer Science 2316. Springer-Verlag, Berlin, pp. 97–116.
Burridge J. 2003. Information preserving statistical obfuscation. Statistics and Computing, Special Issue on Confidentiality 13: 321–327.
Cuppen M. and Willenborg L. 2003. Source data perturbation and consistent sets of safe tables. Statistics and Computing, Special Issue on Confidentiality 13: 355–362.
Dalenius T. and Reiss S.P. 1982. Data-swapping: A technique for disclosure control. Journal of Statistical Planning and Inference 6: 73–85.
Dandekar R., Cohen M., and Kirkendall N. 2002. Sensitive micro data protection using Latin Hypercube Sampling technique. In: Domingo-Ferrer J. (Ed.), Inference Control in Statistical Databases: From Theory to Practice. Lecture Notes in Computer Science 2316. Springer-Verlag, Berlin, pp. 117–125.
Defays D. and Anwar M.N. 1998. Masking microdata using microaggregation. Journal of Official Statistics 14: 449–461.
Diggle P.J., Tawn J.A., and Moyeed R.A. 1998. Model-based geostatistics (with discussion). Applied Statistics 47: 299–350.
Diggle P.J., Moyeed R.A., Rowlingson B., and Thomson M. 2002. Childhood malaria in the Gambia: A case-study in model-based geostatistics. Applied Statistics 51: 493–506.
Dobra A., Karr A.F., and Sanil A.P. 2003. Preserving confidentiality of high-dimensional tabulated data: Statistical and computational issues. Statistics and Computing, Special Issue on Confidentiality 13: 363–370.
Domingo-Ferrer J. and Mateo-Sanz J.M. 2002. Practical data-oriented microaggregation for statistical disclosure control. IEEE Transactions on Knowledge and Data Engineering 14: 189–201.
Domingo-Ferrer J. and Torra V. 2003. Disclosure risk assessment in statistical microdata protection via advanced record linkage. Statistics and Computing, Special Issue on Confidentiality 13: 343–354.
Duncan G.T., Keller-McNulty S.A., and Stokes S.L. 2001. Disclosure risk vs. data utility: The R-U confidentiality map. Available from http://www.niss.org/technicalreports/tr121.pdf or as a Los Alamos National Laboratory Technical Report, LA-UR-01-6428.
Duncan G.T. and Mukherjee S. 2000. Optimal disclosure limitation strategy in statistical databases: Deterring tracker attacks through additive noise. Journal of the American Statistical Association 95: 720–729.
Duncan G.T. and Lambert D. 1989. The risk of disclosure for microdata. Journal of Business and Economic Statistics 7: 207–217.
Fienberg S.E., Makov U., and Sanil A. 1997. A Bayesian approach to data disclosure: Optimal intruder behaviour for continuous data. Journal of Official Statistics 14: 75–89.
Franconi L. and Stander J. 2002. A model based method for disclosure limitation of business microdata. The Statistician 51: 1–11.
Geman S. and Geman D. 1984. Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence PAMI-6: 721–741.
Gilks W.R., Richardson S., and Spiegelhalter D.J. 1996. Introducing Markov chain Monte Carlo. In: Gilks W.R., Richardson S., and Spiegelhalter D.J. (Eds.), Markov Chain Monte Carlo in Practice. Chapman and Hall, London, pp. 1–19.
Mollié A. 1996. Bayesian mapping of disease. In: Gilks W.R., Richardson S., and Spiegelhalter D.J. (Eds.), Markov Chain Monte Carlo in Practice. Chapman and Hall, London, pp. 359–379.
Muralidhar K. and Sarathy R. 2003. A rejoinder to the comments by Polettini and Stander. Statistics and Computing, Special Issue on Confidentiality 13: 339–342.
Pascutto C., Wakefield J.C., Best N.G., Richardson S., Bernardinelli L., Staines A., and Elliott P. 2000. Statistical issues in the analysis of disease mapping data. Statistics in Medicine 19: 2493–2519.
Polettini S. 2003. Maximum entropy simulation for microdata protection. Statistics and Computing, Special Issue on Confidentiality 13: 307–320.
Polettini S., Franconi L., and Stander J. 2002. Model based disclosure protection. In: Domingo-Ferrer J. (Ed.), Inference Control in Statistical Databases: From Theory to Practice. Lecture Notes in Computer Science 2316. Springer-Verlag, Berlin, pp. 83–96.
Raghunathan T.E., Reiter J.P., and Rubin D.B. 2002. Multiple imputation for statistical disclosure limitation. Technical Report, Department of Biostatistics, University of Michigan.
Reiter J.P. 2003. Model diagnostics for remote access regression servers. Statistics and Computing, Special Issue on Confidentiality 13: 371–380.
Rubin D.B. 1993. Discussion of “Statistical disclosure limitation.” Journal of Official Statistics 9: 461–468.
Schouten B. and Cigrang M. 2003. Remote access systems for statistical analysis of microdata. Statistics and Computing, Special Issue on Confidentiality 13: 381–389.
Trottini M. 2001. A decision-theoretic approach to data disclosure problems. Research in Official Statistics 4: 7–22.
Willenborg L. and de Waal T. 2001. Elements of statistical disclosure control. Lecture Notes in Statistics 155. Springer-Verlag, New York.
Winkler W.E. 1998. Re-identification methods for evaluating the confidentiality of analytically valid microdata. Research in Official Statistics 1: 87–104.
Cite this article
Franconi, L., Stander, J. Spatial and non-spatial model-based protection procedures for the release of business microdata. Statistics and Computing 13, 295–305 (2003). https://doi.org/10.1023/A:1025654520307
DOI: https://doi.org/10.1023/A:1025654520307