Spatial and non-spatial model-based protection procedures for the release of business microdata

Statistics and Computing

Abstract

In this paper we discuss methodology for the safe release of business microdata. In particular, we extend the model-based protection procedure of Franconi and Stander (2002, The Statistician 51: 1–11) by allowing the model to take account of the spatial structure underlying the geographical information in the microdata. We discuss the use of the Gibbs sampler for performing the computations required by this spatial approach. We provide an empirical comparison of the non-spatial and spatial disclosure limitation methods based on the Italian sample from the Community Innovation Survey. We quantify the level of protection achieved for the released microdata and the error induced when various inferences are performed. We find that, although the spatial method often induces larger inferential errors, it almost always provides more protection. Moreover, the aggregated areas from the spatial procedure can be somewhat spatially smoother, and hence possibly more meaningful, than those from the non-spatial approach. We discuss possible applications of these model-based protection procedures to more spatially extensive data sets.
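The abstract states that the spatial procedure relies on the Gibbs sampler. As an illustration only, the sketch below shows a single-site Gibbs sampler for a simple Gaussian model with area-level random effects under an intrinsic conditional autoregressive (ICAR) prior, in the spirit of Besag (1974) and Besag, York and Mollié (1991). This is not the model fitted in the paper. The model itself, the fixed variance parameters `sigma2` and `tau`, the function name `gibbs_icar`, the four-area neighbour structure and the simulated data are all assumptions made for this sketch.

```python
import numpy as np


def gibbs_icar(y, area, neighbours, sigma2=1.0, tau=1.0,
               n_iter=2000, burn_in=500, seed=0):
    """Illustrative single-site Gibbs sampler (hypothetical helper) for
        y_i ~ N(mu + phi[area_i], sigma2),
    with a flat prior on mu and an intrinsic CAR prior on phi:
        p(phi) proportional to exp(-tau/2 * sum over neighbour pairs (phi_j - phi_k)^2).
    sigma2 and tau are treated as known (an assumption). Every area is
    assumed to have at least one neighbour. Returns posterior means of mu and phi.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    area = np.asarray(area, dtype=int)
    n, n_areas = y.size, len(neighbours)
    counts = np.bincount(area, minlength=n_areas)  # records per area
    mu, phi = y.mean(), np.zeros(n_areas)
    mu_sum, phi_sum = 0.0, np.zeros(n_areas)

    for it in range(n_iter):
        # mu | phi, y: normal centred on the mean residual (flat prior)
        mu = rng.normal(np.mean(y - phi[area]), np.sqrt(sigma2 / n))
        # per-area sums of y_i - mu
        resid_sum = np.bincount(area, weights=y - mu, minlength=n_areas)
        # phi_j | phi_-j, mu, y: precision-weighted mix of data and neighbours
        for j in range(n_areas):
            nb = neighbours[j]
            prec = counts[j] / sigma2 + tau * len(nb)
            mean = (resid_sum[j] / sigma2 + tau * phi[nb].sum()) / prec
            phi[j] = rng.normal(mean, np.sqrt(1.0 / prec))
        # Impose sum-to-zero on phi and move the shift into mu, so the
        # identified quantity mu + phi_j is unchanged.
        shift = phi.mean()
        phi -= shift
        mu += shift
        if it >= burn_in:
            mu_sum += mu
            phi_sum += phi

    kept = n_iter - burn_in
    return mu_sum / kept, phi_sum / kept


# Hypothetical example: 4 areas on a line (0-1-2-3), 50 records per area.
rng = np.random.default_rng(1)
neighbours = [[1], [0, 2], [1, 3], [2]]
true_phi = np.array([-1.0, -0.5, 0.5, 1.0])
area = np.repeat(np.arange(4), 50)
y = 10.0 + true_phi[area] + rng.normal(0.0, 1.0, size=area.size)

mu_hat, phi_hat = gibbs_icar(y, area, neighbours)
fitted_area = mu_hat + phi_hat   # spatially smoothed area-level values
released = fitted_area[area]     # one smoothed value per record
```

The ICAR prior is improper along the constant direction, and with a flat prior on `mu` only `mu + phi_j` is identified. Re-centring `phi` after each sweep is a common way to handle this. Whether, and in what form, such smoothed values would be released as protected microdata is an assumption of this sketch rather than a description of the authors' procedure.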

References

  • Alexander N., Moyeed R., and Stander J. 2000. Spatial modelling of individual level parasite counts using the negative binomial distribution. Biostatistics 1: 453–463.

  • Bernardinelli L., Clayton D., and Montomoli C. 1995. Bayesian estimates of disease maps: How important are priors? Statistics in Medicine 14: 2411–2431.

  • Bernardinelli L. and Montomoli C. 1992. Empirical Bayes versus fully Bayesian analysis of geographical variation in disease risk. Statistics in Medicine 11: 983–1007.

  • Bernardinelli L., Pascutto C., Best N.G., and Gilks W.R. 1997. Disease mapping with errors in covariates. Statistics in Medicine 16: 741–752.

  • Besag J. 1974. Spatial interaction and the statistical analysis of lattice systems. Journal of the Royal Statistical Society, Series B 36: 192–236.

  • Besag J. 1989. Towards Bayesian image analysis. Journal of Applied Statistics 16: 395–407.

  • Besag J. and Mollié A. 1989. Bayesian mapping of mortality rates. Bulletin of the International Statistical Institute 53(1): 127–128.

  • Besag J., York J., and Mollié A. 1991. Bayesian image restoration, with two applications in spatial statistics (with discussion). Annals of the Institute of Statistical Mathematics 43: 1–59.

  • Brand R. 2002. Microdata protection through noise addition. In: Domingo-Ferrer J. (Ed.), Inference Control in Statistical Databases: From Theory to Practice. Lecture Notes in Computer Science 2316. Springer-Verlag, Berlin, pp. 97–116.

  • Burridge J. 2003. Information preserving statistical obfuscation. Statistics and Computing (Special Issue on Confidentiality) 13: 321–327.

  • Cuppen M. and Willenborg L. 2003. Source data perturbation and consistent sets of safe tables. Statistics and Computing (Special Issue on Confidentiality) 13: 355–362.

  • Dalenius T. and Reiss S.P. 1982. Data-swapping: A technique for disclosure control. Journal of Statistical Planning and Inference 6: 73–85.

  • Dandekar R., Cohen M., and Kirkendall N. 2002. Sensitive micro data protection using Latin Hypercube Sampling technique. In: Domingo-Ferrer J. (Ed.), Inference Control in Statistical Databases: From Theory to Practice. Lecture Notes in Computer Science 2316. Springer-Verlag, Berlin, pp. 117–125.

  • Defays D. and Anwar M.N. 1998. Masking microdata using microaggregation. Journal of Official Statistics 14: 449–461.

  • Diggle P.J., Tawn J.A., and Moyeed R.A. 1998. Model-based geostatistics (with discussion). Applied Statistics 47: 299–350.

  • Diggle P.J., Moyeed R.A., Rowlingson B., and Thomson M. 2002. Childhood malaria in the Gambia: A case-study in model-based geostatistics. Applied Statistics 51: 493–506.

  • Dobra A., Karr A.F., and Sanil A.P. 2003. Preserving confidentiality of high-dimensional tabulated data: Statistical and computational issues. Statistics and Computing (Special Issue on Confidentiality) 13: 363–370.

  • Domingo-Ferrer J. and Mateo-Sanz J.M. 2002. Practical data-oriented microaggregation for statistical disclosure control. IEEE Transactions on Knowledge and Data Engineering 14: 189–201.

  • Domingo-Ferrer J. and Torra V. 2003. Disclosure risk assessment in statistical microdata protection via advanced record linkage. Statistics and Computing (Special Issue on Confidentiality) 13: 343–354.

  • Duncan G.T., Keller-McNulty S.A., and Stokes S.L. 2001. Disclosure risk vs. data utility: The R-U confidentiality map. Available from http://www.niss.org/technicalreports/tr121.pdf or as a Los Alamos National Laboratory Technical Report, LA-UR-01-6428.

  • Duncan G.T. and Mukherjee S. 2000. Optimal disclosure limitation strategy in statistical databases: Deterring tracker attacks through additive noise. Journal of the American Statistical Association 95: 720–729.

  • Duncan G.T. and Lambert D. 1989. The risk of disclosure for microdata. Journal of Business and Economic Statistics 7: 207–217.

  • Fienberg S.E., Makov U., and Sanil A. 1997. A Bayesian approach to data disclosure: Optimal intruder behaviour for continuous data. Journal of Official Statistics 14: 75–89.

  • Franconi L. and Stander J. 2002. A model based method for disclosure limitation of business microdata. The Statistician 51: 1–11.

  • Geman S. and Geman D. 1984. Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence PAMI-6: 721–741.

  • Gilks W.R., Richardson S., and Spiegelhalter D.J. 1996. Introducing Markov chain Monte Carlo. In: Gilks W.R., Richardson S., and Spiegelhalter D.J. (Eds.), Markov Chain Monte Carlo in Practice. Chapman and Hall, London, pp. 1–19.

  • Mollié A. 1996. Bayesian mapping of disease. In: Gilks W.R., Richardson S., and Spiegelhalter D.J. (Eds.), Markov Chain Monte Carlo in Practice. Chapman and Hall, London, pp. 359–379.

  • Muralidhar K. and Sarathy R. 2003. A rejoinder to the comments by Polettini and Stander. Statistics and Computing (Special Issue on Confidentiality) 13: 339–342.

  • Pascutto C., Wakefield J.C., Best N.G., Richardson S., Bernardinelli L., Staines A., and Elliott P. 2000. Statistical issues in the analysis of disease mapping data. Statistics in Medicine 19: 2493–2519.

  • Polettini S. 2003. Maximum entropy simulation for microdata protection. Statistics and Computing (Special Issue on Confidentiality) 13: 307–320.

  • Polettini S., Franconi L., and Stander J. 2002. Model based disclosure protection. In: Domingo-Ferrer J. (Ed.), Inference Control in Statistical Databases: From Theory to Practice. Lecture Notes in Computer Science 2316. Springer-Verlag, Berlin, pp. 83–96.

  • Raghunathan T.E., Reiter J.P., and Rubin D.B. 2002. Multiple imputation for statistical disclosure limitation. Technical Report, Department of Biostatistics, University of Michigan.

  • Reiter J.P. 2003. Model diagnostics for remote access regression servers. Statistics and Computing (Special Issue on Confidentiality) 13: 371–380.

  • Rubin D.B. 1993. Discussion of “Statistical disclosure limitation.” Journal of Official Statistics 9: 461–468.

  • Schouten B. and Cigrang M. 2003. Remote access systems for statistical analysis of microdata. Statistics and Computing (Special Issue on Confidentiality) 13: 381–389.

  • Trottini M. 2001. A decision-theoretic approach to data disclosure problems. Research in Official Statistics 4: 7–22.

  • Willenborg L. and de Waal T. 2001. Elements of statistical disclosure control. Lecture Notes in Statistics 155. Springer-Verlag, New York.

  • Winkler W.E. 1998. Re-identification methods for evaluating the confidentiality of analytically valid microdata. Research in Official Statistics 1: 87–104.

Cite this article

Franconi, L., Stander, J. Spatial and non-spatial model-based protection procedures for the release of business microdata. Statistics and Computing 13, 295–305 (2003). https://doi.org/10.1023/A:1025654520307
