Abstract
In this paper we formulate a nonlinear optimization model to estimate population class sizes from sample information. The model is nonconvex and has several local minima, each corresponding to a different population that could have produced the sample data. We show that many, if not all, of these local solutions can be found using a new global optimization algorithm called OptQuest/NLP (OQNLP). The approach can be used to estimate the number of individuals in a population with unique or rarely occurring characteristics, which is useful for assessing disclosure risk. It can also be used to estimate the number of classes in a population, a problem with applications in a variety of disciplines.
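To illustrate why a nonconvex model calls for more than a single local solve, the following minimal sketch runs a plain multistart search on a toy one-dimensional objective. It is not OQNLP and not the class-size model of the paper: the objective, the fixed-step gradient descent, the evenly spaced starting points, and all function names are assumptions chosen only to show how several starting points can recover several local minima.

```python
# Hypothetical toy objective with two local minima (NOT the paper's model).
def objective(x):
    return (x * x - 1.0) ** 2 + 0.3 * x


def gradient(x):
    # Analytic derivative of the toy objective.
    return 4.0 * x * (x * x - 1.0) + 0.3


def local_descent(x0, step=0.01, iters=5000):
    # Fixed-step gradient descent standing in for a local NLP solver.
    x = x0
    for _ in range(iters):
        x -= step * gradient(x)
    return x


def multistart(starts, tol=1e-6):
    # Run the local search from every start and keep the distinct end points.
    found = []
    for x0 in starts:
        x = local_descent(x0)
        if all(abs(x - y) > tol for y in found):
            found.append(x)
    # Best (lowest objective) local minimum first.
    return sorted(found, key=objective)


# Evenly spaced starting points on [-2, 2]; an assumed, deliberately simple choice.
starts = [-2.0 + 0.5 * k for k in range(9)]
minima = multistart(starts)
for x in minima:
    print(f"x = {x:.4f}, objective = {objective(x):.4f}")
```

In this toy setting each distinct end point plays the role of one candidate solution; a single run from one starting point would report only one of them.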
Cite this article
Greenberg, B.S., Lasdon, L.S. Using Global Optimization to Estimate Population Class Sizes. J Glob Optim 36, 319–338 (2006). https://doi.org/10.1007/s10898-006-9011-6