Abstract
We examine several methods for drawing a sequential random sample of n records from a file containing N records. Method D is recommended for general use. The algorithm is on-line (so that CPU time can be overlapped with I/O), has a small constant memory requirement, and is easy to program. An improved implementation is detailed in the Appendix.
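To make "sequential random sample" concrete, the following is a minimal sketch of the simplest classical approach, usually called selection sampling (Algorithm S in Knuth's The Art of Computer Programming, Vol. 2). It is not Method D, which the abstract recommends and which this page does not describe. The function name `sequential_sample` and its parameters are illustrative assumptions, not taken from the paper. The sketch examines every record and draws one random number per record, but it has the properties the abstract lists: it is on-line, uses constant memory, and returns the sample in file order.

```python
import random
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def sequential_sample(records: Iterable[T], N: int, n: int,
                      rng: random.Random) -> Iterator[T]:
    """Yield n of the N records in file order (selection sampling sketch).

    Hypothetical helper for illustration only; this is NOT Method D.
    Assumes `records` really contains N items.
    """
    if not 0 <= n <= N:
        raise ValueError("need 0 <= n <= N")
    remaining = N  # records not yet examined
    needed = n     # records still to be selected
    for record in records:
        if needed == 0:
            break  # sample complete; the rest of the file can be skipped
        # Select the current record with probability needed / remaining.
        # When needed == remaining the test always succeeds, so exactly
        # n records are selected by the end of the file.
        if rng.random() * remaining < needed:
            yield record
            needed -= 1
        remaining -= 1
```

A call such as `list(sequential_sample(range(1000), 1000, 5, random.Random()))` would return five distinct values in increasing order. Because each record is decided as it is read, selection can overlap with I/O; the cost is one random number per record examined.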
Index Terms
- An efficient algorithm for sequential random sampling