Abstract
Assume that a random sample of size m is selected from a population containing a countable number of classes (subpopulations) of elements (individuals). A partition of the set of sample elements into (unordered) subsets, with each subset containing the elements that belong to the same class, induces a random partition of the sample size m, with part sizes \(\{Z_1, Z_2, \ldots, Z_N\}\) being positive integer-valued random variables. Alternatively, if \(N_j\) is the number of different classes that are represented in the sample by j elements, for \(j = 1, 2, \ldots, m\), then \((N_1, N_2, \ldots, N_m)\) represents the same random partition. The joint and the marginal distributions of \((N_1, N_2, \ldots, N_m)\), as well as the distribution of \(N=\sum^m_{j=1}N_j\), are of particular interest in statistical inference. From the inference point of view, it is desirable that all the information about the population is contained in \((N_1, N_2, \ldots, N_m)\). This requires that no physical, genetic or other kind of significance is attached to the actual labels of the population classes. In the present paper, combinatorial, probabilistic and compound sampling models are reviewed. Also, sampling models with population classes of random weights (proportions), and in particular the Ewens and Pitman sampling models, to which many publications are devoted, are extensively presented.
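The two representations of the random partition described above can be illustrated with a minimal Python sketch. The function names (`part_sizes`, `multiplicities`, `ewens_probability`) and the toy sample are hypothetical and are not taken from the paper. The last function evaluates the Ewens sampling formula in its standard textbook form, \(P(N_1=n_1,\ldots,N_m=n_m) = \frac{m!}{\theta(\theta+1)\cdots(\theta+m-1)}\prod_{j=1}^{m}\frac{\theta^{n_j}}{j^{n_j}\,n_j!}\), with a parameter \(\theta > 0\); the abstract itself does not state the formula, so this form is an assumption about the notation used in the paper.

```python
from collections import Counter
from math import factorial, prod


def part_sizes(sample_labels):
    """Part sizes Z_1, ..., Z_N: how many sample elements fall in each observed class."""
    return sorted(Counter(sample_labels).values(), reverse=True)


def multiplicities(sample_labels):
    """Vector (N_1, ..., N_m): N_j is the number of classes seen exactly j times."""
    m = len(sample_labels)
    size_counts = Counter(Counter(sample_labels).values())
    return [size_counts.get(j, 0) for j in range(1, m + 1)]


def ewens_probability(n, theta):
    """Ewens sampling formula (standard form, assumed) for n = (n_1, ..., n_m)."""
    m = sum(j * n_j for j, n_j in enumerate(n, start=1))
    rising = prod(theta + i for i in range(m))  # theta (theta+1) ... (theta+m-1)
    weight = prod(
        theta ** n_j / (j ** n_j * factorial(n_j))
        for j, n_j in enumerate(n, start=1)
    )
    return factorial(m) * weight / rising


# Toy sample of size m = 6; the class labels carry no meaning of their own.
sample = ["a", "b", "a", "c", "a", "b"]
print(part_sizes(sample))           # [3, 2, 1]
print(multiplicities(sample))       # [1, 1, 1, 0, 0, 0]
print(sum(multiplicities(sample)))  # N = 3 distinct classes
```

Relabelling the classes leaves both outputs unchanged, which is the sense in which the labels carry no information. Summing `ewens_probability` over all partitions of a fixed m should give 1 for any positive `theta`.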
Cite this article
Charalambides, C.A. Distributions of Random Partitions and Their Applications. Methodol Comput Appl Probab 9, 163–193 (2007). https://doi.org/10.1007/s11009-007-9018-6
DOI: https://doi.org/10.1007/s11009-007-9018-6
Keywords
- Combinatorial sampling model
- Compound sampling model
- Dirichlet–Poisson distribution
- Exchangeable random partitions
- Ewens sampling formula
- Partition structures
- Pitman sampling formula
- Pólya urn model
- Stirling numbers