Bayesian Inference on Population Structure: From Parametric to Nonparametric Modeling

Nonparametric Bayesian Inference in Biostatistics

Abstract

Inferring population structure from genotype data requires identifying the subpopulations present in a sample and assigning individuals to them. The source populations are assumed to be in Hardy-Weinberg equilibrium, but their allele frequencies, and even the number of populations represented in the sample, are unknown. In this chapter we review Bayesian parametric and nonparametric models for inference on population structure, with emphasis on model-based clustering methods. Our aim is to show how recent developments in Bayesian nonparametrics have been exploited to introduce natural nonparametric counterparts of some of the most celebrated parametric approaches to inferring population structure. We use data from the 1000 Genomes project (http://www.1000genomes.org/) to briefly illustrate some of these nonparametric approaches.

Acknowledgements

We would like to thank Kaustubh Adhikari for kindly providing the phased data and Lloyd Elliott for developing user-friendly MATLAB functions for the linked hierarchical Dirichlet process. Stefano Favaro is supported by the European Research Council (ERC) through StG N-BNP 306406. Yee Whye Teh is supported by the European Research Council (ERC) through the European Union's Seventh Framework Programme (FP7/2007–2013) ERC grant agreement 617411.

Corresponding author

Correspondence to Maria De Iorio.

Copyright information

© 2015 Springer International Publishing Switzerland

About this chapter

Cite this chapter

De Iorio, M., Favaro, S., Teh, Y.W. (2015). Bayesian Inference on Population Structure: From Parametric to Nonparametric Modeling. In: Mitra, R., Müller, P. (eds) Nonparametric Bayesian Inference in Biostatistics. Frontiers in Probability and the Statistical Sciences. Springer, Cham. https://doi.org/10.1007/978-3-319-19518-6_7
