Published by De Gruyter August 3, 2013

Improving the efficiency of genomic selection

  • Marco Scutari, Ian Mackay and David Balding

Abstract

We investigate two approaches to increase the efficiency of phenotypic prediction from genome-wide markers, which is a key step for genomic selection (GS) in plant and animal breeding. The first approach is feature selection based on Markov blankets, which provide a theoretically sound framework for identifying non-informative markers. Fitting GS models using only the informative markers results in simpler models, which may allow cost savings from reduced genotyping. We show that this is accompanied by no loss, and possibly a small gain, in predictive power for four GS models: partial least squares (PLS), ridge regression, LASSO and elastic net. The second approach is the choice of kinship coefficients for genomic best linear unbiased prediction (GBLUP). We compare kinships based on different combinations of centring and scaling of marker genotypes, and a newly proposed kinship measure that adjusts for linkage disequilibrium (LD). We illustrate the use of both approaches and examine their performance using three real-world data sets with continuous phenotypic traits from plant and animal genetics. We find that elastic net with feature selection and GBLUP using LD-adjusted kinships performed similarly well, and were the best-performing methods in our study.
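
As an illustration of what "different combinations of centring and scaling of marker genotypes" can mean in practice, the following is a minimal sketch of a marker-based kinship matrix. It is not the authors' code and it does not implement the LD-adjusted kinship. The function name `kinship`, the 0/1/2 allele-count coding, the scaling by the binomial standard deviation, and the simulated genotypes are all assumptions made for the example.

```python
import numpy as np

def kinship(genotypes, center=True, scale=True):
    """Marker-based kinship from an (individuals x markers) matrix.

    Assumes genotypes are allele counts coded 0/1/2 (an assumption of
    this sketch, not something stated in the abstract).
    """
    X = np.asarray(genotypes, dtype=float)
    p = X.mean(axis=0) / 2.0            # estimated allele frequencies
    keep = (p > 0.0) & (p < 1.0)        # drop monomorphic markers
    X, p = X[:, keep], p[keep]
    # Centring subtracts the expected allele count 2p of each marker.
    Z = X - 2.0 * p if center else X.copy()
    # Scaling divides each marker by its binomial standard deviation.
    if scale:
        Z = Z / np.sqrt(2.0 * p * (1.0 - p))
    # Average the marker-wise cross-products over the retained markers.
    return Z @ Z.T / Z.shape[1]

# Simulated data purely for illustration: 10 individuals, 200 markers.
rng = np.random.default_rng(0)
freqs = rng.uniform(0.1, 0.9, size=200)
G = rng.binomial(2, freqs, size=(10, 200))

K = kinship(G)                                    # centred and scaled
K_centred = kinship(G, center=True, scale=False)  # centred only
K_raw = kinship(G, center=False, scale=False)     # neither
```

Each of the resulting matrices could be supplied as the relationship matrix in a GBLUP-type mixed model; which combination predicts best is an empirical question of the kind the study addresses.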


Corresponding author: Marco Scutari, Genetics Institute, University College London (UCL), London WC1E 6BT, UK

The work presented in this paper forms part of the MIDRIB project, which is funded by the UK Technology Strategy Board (TSB) and Biotechnology & Biological Sciences Research Council (BBSRC), grant TS/I002170/1. We thank our project partners for helpful discussions. We also thank the AGOUEB Consortium (supported by UK DEFRA, the Scottish Government, through the Sustainable Arable LINK Program Grant 302/BB/D522003/1) for making their data available.

Published Online: 2013-08-03
Published in Print: 2013-08-01

©2013 by Walter de Gruyter Berlin Boston

Downloaded on 20.4.2024 from https://www.degruyter.com/document/doi/10.1515/sagmb-2013-0002/html