Abstract
Key message
Phenomic selection is a promising alternative or complement to genomic selection in wheat breeding. Models combining spectra from different environments maximise the predictive ability of grain yield and heading date of wheat breeding lines.
Abstract
Phenomic selection (PS) is a recent breeding approach similar to genomic selection (GS) except that genotyping is replaced by near-infrared (NIR) spectroscopy. PS can potentially account for non-additive effects and has the major advantage of being low cost and high throughput. Factors influencing GS predictive abilities have been intensively studied, but little is known about PS. We tested and compared the abilities of PS and GS to predict grain yield and heading date from several datasets of bread wheat lines corresponding to the first or second years of trial evaluation from two breeding companies and one research institute in France. We evaluated several factors affecting PS predictive abilities including the possibility of combining spectra collected in different environments. A simple H-BLUP model predicted both traits with prediction ability from 0.26 to 0.62 and with an efficient computation time. Our results showed that the environment in which lines are grown, and hence in which spectra are acquired, had a crucial impact on predictive ability, and that this impact was specific to the trait considered. Models combining NIR spectra from different environments were the best PS models and were at least as accurate as GS in most of the datasets. Furthermore, a GH-BLUP model combining genotyping and NIR spectra was the best model of all (prediction ability from 0.31 to 0.73). We also demonstrated that, as for GS, the size and the composition of the training set have a crucial impact on predictive ability. PS could therefore replace or complement GS for efficient wheat breeding programs.
References
Albrecht T, Wimmer V, Auinger H-J et al (2011) Genome-based prediction of testcross values in maize. Theor Appl Genet 123:339–350. https://doi.org/10.1007/s00122-011-1587-7
Allard RW, Bradshaw AD (1964) Implications of genotype-environmental interactions in applied plant breeding 1. Crop Sci 4:1. https://doi.org/10.2135/cropsci1964.0011183X000400050021x
Azodi CB, Pardo J, VanBuren R et al (2020) Transcriptome-based prediction of complex traits in maize. Plant Cell 32:139–151. https://doi.org/10.1105/tpc.19.00332
Ben-Sadoun S, Rincent R, Auzanneau J et al (2020) Economical optimization of a breeding scheme by selective phenotyping of the calibration set in a multi-trait context: application to bread making quality. Theor Appl Genet 133:2197–2212. https://doi.org/10.1007/s00122-020-03590-4
Blanco M, Villarroya I (2002) NIR spectroscopy: a rapid-response analytical tool. TrAC Trends Anal Chem 21:240–250. https://doi.org/10.1016/S0165-9936(02)00404-1
Charmet G, Tran L-G, Auzanneau J et al (2020) BWGS: A R package for genomic selection and its application to a wheat breeding programme. PLoS ONE 15:e0222733. https://doi.org/10.1371/journal.pone.0222733
Consortium R, Fugeray-Scarbel A, Bastien C et al (2021) Why and how to switch to genomic selection: lessons from plant and animal breeding experience. Front Genet. https://doi.org/10.3389/fgene.2021.629737
Covarrubias-Pazaran G (2016) Genome-assisted prediction of quantitative traits using the R package sommer. PLoS ONE 11:e0156744. https://doi.org/10.1371/journal.pone.0156744
Crossa J, de los Campos G, Pérez P et al (2010) Prediction of genetic values of quantitative traits in plant breeding using pedigree and molecular markers. Genetics 186:713–724. https://doi.org/10.1534/genetics.110.118521
Crossa J, Pérez-Rodríguez P, Cuevas J et al (2017) Genomic selection in plant breeding: methods, models, and perspectives. Trends Plant Sci 22:961–975. https://doi.org/10.1016/j.tplants.2017.08.011
Cuevas J, Montesinos-López O, Juliana P et al (2019) Deep kernel for genomic and near infrared predictions in multi-environment breeding trials. G3 Genes Genomes Genetics 9:2913–2924. https://doi.org/10.1534/g3.119.400493
Daetwyler HD, Calus MPL, Pong-Wong R et al (2013) Genomic prediction in animals and plants: simulation of data, validation, reporting, and benchmarking. Genetics 193:347–365. https://doi.org/10.1534/genetics.112.147983
Daetwyler HD, Pong-Wong R, Villanueva B, Woolliams JA (2010) The impact of genetic architecture on genome-wide evaluation methods. Genetics 185:1021–1031. https://doi.org/10.1534/genetics.110.116855
de los Campos G, Hickey JM, Pong-Wong R et al (2013) Whole-genome regression and prediction methods applied to plant and animal breeding. Genetics 193:327–345. https://doi.org/10.1534/genetics.112.143313
Endelman JB (2011) Ridge regression and other kernels for genomic selection with R package rrBLUP. Plant Genome 4:250–255. https://doi.org/10.3835/plantgenome2011.08.0024
Endelman JB, Jannink J-L (2012) Shrinkage estimation of the realized relationship matrix. G3 Genes Genomes Genetics 2:1405–1413. https://doi.org/10.1534/g3.112.004259
Fernandez O, Urrutia M, Bernillon S et al (2016) Fortune telling: metabolic markers of plant performance. Metabolomics 12:158. https://doi.org/10.1007/s11306-016-1099-1
Forni S, Aguilar I, Misztal I (2011) Different genomic relationship matrices for single-step analysis using phenotypic, pedigree and genomic information. Genet Sel Evol 43:1. https://doi.org/10.1186/1297-9686-43-1
Friedman J, Hastie T, Tibshirani R (2010) Regularization paths for generalized linear models via coordinate descent. J Stat Soft 33. https://doi.org/10.18637/jss.v033.i01
Frisch M, Thiemann A, Fu J et al (2010) Transcriptome-based distance measures for grouping of germplasm and prediction of hybrid performance in maize. Theor Appl Genet 120:441–450. https://doi.org/10.1007/s00122-009-1204-1
Fu J, Falke KC, Thiemann A et al (2012) Partial least squares regression, support vector machine regression, and transcriptome-based distances for prediction of maize hybrid performance with gene expression data. Theor Appl Genet 124:825–833. https://doi.org/10.1007/s00122-011-1747-9
Galán RJ, Bernal-Vasquez A-M, Jebsen C et al (2020) Integration of genotypic, hyperspectral, and phenotypic data to improve biomass yield prediction in hybrid rye. Theor Appl Genet 133:3001–3015. https://doi.org/10.1007/s00122-020-03651-8
Gärtner T, Steinfath M, Andorf S et al (2009) Improved heterosis prediction by combining information on DNA- and metabolic markers. PLoS ONE 4:e5220. https://doi.org/10.1371/journal.pone.0005220
Gorjanc G, Jenko J, Hearne SJ, Hickey JM (2016) Initiating maize pre-breeding programs using genomic selection to harness polygenic variation from landrace populations. BMC Genomics 17:30. https://doi.org/10.1186/s12864-015-2345-z
Griffiths S, Simmonds J, Leverington M et al (2009) Meta-QTL analysis of the genetic control of ear emergence in elite European winter wheat germplasm. Theor Appl Genet 119:383–395. https://doi.org/10.1007/s00122-009-1046-x
Guo Z, Magwire MM, Basten CJ et al (2016) Evaluation of the utility of gene expression and metabolic information for genomic prediction in maize. Theor Appl Genet 129:2413–2427. https://doi.org/10.1007/s00122-016-2780-5
Hanocq E, Laperche A, Jaminon O et al (2007) Most significant genome regions involved in the control of earliness traits in bread wheat, as revealed by QTL meta-analysis. Theor Appl Genet 114:569–584. https://doi.org/10.1007/s00122-006-0459-z
Hayes BJ, Bowman PJ, Chamberlain AJ, Goddard ME (2009a) Invited review: genomic selection in dairy cattle: progress and challenges. J Dairy Sci 92:433–443. https://doi.org/10.3168/jds.2008-1646
Hayes BJ, Visscher PM, Goddard ME (2009b) Increased accuracy of artificial selection by using the realized relationship matrix. Genet Res 91:47–60. https://doi.org/10.1017/S0016672308009981
Heffner EL, Jannink J-L, Iwata H et al (2011) Genomic selection accuracy for grain quality traits in biparental wheat populations. Crop Sci 51:2597–2606. https://doi.org/10.2135/cropsci2011.05.0253
Hickey JM, Dreisigacker S, Crossa J et al (2014) Evaluation of genomic selection training population designs and genotyping strategies in plant breeding programs using simulation. Crop Sci 54:1476–1488. https://doi.org/10.2135/cropsci2013.03.0195
Honigs DE, Hieftje GM, Mark HL, Hirschfeld TB (1985) Unique-sample selection via near-infrared spectral subtraction. Anal Chem 57:2299–2303. https://doi.org/10.1021/ac00289a029
Jannink J-L, Lorenz AJ, Iwata H (2010) Genomic selection in plant breeding: from theory to practice. Brief Funct Genomics 9:166–177. https://doi.org/10.1093/bfgp/elq001
Kang HM, Sul JH, Service SK et al (2010) Variance component model to account for sample structure in genome-wide association studies. Nat Genet 42:348–354. https://doi.org/10.1038/ng.548
Kennard RW, Stone LA (1969) Computer aided design of experiments. Technometrics 11:137. https://doi.org/10.2307/1266770
Krause MR, González-Pérez L, Crossa J et al (2019) Hyperspectral reflectance-derived relationship matrices for genomic prediction of grain yield in wheat. G3 Genes Genomes Genetics g3.200856.2018. https://doi.org/10.1534/g3.118.200856
Lane HM, Murray SC, Montesinos-López OA et al (2020) Phenomic selection and prediction of maize grain yield from near-infrared reflectance spectroscopy of kernels. Plant Phenome J 3:e20002. https://doi.org/10.1002/ppj2.20002
Meuwissen THE, Hayes BJ, Goddard ME (2001) Prediction of total genetic value using genome-wide dense marker maps. Genetics 157:1819–1829. https://doi.org/10.1093/genetics/157.4.1819
Montesinos-López A, Montesinos-López OA, Cuevas J et al (2017a) Genomic Bayesian functional regression models with interactions for predicting wheat grain yield using hyper-spectral image data. Plant Methods 13:1–29. https://doi.org/10.1186/s13007-017-0212-4
Montesinos-López OA, Montesinos-López A, Crossa J et al (2017b) Predicting grain yield using canopy hyperspectral reflectance in wheat breeding data. Plant Methods 13:1–23. https://doi.org/10.1186/s13007-016-0154-2
Norman A, Taylor J, Edwards J, Kuchel H (2018) Optimising genomic selection in wheat: effect of marker density, population size and population structure on prediction accuracy. G3 Genes Genomes Genetics 8:2889–2899. https://doi.org/10.1534/g3.118.200311
Osborne BG (2006) Applications of near infrared spectroscopy in quality screening of early-generation material in cereal breeding programmes. J near Infrared Spectrosc 14:93–101. https://doi.org/10.1255/jnirs.595
Posada H, Ferrand M, Davrieux F et al (2009) Stability across environments of the coffee variety near infrared spectral signature. Heredity 102:113–119. https://doi.org/10.1038/hdy.2008.88
Pszczola M, Strabel T, Mulder HA, Calus MPL (2012) Reliability of direct genomic values for animals with different relationships within and to the reference population. J Dairy Sci 95:389–400. https://doi.org/10.3168/jds.2011-4338
Riedelsheimer C, Czedik-Eysenberg A, Grieder C et al (2012) Genomic and metabolic prediction of complex heterotic traits in hybrid maize. Nat Genet 44:217
Rimbert H, Darrier B, Navarro J et al (2018) High throughput SNP discovery and genotyping in hexaploid wheat. PLoS ONE 13:e0186329. https://doi.org/10.1371/journal.pone.0186329
Rincent R, Charpentier J-P, Faivre-Rampant P et al (2018) Phenomic selection is a low-cost and high-throughput method based on indirect predictions: proof of concept on wheat and poplar. G3 Genes Genomes Genetics g3.200760.2018. https://doi.org/10.1534/g3.118.200760
Rincent R, Laloe D, Nicolas S et al (2012) Maximizing the reliability of genomic selection by optimizing the calibration set of reference individuals: comparison of methods in two diverse groups of maize inbreds (Zea mays L.). Genetics 192:715–728. https://doi.org/10.1534/genetics.112.141473
Rodríguez-Álvarez MX, Boer MP, van Eeuwijk FA, Eilers PHC (2018) Correcting for spatial heterogeneity in plant breeding experiments with P-splines. Spatial Statistics 23:52–71. https://doi.org/10.1016/j.spasta.2017.10.003
Savitzky A, Golay MJE (1964) Smoothing and differentiation of data by simplified least squares procedures. Anal Chem 36:1627–1639. https://doi.org/10.1021/ac60214a047
Schrag TA, Westhues M, Schipprack W et al (2018) Beyond genomic prediction: combining different types of omics data can improve prediction of hybrid performance in maize. Genetics 208:1373–1385. https://doi.org/10.1534/genetics.117.300374
Seifert F, Thiemann A, Schrag TA et al (2018) Small RNA-based prediction of hybrid performance in maize. BMC Genomics 19:371. https://doi.org/10.1186/s12864-018-4708-8
Solberg TR, Sonesson AK, Woolliams JA, Meuwissen THE (2008) Genomic selection using different marker types and densities. J Anim Sci 86:2447–2454. https://doi.org/10.2527/jas.2007-0010
Ward J, Rakszegi M, Bedő Z et al (2015) Differentially penalized regression to predict agronomic traits from metabolites and markers in wheat. BMC Genet 16:19. https://doi.org/10.1186/s12863-015-0169-0
Westhues M, Schrag TA, Heuer C et al (2017) Omics-based hybrid prediction in maize. Theor Appl Genet 130:1927–1939. https://doi.org/10.1007/s00122-017-2934-0
Whittaker JC, Thompson R, Denham MC (2000) Marker-assisted selection using ridge regression. Genet Res 75:249–252. https://doi.org/10.1017/S0016672399004462
Xiaobo Z, Jiewen Z, Povey MJW et al (2010) Variables selection methods in near-infrared spectroscopy. Anal Chim Acta 667:14–32. https://doi.org/10.1016/j.aca.2010.03.048
Xu S, Xu Y, Gong L, Zhang Q (2016) Metabolomic prediction of yield in hybrid rice. Plant J 88:219–227. https://doi.org/10.1111/tpj.13242
Yu X, Li X, Guo T et al (2016) Genomic prediction contributing to a promising global strategy to turbocharge gene banks. Nature Plants 2:16150. https://doi.org/10.1038/nplants.2016.150
Zenke-Philippi C, Frisch M, Thiemann A et al (2017) Transcriptome-based prediction of hybrid performance with unbalanced data from a maize breeding programme. Plant Breed 136:331–337. https://doi.org/10.1111/pbr.12482
Zhong S, Dekkers JCM, Fernando RL, Jannink J-L (2009) Factors affecting accuracy from genomic selection in populations derived from multiple inbred lines: a barley case study. Genetics 182:355–364. https://doi.org/10.1534/genetics.108.098277
Acknowledgements
The authors thank the staff of the INRAE experimental units (Clermont-Ferrand, Estrées-Mons, Le Moulon, Rennes) and the breeders from Agri-Obtentions and Florimond Desprez for their work. The authors are grateful to Agri-Obtentions, Florimond Desprez, and the Association Nationale de la Recherche et de la Technologie (ANRT, grant number 2019/0060), which supported this PhD work. The authors also thank Bastian Alexandre and Rachel Carol (Bioscience Editing, France) for proofreading this work and Tristan Mary-Huard for his careful reading of the equations. Finally, the authors thank the two anonymous reviewers for their helpful comments on this work.
Funding
This work was funded by Agri-Obtentions, Florimond Desprez and the Association Nationale de la Recherche et de la Technologie (ANRT, grant number 2019/0060).
Author information
Contributions
JA, FXO, BR and EH designed the field trials and collected the phenotypic data from Agri-Obtentions and INRAE. EGD provided the phenotypic and genotyping data from the Florimond Desprez company. SB provided the genotyping data from Agri-Obtentions and INRAE and participated in discussions of this study. RR initiated the project and, with JLG, supervised the study and helped improve the manuscript. PR analysed the data and wrote the manuscript. All authors approved the final manuscript.
Ethics declarations
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
Availability of data and material
The datasets generated and/or analysed during the current study are not publicly available because of the confidentiality of the breeding programs, but are available from the corresponding author on reasonable request.
Code availability
The code used to conduct the analyses in this study is available from the corresponding author on request.
Additional information
Communicated by Thomas Miedaner.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix 1
In this section, we describe the analysis of two further factors that affected the predictive ability (PA) of phenomic selection: the size and the composition of the training set (TS).
In GS, the size and composition of the TS have an impact on PA. We characterised this effect by testing six TS sizes (10, 50, 100, 150, 200, 250 genotypes) on two specific sites from the dataset Set2-2019: GL and EM. These datasets had the largest number of genotyped lines, which made it possible to test different TS sizes. For this, we randomly split the data into five folds of the same size. One fold constituted the validation set, and the remaining folds contained the genotypes potentially included in the TS. Among the latter, we randomly sampled a fixed number of lines to constitute the final TS of the corresponding size. The same procedure was followed for all the folds and all the TS sizes and was repeated 25 times, giving 125 predictive abilities (5 folds × 25 repetitions) for each TS size. We thus compared the PA of the four models M, S, CbSD, and CbSD + M representing the G-BLUP, H-BLUP and GH-BLUP model types.
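The resampling scheme described above can be sketched as follows. This is only an illustrative sketch in Python, not the code used in the study: the function name `ts_size_splits`, the seed, and the total of 320 lines are hypothetical choices, and the model fitting step that would turn each split into a predictive ability is deliberately left out.

```python
import numpy as np

def ts_size_splits(n_lines, ts_size, n_folds=5, n_reps=25, seed=0):
    """Yield (training_idx, validation_idx) pairs for one TS size.

    Each repetition shuffles the lines, splits them into n_folds folds,
    uses each fold once as the validation set, and draws ts_size lines
    at random from the remaining folds to form the training set.
    """
    rng = np.random.default_rng(seed)
    for _ in range(n_reps):
        folds = np.array_split(rng.permutation(n_lines), n_folds)
        for k, validation in enumerate(folds):
            # Candidate lines are all lines outside the validation fold.
            candidates = np.concatenate(
                [fold for j, fold in enumerate(folds) if j != k]
            )
            if ts_size > candidates.size:
                raise ValueError("ts_size exceeds the number of candidate lines")
            training = rng.choice(candidates, size=ts_size, replace=False)
            yield training, validation

# Hypothetical example: 320 lines, TS size of 100.
splits = list(ts_size_splits(n_lines=320, ts_size=100))
print(len(splits))  # 5 folds x 25 repetitions = 125 splits
```

Sampling the TS only from the folds outside the validation fold keeps the validation set identical across TS sizes within a repetition, so differences in PA can be attributed to TS size rather than to a change of validation lines.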
The composition of a TS can be optimised in order to minimise its size while retaining similar PA. In one scenario, the size of the TS was arbitrarily defined, and either random or optimised procedures were used to select, among all available genotypes, those that would constitute the TS. The validation set was composed of the genotypes not included in the TS. We compared three optimisation algorithms to define the TS for a particular size. We tested the CDmean algorithm (CDmean) (Rincent et al. 2012), developed originally for GS, and two algorithms developed to optimise NIRS calibration equations in chemometrics, Honigs (HG) (Honigs et al. 1985) and Kennard-Stone (KS) (Kennard and Stone 1969). CDmean was applied with custom R code, while HG and KS were applied with the prospectr R package (Stevens and Ramirez-Lopez 2020). Algorithm performance was compared with that of randomly selected (RD) training sets. PA for each TS size was averaged over 50 repetitions for CDmean or 125 repetitions for RD. There was no repetition for HG and KS as they are both deterministic. To compare these TS selection approaches, we used the datasets Set2-2019-EM and Set2-2019-GL, in which many varieties were genotyped.
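To illustrate why the Kennard-Stone selection needs no repetition, the following minimal sketch implements its usual max-min distance rule in Python. It is not the prospectr implementation used in the study: the function name `kennard_stone` is hypothetical, plain Euclidean distance on the raw input is an assumption (the distance metric and any spectral pretreatment used by the authors are not stated here), and the toy data are invented.

```python
import numpy as np

def kennard_stone(X, k):
    """Return the indices of k samples chosen by the Kennard-Stone rule.

    Start from the two most distant samples, then repeatedly add the
    sample whose distance to its nearest already-selected sample is
    largest. Euclidean distance is an assumption of this sketch.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and the number of samples")
    # Full pairwise Euclidean distance matrix (fine for small examples).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]
    # Distance from every sample to its nearest selected sample.
    min_dist = np.minimum(dist[i], dist[j])
    while len(selected) < k:
        min_dist[selected] = -np.inf  # never pick a sample twice
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        min_dist = np.minimum(min_dist, dist[nxt])
    return selected

# Toy one-variable "spectra": the extremes are picked first,
# then the points that fill the largest remaining gaps.
print(kennard_stone([[0], [1], [2], [10], [5]], 4))  # [0, 3, 4, 2]
```

Because every step is an argmax over fixed distances, the same input always gives the same TS, which is consistent with the absence of repetitions for KS. The selection uses only the predictor matrix (here, the spectra), not the phenotypes.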
We found that the effect of increasing TS size on the PA of PS and GS was substantial (Fig. 6). For each model and both environments, PA increased with the TS size. However, from 50 to 250, the PA increased only slightly. For the smallest TS, the S model was slightly better than the others. Regardless of the TS size, the GS model M was less accurate than the PS models. Model CbSD + M, combining both marker and NIRS data, outperformed the other models for the larger TS sizes.
To optimise TS composition for a given size, we compared three different optimisation algorithms to determine the composition of the TS for performing GS or PS (Fig. 7). When applying GS (G-BLUP model), TS optimised with CDmean computed with the kinship matrix (CDmean_K) performed better than random TS (RD) for both environments, with an average gain of +22%, a maximum of +30% and a minimum of −4.5%. When applying PS (both H-BLUP models), TS optimised with CDmean computed on the NIR similarity matrix (CDmean_H) also performed slightly better than randomly sampled TS (RD). Finally, for the GH-BLUP model combining molecular markers and NIRS, CDmean_K performed slightly better than CDmean_H and RD. With the H-BLUP and GH-BLUP models, the performance of KS and HG varied strongly with TS size in Set2-2019-GL, and both gave lower PA than RD in Set2-2019-EM.
Cite this article
Robert, P., Auzanneau, J., Goudemand, E. et al. Phenomic selection in wheat breeding: identification and optimisation of factors influencing prediction accuracy and comparison to genomic selection. Theor Appl Genet 135, 895–914 (2022). https://doi.org/10.1007/s00122-021-04005-8
DOI: https://doi.org/10.1007/s00122-021-04005-8