Introduction

Carcass merit traits, including hot carcass weight (HCW), rib eye area (REA), average backfat thickness (AFAT), lean meat yield (LMY), and carcass marbling score (CMAR), are economically important traits in beef cattle since they directly influence the meat product yield and quality grade, and therefore profitability. For example, sufficient marbling is important for beef tenderness, juiciness and flavor, so the degree of marbling in beef is the primary factor determining quality grade of the meat. However, carcass merit traits are expressed late in life and the measurement of these traits for individual live animals is relatively expensive via ultrasound technologies. In many cases evaluation occurs post mortem, thereby eliminating breeding stock with superior breeding values for the traits. The development of genomic prediction provides an opportunity to assess genetic merit of animals as early as birth1,2,3,4 but there is still a need to improve the accuracy of genomic selection for carcass traits in beef cattle in order to achieve broader industry applications3,5,6. Detecting more candidate genes and functional or causal DNA variants through genome-wide association studies (GWAS) and understanding the biological background of the relationship between the genome and phenome could help improve the accuracy of genomic selection for complex traits including carcass merit traits7,8,9,10.

As more omics-based intermediate phenotypes, based on gene expression, protein and metabolite analysis, become available, integrating multi-omics data to further elucidate genetic influence of complex traits holds great promise11,12,13,14. Among the omics-based intermediate phenotypes, metabolites have been reported to be associated with carcass merit traits of livestock15,16,17 and their variation is influenced by genetic effects18,19. Therefore, we hypothesize that combining metabolomic data into GWAS of whole genome DNA variants could help detect key candidate genes and functional or causal DNA variants associated with carcass merit traits.

In this study, the data of 5 carcass merit traits (HCW, REA, AFAT, LMY, and CMAR) and 31 plasma metabolites were collected from a beef cattle population consisting of 493 crossbred bulls, heifers, and steers. Our objective was to identify significant single nucleotide polymorphisms (SNPs), candidate genes and biological functions associated with carcass merit traits through integration of carcass merit traits, plasma metabolites and whole genome sequence variants. Linear regression models were first used to identify metabolites associated with carcass merit traits. Metabolome-genome wide association studies (mGWAS) were then performed with 10,488,742 imputed whole genome SNPs to identify significant SNPs for the trait associated metabolites. Candidate genes were mapped based on significant SNPs and gene functional enrichment analyses were subsequently performed on candidate genes of each trait to predict biological functions/processes associated with carcass merit traits in beef cattle.

Results

Metabolites associated with carcass merit traits

The results of regression analyses showed 15 out of 31 analyzed metabolites were associated with one or more than one of the carcass merit traits (Table 1). At P-values less than 0.05, 3 (3-hydroxybutyric acid, acetic acid, and citric acid), 3 (creatinine, l-glutamine, and succinic acid), 1 (methanol), 1 (l-lactic acid), and 3 (succinic acid, lysine, and glycine) metabolites were identified as associated with HCW, REA, AFAT, LMY, and CMAR, respectively. However, some metabolites with a P-value from 0.05 to 0.1 explained more than 1% of the phenotypic variance of the associated carcass merit traits, thus, a relatively relaxed threshold of P-value < 0.1 was chosen to include more metabolites that may be potentially associated with carcass merit traits. For HCW, at P-values less than 0.1, 3-hydroxybutyric acid, acetic acid, citric acid, and choline were the associated metabolites, accounting for 1.92%, 1.69%, 1.48%, and 1.09% of the phenotypic variance in HCW, respectively. Creatinine, l-glutamine, succinic acid, pyruvic acid, l-lactic acid, and 3-hydroxybutyric acid were significantly associated with REA, and these six metabolites accounted for 1.87%, 1.79%, 1.60%, 1.56%, 1.04%, and 0.80% of phenotypic variance, respectively. AFAT was associated with fumaric acid, methanol, d-glucose, and glycerol, and these four metabolites explained 2.40%, 1.71%, 1.67%, and 1.39% of the phenotypic variance, respectively. l-lactic acid and creatinine were associated with LMY and accounted for 2.71% and 1.42% of phenotypic variance, respectively. Five metabolites, including succinic acid, fumaric acid, lysine, glycine, and choline, were associated with CMAR and each respectively accounted 1.76%, 1.36%, 1.22%, 1.20%, and 1.05% of the phenotypic variance in CMAR, respectively. Most of these metabolites were mainly associated with a single trait. However, a few metabolites were associated with more than one trait. For example, 3-hydroxybutyric acid was associated with both HCW and REA. l-lactic acid and creatinine were both associated with REA and LMY. Choline was associated with HCW and CMAR, and fumaric acid was associated with both AFAT and CMAR (Table 1). The additive genetic variance and heritability estimates of the 15 metabolites associated with the carcass merit traits were low to moderate (Table S1 in Supplementary file 1).

Table 1 A summary of metabolites associated with carcass merit traits in a multibreed population of beef cattle.

Significant SNPs and candidate genes associated with metabolites

Genomic inflation factors for all association analyses ranged from 0.95 to 1.01 (Table S2 in Supplementary file 1), a value around 1 indicates that there is no population stratification, and the statistical models are well fitted. Summarized results of the mGWAS for the 15 metabolites (identified as associated with the carcass merit traits, P-value < 0.1) are presented in Table 2. Manhattan plots and QQ plots are provided in Figs. S3S17 in Supplementary file 4. The average of phenotypic variance of the metabolites explained by a single SNP was 5.13% with a range of 3.57–10.95%. Details of significant SNPs for the 15 metabolites are provided in Supplementary file 2. The number of genes associated with the 15 metabolites varied from 3 (fumaric acid) to 53 (succinic acid) and a full list of candidate genes for each of the 15 metabolites is provided in Supplementary file 3.

Table 2 A summary of significant SNPs, the number of putative QTLs, and the number of candidate genes for metabolites associated with carcass merit traits in a multibreed population of beef cattle.

Through integrating the metabolite and carcass merit trait regression analyses and the mGWAS results, a total of 103, 160, 83, 43, and 109 candidate genes were found to be associated with HCW, REA, AFAT, LMY, and CMAR, respectively (Table 3). As for metabolites, some candidate genes identified through the mGWAS were associated with multiple carcass traits (Table S3 in Supplementary file 1 and Fig. S1 in Supplementary file 4). For instance, CDH13 was associated with HCW, REA, AFAT, and CMAR, while 5 genes (KMT5B, NDUFS8, ALDH3B1, CHKA, and TCIRG1) were associated with HCW, REA, LMY, and CMAR.

Table 3 Metabolites and their candidate genes associated with carcass merit traits in a multibreed population of beef cattle.

Significantly enriched molecular functions and gene networks for carcass merit traits

Of the identified candidate genes for HCW, REA, AFAT, LMY, and CMAR, 99, 149, 78, 42, and 102 genes were respectively mapped to the IPA database for functional enrichment analyses. Briefly, 26, 24, 26, 24, and 28 cellular and molecular functions were identified as significantly (P-value < 0.05) associated with HCW, REA, AFAT, LMY, and CMAR, respectively (Tables S4S8 in Supplementary file 1). Interestingly, 75% of the biological functions were commonly associated with all the five carcass merit traits in this study (Table S9 in Supplementary file 1 and Fig. S2 in Supplementary file 4). Some of the major functions common across the traits included molecular transport, small molecule biochemistry, lipid metabolism, cell-to-cell signaling and interaction, carbohydrate metabolism, cellular assembly and organization. Additionally, the five topmost significantly enriched biological functions for each trait and the candidate genes involved are presented in Table 4. Among the top five significant enriched functions, molecular transport and small molecule biochemistry were commonly associated with all carcass merit traits. Lipid metabolism was the most significant biological function for LMY and CMAR, and it was the second and fourth highest biological function for REA and HCW, respectively. Cell-to-cell signaling and interaction was one of the top significant biological functions associated with HCW, REA, AFAT, and CMAR. Carbohydrate metabolism was among the top significant biological functions associated with both HCW and CMAR. Further investigation within some of the biological functions revealed molecular/metabolic processes related to the carcass merit traits. For HCW, within the molecular transport function, 11 genes (AGTR1, CHKA, CPT1A, DDX5, IGHMBP2, IL21R, LRP5, NTRK2, PVALB, SULT1E1, and TRAF3) were involved in concentration of corticosterone and lipid, and quantity of steroid and steroid hormone (Fig. 1a). For CMAR, within the lipid metabolism function, 17 candidate genes (AGTR1, AQP9, CCDC80, CHKA, CPT1A, DAB1, DDX5, IGHMBP2, LRP5, PLA2G2A, PLA2G2E, PLA2G5, PTGS1, PVALB, SLC10A1, SPRED2, and SSPN) were involved in multiple metabolic processes related to fatty acid and lipid metabolism (Fig. 1b), such as synthesis of fatty acid and release of lipid. Additionally, within the carbohydrate metabolism function for CMAR, 12 candidate genes (AGTR1, AQP9, CHKA, CPT1A, GYG1, LRP5, NKX3-2, PDCL, PLA2G2A, PLA2G2E, PLA2G5, and TREH) were involved in carbohydrate metabolic processes such as carbohydrate biosynthesis (Fig. 1c). It is worth noting that 8 candidate genes (AGTR1, AQP9, CHKA, CPT1A, LRP5, PLA2G2A, PLA2G2E, and PLA2G5) associated with CMAR were involved in both lipid and carbohydrate metabolism.

Table 4 Five topmost significantly enriched biological functions for carcass merit traits, and genes involved in functions.
Figure 1
figure 1

(a) gene network of molecular transport for hot carcass weight (HCW); (b) gene network of lipid metabolism for carcass marbling score (CMAR); (c) gene network of carbohydrate metabolism for carcass marbling score (CMAR).

Discussion

Metabolomics to improve understanding on genetic influence of carcass merit traits

Studies have demonstrated metabolites as potential biomarkers for economically important traits in livestock species15,16,17. However, improving understanding of the biology involved is hampered by the limited knowledge of how these metabolites are associated with different economically important traits in different livestock species. Carcass merit traits are of fundamental interest to every beef producer and everyone involved in the beef industry. However, these traits are relatively expensive to measure using ultrasound technologies on individual live animals, which is a limitation for selection and improvement of these traits. Since blood metabolites are easily measurable/quantifiable even on live animals, we speculate that identification of genetic/biological associations between metabolite concentrations and beef cattle carcass merit traits could potentially enhance genetic prediction and selection for these traits in beef cattle. In addition, identification of blood metabolite biomarkers associated with carcass traits at an earlier stage would have a more practical application for genetic selection and for sorting animals into different finishing groups for more uniform carcass outputs. Therefore, we collected the blood samples at the start of the feedlot test instead of close to slaughter and examined the associations between 31 plasma metabolites and 5 carcass merit traits. We further explored the potential biological linkage between these metabolites and carcass merit traits. Our results showed that several metabolites were associated with the carcass merit traits studied. However, individual metabolites, despite being significantly associated with the trait, only accounted for 0.80–2.71% of the total phenotypic variance of carcass merit traits. This relatively small percentage of phenotypic variance reflected the complex nature of these traits, which we believe are regulated by multiple metabolic pathways involving many metabolites with each having only a small contribution/effect. It is also possible that due to the limited number of metabolites we profiled in the current study, we were not able to identify those metabolites with major influences on the traits studied. Additionally, this study used a more relaxed threshold (P-value < 0.1) to identify metabolites potentially associated with carcass merit traits, therefore, validation in independent beef cattle populations or further studies considering a wider range of metabolites is warranted. It is also worthwhile to analyze metabolites on samples collected at different developmental stages to see whether and how the associations between the metabolites and the carcass traits may change. Furthermore, we observed that a majority of the significant metabolites were only associated with one trait. However, some metabolites in the current study were associated with two traits, indicating potential biological relationships between these traits. For example, in this study, we observed that 3-hydroxybutyric acid was associated with both HCW and REA, and beef cattle with high HCW and REA had lower concentration of 3-hydroxybutyric acid, indicating that animals with high HCW and REA may have better carbohydrate metabolism. Additionally, 3-hydroxybutyrate is the main representative of ketone bodies and one important function of ketone bodies is to provide acetoacetyl-CoA and acetyl-CoA for the synthesis of cholesterol, fatty acids, and complex lipids20. Thus, a lower concentration of 3-hydroxybutyric acid may lead to reduced lipid synthesis in animals with high HCW and REA. Interestingly, creatinine, the final catabolite of muscle energy metabolism21, was positively associated with both REA and LMY in the current study (Table 1), and these two carcass merit traits measure muscle development and the proportion of lean meat in a carcass respectively. In line with our results, Hanset et al.22 previously observed higher concentrations of plasma creatinine in double muscled bulls as compared to conventional or normal muscled bulls, and Patel et al.23 proposed creatinine in serum as a promising biomarker for human muscle mass. These previous studies and the results from our study demonstrate that creatinine is a potential indicator trait or biomarker for muscle related traits in beef cattle. Finally, the heritability of metabolites estimated in this study could be used as reference information. Large standard errors for these estimates were observed due to the sample size used and future research utilising a larger sample size is warranted.

Candidate genes, enriched molecular functions and gene networks for carcass merit traits

Generally, identification of SNPs and genes associated with carcass merit traits mainly relies on association studies between DNA variants and the traits. For example, Wang et al.24 performed GWAS based on 7.8 million imputed whole genome sequence variants for carcass merit traits using Canadian beef cattle and they identified hundreds of candidate genes associated with carcass merit traits. However, the knowledge about the underlying biological background behind these associations is relatively limited. We assume that metabolites, which are an intermediate phenotype lying between genome and carcass merit traits, could provide additional insight into the associations. In the current study, the candidate genes identified through incorporating metabolites showed relatively good consistency with the previous study24 (Table S10 in Supplementary file 1). Briefly, we found that 34.95%, 28.13%, 27.71%, 41.86%, and 22.94% of the candidate genes identified in this study overlapped with those from Wang et al.24 for HCW, REA, AFAT, LMY, and CMAR, respectively. Of note, some of the candidate genes were also reported in different cattle breeds or populations in other studies. For example, ST18 was associated with AFAT in the current study and it was associated with the metabolite d-glucose. Medeiros de Oliveira Silva et al.25 also identified ST18 as candidate gene for backfat through GWAS in a Nelore cattle population. Additionally, by integrating metabolomic data, this study added more information to some previously identified associations between genes and carcass merit traits. For example, Wang et al.24 reported that UMODL1, L3HYPDH, JKAMP, and LUZP2 were candidate genes associated with REA and LMY, but the potential mechanism of how these genes could influence the two traits remained unclear. Our study showed that these same genes (UMODL1, L3HYPDH, JKAMP, and LUZP2) were associated with the concentration of creatinine which is a metabolite associated with REA and LMY. These results indicated that these genes may be associated with the synthesis or degradation of creatinine in animals and thereby influence the related traits. Similarly, HLTF, GYG1, RYR2, RBM47, and APBB2 were reported to be associated with REA and CMAR by Wang et al.24. Our results showed these genes were associated with succinic acid which was negatively associated with both REA and CMAR. Both examples represent one of situations that genes may influence different traits by regulating the same metabolites, and the mechanism of how these genetic variants affect the concentration of metabolites still needs more studies. According to our results, we would like to highlight that some genes could affect the same carcass merit traits by influencing different metabolites. For instance, AMN was associated with both 3-hydroxybutyric acid and citric acid, and both metabolites were identified as associated with HCW. Therefore, information obtained via analyzing metabolites could improve the understanding of genetic effects on these phenotypes. In our companion paper by Li et al.26, similar findings were also observed. These two studies indicate that metabolites play important roles in the variation of both feed efficiency and carcass merit traits. Integration of metabolomic and genomic data could help identify functional or causal SNPs or genes, and interpret the biological meaning of the candidate genes identified in GWAS. In addition, these two studies investigated the associations between different omics levels, which could shed light on the interrelationship between different omics layers and potential molecular mechanisms underlying these traits. Therefore, our findings have broadened our knowledge on the genetic and molecular mechanisms of these traits. Based on what we have learned from these two studies, we recommend applying such multi-omics analyses to study other important traits in beef cattle.

In addition to adding more information to known associations, incorporating metabolomic data can help us identify additional novel associations as metabolites represent a level close to the final phenotypes (i.e., carcass merit traits). In the current study, some additional candidate genes were reported to be associated with carcass merit traits. Therefore, we expect that including the candidate gene SNPs in the DNA marker panel or increasing the weight applied to such SNPs could either improve accuracy of genomic prediction of the traits or decrease the DNA marker density used in genomic prediction while retaining accuracy. A preliminary attempt of this latter option was done by Melzer et al.27 for the prediction of three traditional milk traits in dairy cows. Melzer et al.27 applied regression methods to identify important milk metabolites and then those SNPs with significant genetic effects on important metabolites were identified and used to predict milk traits. Compared with the classical approach that uses all SNPs (40,317) in prediction, this metabolite approach could achieve similar prediction precision with less than 1% of the total amount of SNPs. Fontanesi28 suggested integration of metabolomic data would be useful if the heritability of a trait is low, if a trait is hard to be precisely and directly measured on the animals, or if the prediction accuracy was limited by the small number of phenotyped animals in the training populations. Since carcass merit traits are expressed at later stages of animal production and are usually measured by sacrificing potential breeding stock, these traits are more suitable for DNA marker based genomic prediction, and incorporating metabolomic data into the genomic prediction has the potential to enhance the genomic prediction accuracy. In addition to metabolites, the information carried by other omics data, such as RNA and protein, also helps to prioritize SNPs associated with complex traits, and can further contribute to improving genomic prediction accuracy of these traits. For example, Fang et al29 applied an extended genomic best linear unbiased prediction (GBLUP) model called genomic feature BLUP (GFBLUP) that included a separate random effect for the joint action of SNPs within genomic features which were obtained from RNA differential expression analyses. Compared to GBLUP, the accuracy of genomic prediction for mastitis and milk production traits with GFBLUP was marginally improved (3.2 to 3.9%) in within-breed prediction but significantly increased (164.4%) in across-breed prediction. Theoretically, the genomic features could be defined from various sources of biological knowledge (e.g., metabolomics) and the GFBLUP model could be applied to other complex traits. Therefore, it would be of interest to investigate the accuracy of prediction for carcass merit traits with and without utilizing multi-omics data.

In order to further investigate biological functions associated with carcass merit traits, five topmost significant biological functions associated with each trait were identified in the current study. These top five biological functions showed substantial overlap with the top five biological functions identified by previous studies24,30,31, which indicated those overlapping top biological functions potentially have important biological meaning for the carcass merit traits in beef cattle. Since our carcass data were collected from animals that were finished for meat production, genes involved in these overlapping top biological functions, such as lipid metabolism and carbohydrate metabolism, likely play a more important role in determining the carcass merit traits. Therefore, the identification of top and other enriched biological functions and their corresponding genes will not only improve our understanding of the underlying biology but also help prioritize candidate genes and related putative causal SNPs for future studies. Additionally, construction of gene networks for biological functions could help us elucidate complicated connections among genes and disentangle the potential relationships among genes, biological functions and phenotypes. For example, molecular transport was identified as a top enriched biological function associated with all carcass merit traits and its network of HCW as an example showed that some of the associated genes were involved in concentration of lipid and corticosterone, and quantity of steroid and steroid hormones (Fig. 1a). In beef cattle production, more than 30 commercially-available steroid hormone implants are marketed in the U.S. and the effects of steroid hormone implants on improving carcass leanness, increasing average daily gain, and altering dry matter intake has been reviewed by Smith and Johnson32. Thus, those genes linked to the functions of steroid and steroid hormones in the network may consequently influence final muscle mass in the carcass. For those genes involved in the concentration of lipid, they may influence fats in the carcass by regulating breakdown or storage of fats. Additionally, we would like to highlight the network of lipid metabolism for CMAR (Fig. 1b) because lipid metabolism was the most significant biological function associated with this trait. In this network, some genes were involved in fatty acid metabolism including fatty acid synthesis, release and concentration. The phenotypic and genetic correlations between fatty acid composition and marbling have been reported in different beef cattle populations33,34,35,36. Our results provide further insight into the potential molecular and genetic background accounting for genetic correlations between marbling and fatty acid composition in beef cattle, further indicating that the selection for fatty acid composition or concentration could influence marbling in beef cattle as previously proposed36.

Conclusion

In this study, genomic, metabolomic, and phenotypic data were integrated to investigate biological functions/processes related to carcass merit traits in beef cattle. Plasma metabolites associated with HCW, REA, AFAT, LMY, and CMAR were identified and individual metabolites were found to account for a small proportion of the total phenotypic variance of the carcass merit traits. Several candidate genes as associated with carcass merit traits were identified through mGWAS along with regression analyses. These genes are involved in multiple biological functions that are related to the associated carcass merit traits. Additionally, the results of our integrative analyses could help interpret previous results from DNA marker based GWAS of the carcass merit traits and revealed functional genes associated with these economically important traits. Therefore, our integrative study has provided insights into relationships between genes, metabolites and carcass merit traits, which could potentially lead to improvement of genomic prediction accuracy via incorporating metabolomic related data.

Material and methods

Animal population, metabolomic and phenotypic data collection

All animals in this study were cared for according to the guidelines of the Canadian Council on Animal Care (2009) and the experimental procedures were approved by the University of Alberta Animal Care and Use Committee (AUP00000777). In total, 493 bulls (n = 93), heifers (n = 125) or steers (n = 275) from different herds including Charolais (n = 73), Hereford-Angus crosses (n = 191), and Beefbooster composite breed (predominantly Charolais-based, n = 229) were used. Among these animals, 277 animals from two herds had implants, while 216 animals from the other three herds had no implant. The effect of the factor of implant was examined using statistical analysis, and its effect has been captured under the herd variable through subsequent phenotypic value adjustment. Animals were born between 2002 to 2011 and initially measured for feed intake using the GrowSafe system (GrowSafe Systems Ltd., Airdrie, Alberta, Canada) at the feedlot test station under multiple projects, which were described previously37,38,39,40. The animals from the same herd of a particular year were tested in the same feedlot and diet. Blood samples were collected from all animals by jugular venipuncture in the early morning on the first day of feedlot feeding test and immediately frozen at − 80 °C for storage. Plasma metabolites were quantified using nuclear magnetic resonance (NMR) spectroscopy as described by Li et al.18. Briefly, blood samples were thawed at room temperature and centrifuged at 10,000 rpm for 10 min to separate the plasma. Plasma was then filtered through 3 kDa molecular weight cut-off filters (Merck Millipore Ltd. 3KDA filter tubes; Darmstadt, Germany) to exclude macromolecules, including lipids and proteins. After filtration samples with a low volume were diluted with high-performance liquid chromatography (HPLC) water to 570 μl to ensure an adequate volume for NMR acquisition. Samples were further prepared in a 5 mm NMR tube (New Era Enterprises Inc., NJ, USA) that contained a total of 700 μl including 570 μl filtered serum, 60 μl DSS and 70 μl D2O. Spectra were acquired on a 500 MHz VNMRS spectrometer equipped with a 5 mm cold probe (Agilent Technologies, CA, USA). Metabolites were identified and quantified with a targeted profiling approach using the Profiler and Library Manager modules in the same software which contained 304 total metabolites as references. Each spectrum was reviewed by a separate analyst and a final review was performed on all of the spectra before exporting concentration results. Concentration measurements were adjusted to report metabolite concentrations (µM). In total, 33 metabolites were initially identified and quantified using NMR. However, two of them were excluded due to missing values, resulting in 31 metabolites for further analyses.

In order to collect carcass data, animals were then slaughtered after the feedlot tests at either a commercial plant or the Lacombe Research and Development Centre (LRDC) abattoir when a majority of them reached > 8 mm backfat as predicted from ultrasound measurement. The processes of carcass data collection were previously described39,41,42,43,44,45. In summary, hot carcass weight (HCW) in kg was obtained by summing up the weight of each side of the carcass that was split during dressing, about 45 min post-mortem. Average backfat thickness (AFAT) in mm, rib eye area (REA) in square centimeters, and carcass marbling score (CMAR) at the grading site between the 12th and 13th ribs was assessed by trained personnel. Carcass marbling score was measured as a continuous variable from 100 (trace marbling or less) to 499 (abundant or more marbling) to reflect the amount of fat deposit interspersed between the muscle fibers (i.e., intramuscular fat) of the longissimus thoracis. Lean meat yield (LMY) was calculated as LMY, % = 57.96 + (0.202 × REA, cm2) − (0.027 × HCW, kg) − (0.703 × AFAT, mm) as an estimate of saleable meat in the carcass37.

Animal genotyping, SNP imputation and quality control

DNA was extracted from the blood samples using DNeasy Blood & Tissue Kit (QIAGEN, Ontario, Canada) and then genotyped using the Illumina BovineSNP50 v2 BeadChip (Illumina Inc., CA, USA). For the SNP imputation, a step-wise procedure was applied as described in our previous studies24,40 using Beagle 5.1 software46. Briefly, we first imputed from the 50 K SNPs to the AffyHD panel of 444,558 SNPs with 4,247 animals of mixed beef breeds in the reference population. We then imputed from the imputed AffyHD panel to the whole genome sequence variants with the reference population of 3,093 Bos taurus animals from the 1000 Bull Genomes Project47 (run 7). Finally, 53,258,178 variants (SNPs and indels) on 29 autosomes were obtained after the imputation with average imputation accuracy of 0.97 across the DNA variants with a standard deviation (SD) of 0.08, which was assessed through a five-fold cross-validation as described in our previous studies24,40. For each SNP, post-imputation quality control was then performed to filter out the imputed variant genotypes if one of the following conditions was met: (1): SNPs on 29 autosomes that had an imputation accuracy < 0.95; (2): minor allele frequency < 0.05; (3) SNPs failed to pass the Hardy–Weinberg equilibrium test (P-value < 0.0001). A total of 10,488,742 SNPs remained for subsequent analyses after the quality control.

Regression analyses between carcass merit traits and metabolites and metabolome-genome wide association studies

Phenotypic values of carcass merit traits and metabolites were adjusted for factors including animal type (bull, heifer, steer), birth year, herd, feedlot pen, animal age at slaughter, and breeding composition using linear regression models. The breed composition of each animal was predicted based on their 50 K SNPs using ADMIXTURE software to account for population stratification48,49. The value of K = 6 was chosen because it had the smallest cross-validation error and yielded the most accurate breed composition prediction based on prior knowledge of breed composition on a subset of animals. Residuals of metabolomic and phenotypic data beyond 3 standard deviations from the mean of residuals were considered as outliers and excluded from further analyses. The descriptive statistics of phenotypic data on carcass merit traits and metabolites are shown in Table S11 in Supplementary file 1. In order to determine relationships between carcass merit traits and metabolites, regression analyses were conducted between the adjusted carcass merit traits and the 31 adjusted metabolites using R statistical software. A carcass merit trait and a metabolite were considered to be significantly associated when the regression analyses have a P-value < 0.1.

For the metabolome-genome wide association studies (mGWAS), the adjusted values of metabolites that were significantly associated with the carcass merit traits were the response variable in the single SNP-based mixed linear model association (mlma) as implemented in GCTA software50. The linear mixed model can be described as follows:

$${y}_{ij}=\mu +{b}_{j}{x}_{ij}+{a}_{ij}+{e}_{ij}$$

where \({y}_{ij}\) is the adjusted metabolite value of the \(i\) th animal with the \(j\) th SNP (i.e. the \(ij\) th animal), \({b}_{j}\) is the allele substitution effect of the \(j\) th SNP, \({x}_{ij}\) is the \(j\) th SNP genotype of animal \(i\) coded as 0, 1, 2 for genotypes \({A}_{1}{A}_{1}\), \({A}_{1}{A}_{2}\), and \({A}_{2}{A}_{2}\), respectively, \({a}_{ij}\) is the additive polygenic effect of the \(ij\) th animal \(\sim N(0,{\varvec{G}}{\sigma }_{a}^{2})\), and \({e}_{ij}\) is the random residual effect \(\sim \boldsymbol{ }N(0,\boldsymbol{ }{\varvec{I}}{\sigma }_{e}^{2}\)). The genomic relationship matrix \({\varvec{G}}\) was derived based on total filtered SNP markers (i.e. 10,488,742 SNPs) as described by Yang et al.51, which is essentially the same as the second VanRaden’s formulation52. The SNP allele substitution effect was estimated and the significance test of the SNP allele substitution effect was conducted via a generalized least square F-test as implemented in the GCTA package. The SNPs with P-value < 1 × 10–5 were considered to be significantly associated with the metabolite according to the recommendation of The Wellcome Trust Case Control Consortium53. The phenotypic variance of the metabolite explained by each significant SNP was calculated by \(\frac{2pq{\beta }^{2}}{{S}^{2}}*100\mathrm{\%}\), where \(p\) and \(q\) denote the SNP allele frequency of \({A}_{1}\) and \({A}_{2}\), respectively; \(\beta\) is the SNP allele substitution effect; \(2pq{\beta }^{2}\) is the additive variance of the SNP, and \({S}^{2}\) is the phenotypic variance of the metabolite.

Identification of candidate gene and functional enrichment analyses for carcass merit traits

A 140-kbp window (70-kbp upstream and 70-kbp downstream) of each significant SNP was used to map candidate genes based on ARS-UCD 1.2 bovine genome assembly from the Ensembl BioMart database (accessed in February 2021). The 70-kbp was the chromosomal length within which a high linkage disequilibrium phase correlation (\({r}^{2}\)> 0.77) was maintained across a sample of Canadian beef cattle breeds54.

The Entrez gene IDs were used as gene identifiers and small nucleolar RNA and microRNA were excluded from gene functional enrichment analyses. Bovine genes were changed to known human orthologous genes from Ensembl, whereas for those genes without human orthologs their bovine gene IDs were maintained in the gene list. Then candidate genes of all metabolites associated with the carcass merit traits (HCW, REA, AFAT, LMY or CMAR) as identified in the regression analyses between carcass merit traits and metabolites were combined and imported into the Ingenuity Pathway Analysis software (accessed in February 2021) (IPA; www.Ingenuity.com). Significantly enriched molecular and cellular biological functions and gene networks (P-value < 0.05) for each carcass merit trait were inferred and gene-sub-biological function/process interactions within the most significant molecular and cellular functions were predicted in the IPA.