Introduction

Acetaminophen is the prototypical hepatotoxin and has been used for numerous hepatotoxin models [1]. The drug is responsible for significant liver injury following overdose, and occasionally after therapeutic dosing. This has led to several labs to use ‘omic techniques to examine the pathophysiology of hepatic injury due to acetaminophen in order to learn new insights of he mechanisms of injury and hepatic regeneration [2,3,4,5]. The increased availability of ‘omic techniques has yielded numerous datasets with multi’omic data, though this has not yet been performed with acetaminophen. There is a strong belief that integration of these increased data will lead to new understanding of disease, drug therapy, and ultimately improve patient care. In fact, integrative analyses of proteomic and genomic data from patients with colon and rectal cancers have identified new associated genes, thereby providing a functional discovery framework of new disease pathways [6]. Metabolomic and proteomic markers can predict sepsis death with high accuracy [7] and clinical data paired with epigenomic data can predict clinical outcomes in patients with COVID-19 at initial presentation using machine learning techniques [8]. Integrative programs have been developed [9] and statistical packages have outlined pipelines for data analysis [10]. With the exception of the COVID-19 epigenomic work, these pipelines are remarkably manual in their data curation, which limits scalability of these approaches. New pipelines are needed to adequately leverage more than one ‘omic dataset.

Capturing more than one ‘omic dataset on a cohort raises questions as to which dataset takes priority in a pipeline and introduces difficulty in determining statistical and clinical significance of the linked findings. Fully agnostic approaches are hindered by multiple testing limitations which raise the number of patients that must be examined. These analytic problems become exacerbated further when clinical phenotypes are polygenic and multifactorial. These complex phenotypes require thousands of patients to be effectively examined with currently available multi’omic pipeline methods. Analytic methods that allow one ‘omic dataset to advise the examination of another, when aligned with the biologic underpinnings of disease or drug mechanism, can lead to findings that are more likely to be clinically significant. New multi’omic pipelines can maximize the biologic understanding of clinical conditions by allowing triangulation of findings generated from one technique in the other as well as reduction of sample size needed for targeted discovery in the second dataset. Additionally, new integrative pipelines can minimize the chance of type 1 error.

Toward that end, we have utilized a well characterized multi’omic dataset containing detailed clinical observations, metabolomic data, and genomic data to develop a pipeline for multi’omic mosaic modeling of acetaminophen induced alanine aminotransferase (ALT) elevation. Utilizing ‘omics to characterize acetaminophen induced liver injury can lead to mechanistic understanding of ALT elevation triangulated by the genomic and metabolomic data generated. Integration of these datasets gives a more rich understanding of which metabolites associated with ALT elevation are under genetic control, and thus, may predict ALT elevation prior to drug administration. The objective of this study is to determine new genes associated with ALT elevation using a novel integrated metabolomic and genomic multi’omic method.

Methods

Patients and Parent Study

The parent study was an outpatient placebo-controlled trial (NCT00743093) in which patients received therapeutic doses of acetaminophen, totaling 4 g/day [11]. Compliance was verified by study diary, pill counts at each study visit, and confirmation of acetaminophen in the plasma. ALT was drawn on study days 0 (baseline), 4, 7, 10, 13 and 16 and, when elevated, until it returned to baseline. Patients did not change their diet throughout the study. ALT is used as the marker of hepatocyte death and is specific for hepatocellular injury. All subjects provided informed consent. We included male and female subjects who were age 18 years or older and who did not have any of the following exclusion criteria: history of acetaminophen ingestion on any of the four days preceding study enrollment or a measurable serum acetaminophen concentration at time of enrollment; laboratory testing suggesting active viral hepatitis A, B, or C infection; any of the following tests greater than the upper limit of normal at screening: serum aminotransferase or total bilirubin, International Normalized Ratio (INR) or alkaline phosphatase activity; platelet count less than 125,000/mL; positive pregnancy test; history of cholelithiasis (without cholecystectomy); history of heavy ethanol use defined as consuming more than an average of 3 alcohol containing drinks daily or 3 or more alcohol containing drinks on any given day over the preceding 2 weeks prior to study enrollment; new prescription medication started within the previous 30 days; taking isoniazid or warfarin; currently has anorexia nervosa or reports a fasting type diet; clinically intoxicated, psychiatrically impaired or unable to give informed consent for any reason; known hypersensitivity or allergy to acetaminophen. Only the patients receiving APAP were included in these analyses. All blood samples were stored at -80 °C and there was only one thaw prior to ‘omic analyses.

Metabolomic and Genomic Analyses

Plasma samples were extracted and 164 metabolites relevant to know biologic pathways and acetaminophen drug mechanism were measured at day 0, as previously described [12]. Quantitative metabolite levels were determined for each patient and used for statistical analyses [12]. DNA was extracted from patient whole blood and run on the Illumina MEGA array as previously described [4]. Genotyping at 2.2 million loci was determined. Imputation was not performed to characterize non-genotyped loci. Metabolomic data represent the end-stage products of upstream biologic pathways and thus, is the closest ‘omic technique to the clinical phenotype examined. Genomic data represent the underlying code that drive this downstream pathways. Thus, linking these two ‘omic datasets can capture the span of biologic activity from code to phenotype.

Statistical Analysis

We employed a one-sample colocalization analysis to identify metabolites under genetic control associated with ALT elevation. This approach can be conceptualized as an extension of the BADGERS approach [13] where metabolites are leveraged as intermediate phenotypes. Such an approach has previously been applied to identify brain imaging endophenotypes associated with Alzheimer’s disease [14]. This approach can be conceptualized in two ways: first, as an extension of the transcriptome-wide association study framework to additional mediating phenotypes (i.e., not limited to gene expression), and as an extension of a Mendelian randomization framework where polygenic scores, rather than single nucleotide variants (SNVs), are used to quantify the SNV-exposure relationship. Although typical transcriptome wide analysis studies (TWAS) type analyses leverage two samples (one for the exposure and one for the outcome), we note that one-sample colocalization approaches are employed in literature, and generally shown to maintain good operating characteristics [15]. Given the difficulty of rigorously assessing the instrumental variable assumptions required to draw causal conclusions about the relationship between SNVs, metabolites, and ALT elevation in a one-sample study of small size, we do not draw causal conclusions from our analysis regarding the relationships between SNVs, metabolites, and ALT elevation. We claim that significant TWAS-type associations in this analysis are evidence of colocalization between the metabolite and ALT elevation in our sample, corresponding to the original interpretation of TWAS [16].

We conducted a metabolite wide colocalization scan to associate the genetically regulated component of metabolite expression with ALT elevation. This framework can detect shared genetic control of ALT elevation and metabolite level, and thus is distinct from a ‘differential expression’ metabolite analysis, whereby observed metabolite level is associated with ALT elevation. In this framework, we first model the genetically regulated component of metabolite level by estimating a machine learning model to predict metabolite level from SNVs. This model provides coefficient estimates for each SNV. We then weight the SNV-ALT association statistics (Z statistics) by the SNV-metabolite coefficients to conduct a ‘weighted sum’ or TWAS type test. This can be interpreted as a test for colocalization of ALT and metabolite regulation, and may be interpretable as a causal mediation (i.e. the causal pathway whereby SNVs causally modulate metabolite level, and metabolite levels causally modulate ALT elevation).

To quantify ALT elevation, we used the highest measured ALT value across the study. To quantify metabolite level, we used the observed value at day zero. Genome-wide association study (GWAS) analyses were conducted for ALT elevation and metabolite level using linear regression, with age, sex, and the first five principal components included as covariates. To model the genetically regulated component of metabolite level, we estimated an elastic net penalized regression model. Only the top 50,000 SNPs were included in the elastic net modeling to reduce computational complexity; given the relatively few number of nonzero SNPs selected (< 1,000 for all metabolites), we do not believe that this screening step meaningfully affects model accuracy.

Elastic net models were fit and assessed as follows. Data was split into training and testing sets, with the training set comprising 75% of the sample. fivefold cross-validation was performed on the training sample and accuracy was assessed on the testing sample. This training–testing split was performed ten times (thus entailing ten separate cross-validation procedures). Tuning parameters were chosen to be the set of parameters that produced the best testing accuracy most frequently. After tuning parameters were selected, the model was re-fit to the full dataset. Coefficient estimates from this final model were used for the ensuing metagenome-wide association study (MWAS) test. Only models with average testing accuracy > 0.001 were considered for MWAS tests, thus reducing weak-instrument bias.

We note the potential risk associated with overfitting a machine learning model to the SNV-metabolite relationship in such a one-sample analysis. Namely, an overfit model would predict some of the non-heritable metabolite variance, thus potentially inducing false positive colocalization associations. Although our rigorous cross-validation procedure reduces the likelihood of overfitting, it does not preclude the possibility. To reduce the likelihood of overfitting, we use a simple clumped and thresholded polygenic risk score to model the SNV-metabolite relationship and likewise conduct TWAS-type tests. We compare these associations to those identified via the elastic net framework. Polygenic risk scores were estimated in PLINK using a threshold p-value of 0.01 and a Linkage Disequilibrium (LD) \({R}^{2}\) cutoff of 0.1.

Results

Data from 192 patients taking 4 g of APAP daily were included in this analysis. All metabolites with cross-validated prediction \({R}^{2}>0.01\) were retained in the stage two association analysis. Out of the 164 metabolites modeled, 120 met the criteria for predictive accuracy and were retained (Fig. 1).

Fig. 1
figure 1

Histogram showing the cross-validated accuracy of the elastic net models predicting metabolite level from SNVs. Black vertical line denotes the 0.01 cutoff for retaining metabolites in the stage 2 analysis. The y axis is the count of metabolites.

A metabolite-ALT association was determined to be significant if the TWAS-type association test statistic was associated at a type I error rate of 0.05 with a Bonferroni correction for the number of metabolites. Eight such associations were identified via both the elastic net TWAS-type approach and the pruning and thresholding approach. These metabolites are listed in Table 1. The correlation between the negative log of the p-values for the elastic net and the pruning and thresholding tests was 0.94, indicating a strong degree of agreement between the two approaches (Fig. 2). Given that the pruning and thresholding approach should be less prone to overfitting, this is evidence that overfitting of the stage 1 models is not driving the identified associations.

Table 1 Significant metabolite-ALT elevation associations for the TWAS-type tests.
Fig. 2
figure 2

Scatterplot of the negative log base 10 of p-values from the elastic net TWAS-type test (x-axis) and the negative log base 10 of p-values from the pruning and thresholding TWAS-type test (y axis).

Genetically Mediated Metabolites

This multi’omic analysis identified eight metabolites under genetic control predictive of ALT elevation due to therapeutic acetaminophen (Table 1). These metabolic pathways are associated with mitochondrial energy production and hepatocyte injury and the pathways will be outlined below. Interestingly, some but not all, of these metabolites were identified with our isolated metabolomic examination of these patients [12]. This highlights the ability of this multi’omic method to identify new genes for interrogation using an integrated multi’omic dataset.

3-oxalomalate is a metabolite that interacts with the of the tricarboxylic acid cycle (TCA). This metabolite was not previously identified as associated with ALT elevation in our prior analyses. Some organisms, and some humans possess a truncated version of the TCA cycle, known as the glyoxylate-bypass, that converts acetyl CoA to biosynthetic intermediates without the loss of carbon dioxide [17,18,19]. While in most eukaryotes the conversion of malate to oxalacetic acid is catalyzed only by an NAD-dependent enzyme (mitochondrial malate dehydrogenase [20]), prokaryotes that employ this variation of the TCA cycle possess an alternative quinone-dependent enzyme, RXNI-3 [21]. Variants in this gene can lead to decreased ATP production, and subsequently decreased mitochondrial energy production, and may be associated with hepatocyte death in patients taking acetaminophen.

Allantoate represents a metabolite of the uric acid cycle, and the production of ammonia. This would be expected in the setting of hepatocyte death. This metabolite is under polygenic control [22] though ALLC (allantoicase) regulates allantoate breakdown and urea production. Failure to metabolize allantoate leads to mitochondrial energy dysfunction under stress. We had identified this metabolite in the past [12] and now have a genetic focus to examine.

Dimethylglycine is a metabolite found in or produced by E. coli bacteria and has been associated with hepatomegaly and ALT elevation when mice are stressed with lipopolysacchride [23]. This metabolite is also a precursor to glycine which is a critical precursor to glutathione. This metabolite is under the control of the human gene, DMGDH (dimethylglycine dehydrogenase), The enzyme is found as a monomer in the mitochondrial matrix and uses flavin adenine dinucleotide and folate as cofactors. Mutation in this gene causes dimethylglycine dehydrogenase deficiency [24] and which limits production of glycine, subsequently decreasing glutathione. This represents a down-regulated portion of the glutathione pathway in hepatocytes that has not been associated with hepatic injury due to acetaminophen in the past.

Diphosphate, or pyrophosphate, is present in a wide range of dietary compounds. In humans, diphosphate is involved in several metabolic pathways, some of which include cardiolipin biosynthesis. Diphosphate is also involved in several metabolic disorders, some of which include hyperinsulinism-hyperammonemia syndrome, methionine adenosyltransferase deficiency, mevalonic aciduria, and g(m2)-gangliosidosis: variant B, Tay-Sachs disease [25]. These diseases are all genetically mediated, though they involve different genes and thus, diphosphate control associated with ALT elevation is likely polygenic. Given the polygenic nature of this metabolite, it is unclear exactly which genetic region or regions are associated with this clinical finding.

L-carnitine is elevated when the mitochondrial shuttle is damaged and hepatocyte death occurs. Long chain fatty acids are transferred via the mitochondrial l-carnitine shuttle leading to energy production and this rate-limiting enzyme controls a key regulatory step for fatty acid β-oxidation [26]. Genes involved in regulation of this metabolite include SLC25A20, CPT1A, CPT1B, CPT2, and CPT1C [27]. SLC35E4 has been identify in past transcriptomic work in patients that have elevated ALT due to therapeutic APAP ingestion [2].

L-proline is a cyclic, nonessential amino acid in humans that is a constituent of many proteins. This metabolite is a breakdown product of protein, and thus, while it is likely under some genetic control, it is more likely to be elevated when there is protein breakdown [28]. There are numerous genes associated with this metabolite since it is a product of protein breakdown and the genes are spread across the genome.

Maltose elevation has been associated with fatty liver disease [29]. This metabolite was found in our prior metabolomic work and may represent alteration in starch and sucrose metabolism which puts hepatocytes at risk. Fatty liver disease is known have over 100,000 genes associated and genes associated with this metabolite are spread across the genome making it difficult to identify which specific gene is responsible for the association in this cohort.

Ornithine has a known role as a hepatoprotective agent. Mutations in the mitochondrial ornithine transporter result in hyperammonemia, hyperornithinemia, homocitrullinuria (HHH) syndrome, a disorder of the urea cycle. The pathophysiology of the disease may involve diminished ornithine transport into mitochondria, resulting in ornithine accumulation in the cytoplasm and reduced ability to clear carbamoyle phosphate and ammonia loads. This transfer is mediated by the mitochondrial ornithine transporters SLC25A15, AF112968, and ORNT1 [30]. Diminished ornithine transport most certainly leads to hepatocyte death. Genes involved in conversion of ornithine into downstream metabolites include, OAT, OTC, ODC1 [31].

Discussion

In this multi ‘omic analysis we have identified several metabolites under genetic control associated with acetaminophen induced ALT elevation. This analysis builds upon our prior work that utilized isolated metabolomic and targeted genetic analyses and builds a multi’omic pipeline that leverages the power of both datasets to identify the genetic underpinning of these metabolic associations. This technique allows for identification of new pathways, and the genes therein, that contribute to clinical outcomes. Prior studies have used the TWAS approach to link mRNA variability to the genes associated [14], though this is the first demonstration of a similar approach linking metabolomic data to genomic data. Typical multi’omic studies that integrate metabolomic and genomic data require large amounts of time to curate [32]. This technique offers the advantage of rapid distillation of the most pertinent metabolites and allows targeting of likely genetic etiologies advised by the clinical data. This technique is similar to a polygenic risk score within the metabolite pathways. Validation of findings in a separate cohort is still necessary though this allows far fewer patients to be enrolled and targeted genetic analysis in the subsequent study population. The importance of this new pipeline is the ability to integrate two ‘omic datasets together rapidly, the ability to triangulate findings in the second dataset with decreased sample size and subsequently, increased power, and decreased risk of type 1 error. This pipeline approach can increase the biologic understanding of the mechanisms of disease, drug effectiveness, safety, and maximize the likelihood of reproducibility.

Some, but not all metabolites agree with our prior metabolomic work. This is encouraging as it further demonstrates the validity of the prior metabolomic analyses but demonstrated this complementary method for identification of next genes to interrogate. In the past, we had identified allantoate, maltose, and ornithine [12]. These metabolite changes represent energy regulation and mitochondrial dysfunction that lead to hepatocyte death. New metabolite pathways under genetic control were 3-oxalomalate, diphosphate, L-carnitine, and L-proline. The genes in these pathways are now targeted for further exploration to determine if a single SNV or a collection of SNVs results in the genetic association found through this work. Additionally, we have identified the upstream metabolite, dimethylgycine which is regulated by the gene DMGDH. This may be the etiology of the lower glutathione levels in those that had ALT elevation [12].

This dataset is unique compared to other acetaminophen liver injury data. These patients ingested therapeutic APAP and were followed with serial blood work over time. This allows for examination of the mechanisms of transient liver injury due to a drug insult and the mechanisms of hepatic regeneration. Past genomic studies have identified a range of variants differentiating outcomes in overdose patients including UGT2B15, UGT1A1, CYP3A5, CD44, and CYP2E1 [33,34,35]. In those studies, there was not adequate confirmation of acetaminophen overdose and the mechanisms of liver injury in overdose are likely different at therapeutic ingestions making it difficult to apply those data to patients taking the drug appropriately. One well conducted transcriptomic study that included 54 patients (10 placebo) ingesting therapeutic APAP for 14 days and 5 overdose patients revealed alterations in RNA transcripts in genes associated with mitochondrial energy production (ZBTB16) [2]. We also found a variant in the SLC ornithine transporter gene identified by this group [2]. We have confirmed these findings in this multi’omic analysis. It is not surprising that we did not identify metabolites associated with immune function, as was observed in our and others targeted genetic analyses [2, 36]. There may not be specific metabolites associated increased immune response as the response leads to cell death which may be further upstream.

Limitations

Limitations of this approach include the potential for overfitting. This limitation was mitigated by employing two methods to prevent this, pruning and thresholding. Additionally, this method will not identify the specific SNPs that yield the metabolite alteration, but rather the pathway and the genes that may be involved. The investigator must still examine each gene for deleterious variants and follow with targeted genetic analyses. Our genetic platform did not include full sequence data, but rather 2.2 million SNPs. Thus, we may miss variants in intergenic regions which are felt to be increasingly important in regulation of expression. We opted not to perform imputation to limit the risk of overfitting and decrease type 1 error. This cohort is not large enough to determine rare variants present at less than 1% frequency in the population.

Conclusions

This novel work uses a multi’omic approach to identify genes responsible for metabolomic patterns in patients with elevated ALT due to therapeutic APAP ingestion. This analytic technique can be used to leverage multi’omic datasets to determine which metabolites are under genetic regulation.