Co-expression analysis reveals interpretable gene modules controlled by trans-acting genetic variants

Abstract

Understanding the causal processes that contribute to disease onset and progression is essential for developing novel therapies. Although trans-acting expression quantitative trait loci (trans-eQTLs) can directly reveal cellular processes modulated by disease variants, detecting trans-eQTLs remains challenging due to their small effect sizes. Here, we analysed gene expression and genotype data from six blood cell types from 226 to 710 individuals. We used co-expression modules inferred from gene expression data with five methods as traits in trans-eQTL analysis to limit multiple testing and improve interpretability. In addition to replicating three established associations, we discovered a novel trans-eQTL near SLC39A8 regulating a module of metallothionein genes in LPS-stimulated monocytes. Interestingly, this effect was mediated by a transient cis-eQTL present only in early LPS response and lost before the trans effect appeared. Our analyses highlight how co-expression combined with functional enrichment analysis improves the identification and prioritisation of trans-eQTLs when applied to emerging cell-type-specific datasets.

Introduction

Genome-wide association studies have been remarkably successful at identifying genetic variants associated with complex traits and diseases. To enable pharmacological and other interventions on these diseases, linking associated variants to causal intermediate phenotypes and processes is needed. A canonical example is the causal role of circulating LDL cholesterol in cardiovascular disease (Ference et al., 2012). However, discovering clinically relevant intermediate phenotypes has so far remained challenging for most complex diseases. At the molecular level, cis-acting gene expression quantitative trait loci (cis-eQTLs) can be used to identify putative causal genes at disease-associated loci, but due to widespread co-expression between neighbouring genes (Wainberg et al., 2019) and poor understanding of gene function, these approaches often identify multiple candidates whose functional relevance for the disease is unclear.

A promising approach to overcome the limitations of cis-eQTLs is trans-eQTL analysis linking disease-associated variants via signalling pathways and cellular processes (trans-acting factors) to multiple target genes. Although trans-eQTLs are widespread (Võsa et al., 2018), most transcriptomic studies in various cell types and tissues are still underpowered to detect them (Aguet et al., 2019). This is due to limited sample sizes of current eQTL studies, small effect sizes of trans-eQTLs, and the large number of tests performed (>10⁶ independent variants with >10⁴ genes). To reduce the number of tested phenotypes, co-expression analysis methods are sometimes used to aggregate individual genes to co-expressed modules capturing signalling pathways and cellular processes (Stein-O'Brien et al., 2018). Such approaches have been successful in identifying trans-eQTLs in yeast (Parts et al., 2011) as well as various human tissues (Hore et al., 2016; Mao et al., 2019; Nath et al., 2017) and purified immune cells (Ramdhani et al., 2020; Rotival et al., 2011). An added benefit of co-expression modules is that they can often be directly interpreted as signatures of higher level cellular phenotypes, such as activation of specific signalling pathways or transcription factors (Parts et al., 2011; Way et al., 2020).

Gene co-expression modules can be detected with various methods. Top-down matrix factorisation approaches such as independent component analysis (ICA) (Hyvärinen and Oja, 2000), sparse decomposition of arrays (SDA) (Hore et al., 2016) and probabilistic estimation of expression residuals (PEER) (Stegle et al., 2012) seek to identify latent factors that explain a large proportion of the variance in the dataset. In these models, a single gene can contribute to multiple latent factors with different weights. In contrast, bottom-up gene expression clustering methods such as weighted gene co-expression network analysis (WGCNA) (Langfelder and Horvath, 2008) seek to identify non-overlapping groups of genes with highly correlated expression values. Recently, both matrix factorisation and co-expression clustering methods have been further extended to incorporate prior information about biological pathways and gene sets, resulting in pathway-level information extractor (PLIER) (Mao et al., 2019) and funcExplorer (Kolberg et al., 2018), respectively. Out of these methods, ICA, WGCNA, SDA and PLIER have previously been used to find trans-eQTLs for modules of co-expressed genes (Hore et al., 2016; Mao et al., 2019; Nath et al., 2017; Ramdhani et al., 2020; Rotival et al., 2011), but only a single method at a time. However, since different methods solve distinct optimisation problems, they can detect complementary sets of co-expression modules (Stein-O'Brien et al., 2018), with recent benchmarks demonstrating that there is no single best co-expression analysis method (Way et al., 2020). Thus, applying multiple co-expression methods to the same dataset can aid trans-eQTL detection by identifying complementary sets of co-expression modules capturing a wider range of biological processes (Way et al., 2020).

Another aspect that can influence co-expression module detection is how the data is partitioned prior to analysis (Stein-O'Brien et al., 2018). This is particularly relevant when data from multiple cell types or conditions is analysed together. When co-expression analysis is performed across multiple cell types or conditions, the majority of detected gene co-expression modules are driven by differential expression between cell types (Quach et al., 2016; van Dam et al., 2018). As a result, cell-type-specific co-expression modules can be missed due to weak correlation in other cell types (van Dam et al., 2018). One strategy to recover such modules is to perform co-expression analysis in each cell type separately (Stein-O'Brien et al., 2018).

In this study, we performed comprehensive gene module trans-eQTL analysis across six major blood cell types and three stimulated conditions from five published datasets. To maximise gene module detection, we applied five distinct co-expression analysis methods (ICA, PEER, PLIER, WGCNA, funcExplorer) to the full dataset as well as individual cell types and conditions separately. Using a novel aggregation approach based on statistical fine mapping, we grouped individual trans-eQTLs to a set of non-overlapping loci. Extensive follow-up with gene set and transcription factor motif enrichment analyses allowed us to gain additional insight into the functional impact of trans-eQTLs and prioritise loci for further analyses. In addition to replicating two known monocyte-specific trans-eQTLs at the IFNB1 (Fairfax et al., 2014; Quach et al., 2016; Ramdhani et al., 2020; Ruffieux et al., 2018) and LYZ loci (Fairfax et al., 2012; Rakitsch and Stegle, 2016; Rotival et al., 2011), we found that the trans-eQTL at the ARHGEF3 locus detected in multiple whole blood datasets (Mao et al., 2019; Nath et al., 2017; Rotival et al., 2011; Wheeler et al., 2019) was highly specific to platelets in our analysis. Finally, we also detected a novel association at the SLC39A8 locus that controlled a group of genes encoding zinc-binding proteins in LPS-stimulated monocytes.

Results

Cell types, conditions and samples

We used gene expression and genotype data from five previously published studies from three independent cohorts (Fairfax et al., 2014; Fairfax et al., 2012; Kasela et al., 2017; Momozawa et al., 2018; Naranbhai et al., 2015). The data consisted of CD4+ and CD8+ T cells (Kasela et al., 2017; Momozawa et al., 2018), B cells (Fairfax et al., 2012; Momozawa et al., 2018), neutrophils (Momozawa et al., 2018; Naranbhai et al., 2015), platelets (Momozawa et al., 2018), naive monocytes (Fairfax et al., 2014; Momozawa et al., 2018) and monocytes stimulated with lipopolysaccharide for 2 or 24 hr (LPS 2 hr, LPS 24 hr) and interferon-gamma for 24 hr (IFNγ 24 hr) (Fairfax et al., 2014). The sample size varied from n = 226 in platelets to n = 710 in naive monocytes (Figure 1A). After quality control, normalisation and batch correction (see ‘Materials and methods’), the final dataset consisted of 18,383 unique protein coding genes profiled in 3938 samples from 1037 unique genotyped individuals of European ancestries (Figure 1B). Even though the samples originated from five different studies, they clustered predominantly by cell type of origin (Figure 1B).

Figure 1 (with 6 supplements)

Data, analysis workflow and results.

(A) Sample sizes of cell types and conditions included in the analysis. LPS - lipopolysaccharide, IFNg - interferon-gamma. (B) Multidimensional scaling (MDS) analysis of the gene expression data and principal component analysis (PCA) of genotype data after quality control and normalisation. Cell types and conditions are colour-coded according to panel A. Genotyped samples from this study have been projected to the 1000 Genomes Project reference populations. (C) Following quality control, five co-expression methods were applied to two different data partitioning approaches: (1) gene expression profiles across all cell types and conditions were analysed together (integrated approach), (2) gene expression profiles from each cell type and condition were analysed separately (separate approach). (D) The number of gene modules detected from integrated and separate analyses. (E) For *trans*-eQTL analysis we used the estimated module activity profile (‘eigengene’) as our phenotype. To identify independent *trans*-eQTLs, we performed statistical fine mapping for all nominally significant (p-value<5×10⁻⁸) associations and grouped together all associations with overlapping credible sets. (F) Manhattan plot of nominally significant (p-value<5×10⁻⁸) *trans*-eQTLs. Each point corresponds to a gene module that was associated with the corresponding locus and is colour-coded by the cell type from panel A.

Detecting trans-eQTLs regulating modules of co-expressed genes

We performed co-expression analyses with ICA, WGCNA, PLIER, PEER and funcExplorer on the full gene expression dataset (integrated approach) as well as on each cell type and condition separately (separate approach) (Figure 1C). In total, we obtained 482 gene modules from the integrated approach and 3509 from the separate clustering of different cell types (Figure 1D; Figure 1—figure supplement 1). For every module, the methods inferred a single characteristic expression pattern (‘eigengene’) that represents the expression profiles of the module genes across the samples. Although implementation details varied between methods (see ‘Materials and methods’), these eigengene profiles were essentially linear combinations of expression levels of genes belonging to the modules.
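To make the "linear combination" idea concrete, the following is a minimal sketch that computes an eigengene for a toy module as a weighted sum of per-gene z-scores. The function name `module_eigengene`, the equal default weights and the toy expression values are assumptions made for illustration only; each method used in the study derives its own weights (see ‘Materials and methods’).

```python
from statistics import mean, pstdev


def zscore(values):
    """Standardise one gene's expression across samples (mean 0, sd 1)."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        raise ValueError("gene has zero variance")
    return [(v - mu) / sd for v in values]


def module_eigengene(expression, weights=None):
    """Weighted sum of standardised gene profiles.

    expression: dict mapping gene -> list of expression values (one per sample)
    weights:    dict mapping gene -> weight; equal weights are used if omitted
                (an illustrative assumption, not any specific method's choice)
    """
    genes = sorted(expression)
    if weights is None:
        weights = {g: 1.0 / len(genes) for g in genes}
    z = {g: zscore(expression[g]) for g in genes}
    n_samples = len(z[genes[0]])
    return [sum(weights[g] * z[g][i] for g in genes) for i in range(n_samples)]


# Hypothetical module of three genes measured in four samples.
toy_module = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [0.5, 1.5, 2.5, 3.5],
}
eigengene = module_eigengene(toy_module)
```

The resulting per-sample vector is what would then be used as the phenotype in place of individual gene expression values.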

The number of detected modules and their sizes varied due to the properties and the default parameters of each method (Figure 1D). Although matrix factorisation approaches generally identified larger modules than clustering methods (Figure 1—figure supplement 1), this is confounded by the fact that assigning genes to modules in matrix factorisation is fuzzy and requires the specification of arbitrary thresholds. Nevertheless, even though the number of modules for PEER and PLIER were initialised with identical values, PLIER consistently detected more modules with each module containing slightly fewer genes (Figure 1D, Figure 1—figure supplement 1). Similarly, funcExplorer detected more modules than WGCNA (Figure 1D) probably because funcExplorer was able to detect modules containing fewer genes (minimum of 5 versus 20 genes) if these were supported by functional enrichment (Figure 1—figure supplement 1).

For trans-eQTL analysis, we included 6,861,056 common (minor allele frequency >5%) genetic variants passing strict quality control criteria. First, we used linear regression implemented in the MatrixEQTL package (Shabalin, 2012) to identify all genetic variants nominally associated (p-value<5×10⁻⁸) with the eigengenes of each of the 3991 co-expression modules detected across nine cell types and conditions. We performed trans-eQTL analysis in each cell type and condition separately. Next, we used SuSiE (Wang et al., 2018) to fine map all significant associations to 864 independent credible sets of candidate causal variants (Figure 1E). Since we applied five co-expression methods to both integrated and cell-type-specific (separated) datasets, we found a large number of overlapping genetic associations. We thus aggregated overlapping credible sets from 864 associations to 601 non-overlapping genomic loci (Figure 1—figure supplement 2; see ‘Materials and methods’). We observed that some, especially smaller, co-expression modules were driven by strong cis-eQTL effects that were controlling multiple neighbouring genes in the same module. To exclude such effects, we performed gene-level eQTL analysis for 18,383 protein-coding genes and the 601 lead variants identified above. We excluded co-expression modules where the module lead variant was not individually associated with any of the module genes in trans (>5 Mb away) and the overlap between the module genes and individually mapping trans genes was not significant according to the one-sided Fisher’s exact test (Bonferroni adjusted p-value<0.05) (see ‘Materials and methods’). This step reduced the number of nominally significant trans-eQTL loci to 247 (Figure 1F; Supplementary files 1–2). Finally, to account for the number of co-expression modules tested, we used both Benjamini-Yekutieli false discovery rate (BY FDR) and Bonferroni correction (see ‘Materials and methods’). The BY FDR 10% threshold reduced the number of significant associations to 38 and the Bonferroni threshold retained only three significant loci, including loci near the IFNB1 (Figure 1—figure supplement 3) and LYZ (Figure 1—figure supplement 4) genes that have been previously reported in several other studies (Fairfax et al., 2014; Fairfax et al., 2012; Quach et al., 2016; Rakitsch and Stegle, 2016; Rotival et al., 2011; Ruffieux et al., 2018; Table 1). While the strong trans-eQTL signals at the IFNB1 and LYZ loci were detected by all co-expression methods in both integrated and separate analyses, most associations were detected by only a subset of the analytical approaches (Supplementary file 1).
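The aggregation of overlapping credible sets into non-overlapping loci can be viewed as a connected-components problem. The sketch below is one way to express it, assuming that two associations "overlap" when their credible sets share at least one variant and that overlap is applied transitively. The function name `group_credible_sets` and the toy module and variant names are hypothetical; the procedure actually used in the study is described in ‘Materials and methods’.

```python
def group_credible_sets(credible_sets):
    """Group associations whose credible sets share at least one variant.

    credible_sets: dict mapping association ID -> set of variant IDs
    Returns a sorted list of loci, each a sorted list of association IDs.
    Overlap is transitive: if A overlaps B and B overlaps C, all three
    end up in the same locus (an assumption of this sketch).
    """
    parent = {assoc: assoc for assoc in credible_sets}

    def find(a):
        # Union-find root lookup with path halving.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    first_seen = {}  # variant -> first association that contained it
    for assoc, variants in credible_sets.items():
        for variant in variants:
            if variant in first_seen:
                root_a, root_b = find(assoc), find(first_seen[variant])
                if root_a != root_b:
                    parent[root_a] = root_b
            else:
                first_seen[variant] = assoc

    loci = {}
    for assoc in credible_sets:
        loci.setdefault(find(assoc), []).append(assoc)
    return sorted(sorted(members) for members in loci.values())


# Hypothetical credible sets from four module-level associations.
toy_sets = {
    "ICA_module_1": {"var1", "var2"},
    "PLIER_module_7": {"var2", "var3"},
    "WGCNA_module_4": {"var3"},
    "PEER_module_2": {"var9"},
}
loci = group_credible_sets(toy_sets)
```

In this toy example the first three associations collapse into a single locus because they are chained together through shared variants, while the fourth remains a locus of its own.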

Table 1

Literature-based replication of trans-eQTL loci near IFNB1, LYZ and ARHGEF3 genes.

Linkage disequilibrium (r²) was calculated using European samples from the 1000 Genomes Phase 3 reference panel. The last column indicates if any of the associated modules had a significant overlap with the genes reported by the independent study according to one-sided Fisher’s exact test after Bonferroni correction. The overlaps with individual modules are shown in Supplementary file 5. GHS - Gutenberg Health Study, FHS - Framingham Heart Study, CTS - Cardiogenics Transcriptomic Study, * - largest observed r² in the credible set.

trans-eQTL			Replication
Locus	Lead rs ID	Context	Study	Dataset	Context	rs ID	r²	Replication variant in credible set	Significant overlap with a module
IFNB1	rs13296842	Monocytes LPS 24 hr	Fairfax et al., 2014	Fairfax_2014	Monocytes LPS 24 hr	rs2275888	0.57 (0.86*)	FALSE	-
			Quach et al., 2016	Quach_2016	Monocytes LPS 6 hr	rs12553564	0.57 (0.86*)	FALSE	TRUE
			Ramdhani et al., 2020	Fairfax_2014	Monocytes LPS 24 hr	rs2275888	0.57 (0.86*)	FALSE	-
			Ruffieux et al., 2018	Fairfax_2014	Monocytes LPS 24 hr	rs3898946	0.88	TRUE	-
LYZ	rs10784774	Monocytes naive, LPS 2 hr, LPS 24 hr, IFNγ 24 hr	Rotival et al., 2011	GHS	Monocytes	rs11177644	0.79	TRUE	TRUE
			Fairfax et al., 2012	Fairfax_2012	Monocytes	rs10784774	1	TRUE	-
			Rakitsch and Stegle, 2016	CTS	Monocytes	rs6581889	0.79	TRUE	TRUE
ARHGEF3	rs1354034	Platelets	Võsa et al., 2018	eQTLGen	Blood	rs1354034	1	TRUE	TRUE
			Mao et al., 2019	Battle_2014	Blood	rs1354034	1	TRUE	-
			Rotival et al., 2011	GHS	Monocytes	rs12485738	0.6	FALSE	-
			Rotival et al., 2011	GHS	Monocytes	rs1344142	0.6	TRUE	-
			Wheeler et al., 2019	FHS	Blood	-	-	-	-
			Nath et al., 2017	DILGOM07	Blood	rs1354034	1	TRUE	TRUE

To characterise the general interpretability of the associated modules, we performed functional enrichment analysis for all modules associated with the 247 nominally significant loci (Supplementary file 3). We found that 97% of the associated modules were enriched with at least one biological function from Gene Ontology, Reactome or KEGG. In contrast, in the gene-level analysis, only 86% of the loci showed significant enrichment in at least one tested cell type. However, this discrepancy could be partly due to the fact that gene-level analysis results in fewer associated genes, thus reducing the power to detect significant enrichments. Moreover, funcExplorer and PLIER modules are based on known gene annotations and are therefore expected to have high levels of enrichment by definition. We will now dissect two loci with interesting functional enrichment patterns in more detail.

Platelet-specific trans-eQTL at the ARHGEF3 locus is associated with multiple platelet traits

We found that the rs1354034 (T/C) variant located within the ARHGEF3 gene is associated with three co-expression modules in platelets: one ICA module detected in integrated analysis (IC68, 1074 genes) and two co-expression modules detected in a platelet-specific analysis by PLIER (X6.WIERENGA_STAT5A_TARGETS_DN, 918 genes) and funcExplorer (Cluster_12953, five genes) (Figure 2B, Figure 2—figure supplement 1). The T allele increases the expression of the ARHGEF3 gene in cis, and the cis-eQTL and trans-eQTL share the same lead variant (Figure 2A). Furthermore, both the cis and trans-eQTLs colocalise with a GWAS hit for mean platelet volume (cis PP4 = 0.99, trans PP4 >0.99 for all modules), platelet count (cis PP4 = 0.99, trans PP4 >0.99 for all modules) and plateletcrit (trans PP4 >0.99 for all modules) (Figure 2A; Astle et al., 2016). Interestingly, ARHGEF3 itself is not in any of the three modules and the module eigengenes are not strongly co-expressed with ARHGEF3 (Pearson’s r ranging from 0.07 to 0.33 in platelets). While IC68 and X6.WIERENGA_STAT5A_TARGETS_DN share 74 overlapping genes (one-sided Fisher’s exact test p-value=0.003), none of the genes in Cluster_12953 is in any of the other modules.
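The one-sided Fisher’s exact test used for module overlap is equivalent to an upper-tail hypergeometric probability, which can be sketched with the standard library alone. The function name `overlap_pvalue` is hypothetical, and the gene universe of 18,383 protein-coding genes used in the example call is an assumption; the background used in the study may differ, so the value computed here need not match the reported p-value exactly.

```python
from math import comb


def overlap_pvalue(n_universe, n_module_a, n_module_b, n_shared):
    """One-sided Fisher's exact (hypergeometric upper-tail) p-value.

    Probability of observing at least n_shared genes in common when
    n_module_b genes are drawn at random from a universe of n_universe
    genes, of which n_module_a belong to the first module.
    """
    max_shared = min(n_module_a, n_module_b)
    total = comb(n_universe, n_module_b)
    tail = sum(
        comb(n_module_a, k) * comb(n_universe - n_module_a, n_module_b - k)
        for k in range(n_shared, max_shared + 1)
    )
    return tail / total


# Module sizes and overlap taken from the text; the universe size is assumed.
p_overlap = overlap_pvalue(
    n_universe=18383, n_module_a=1074, n_module_b=918, n_shared=74
)
```

Exact integer arithmetic via `math.comb` avoids the underflow that would occur if the individual terms were computed as floating-point probabilities for modules of this size.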

Figure 2 (with 3 supplements)

Platelet-specific *trans*-eQTL at the *ARHGEF3* locus.

(A) Regional plots showing colocalisation between GWAS signal for mean platelet volume (Astle et al., 2016), *cis*-eQTL for *ARHGEF3* in platelets and *trans*-eQTL for a platelet-specific co-expression module detected by PLIER. *Cis* and *trans* credible sets (cs) are marked on the plots. The *cis* credible set consists of only the lead variant (rs1354034), which occludes the orange highlight. (B) Line graph showing that the association between the modules and *ARHGEF3* locus is platelet specific. In cell-type-specific clustering, only a single p-value from the corresponding cell type is available. The integrated modules have p-values from each of the cell types and the values are connected by a line. (C) Association between the *trans*-eQTL lead variant (rs1354034) and eigengene of module X6.WIERENGA_STAT5A_TARGETS_DN in platelets. (D) Association between the *trans*-eQTL lead variant (rs1354034) and *ARHGEF3* expression in platelets. (E) Manhattan plot of gene-level eQTL analysis for the *trans*-eQTL lead variant. Dark blue points highlight the genes in module X6.WIERENGA_STAT5A_TARGETS_DN. Light blue points show significantly associated genes (variant-level Benjamini-Hochberg FDR 5%) not included in the module. (F) Functional enrichment analysis of modules associated with *ARHGEF3* locus (see full results at https://biit.cs.ut.ee/gplink/l/CY6ZukXhSq). Empty cell indicates that no gene in the module is annotated to the corresponding term, enrichment p-value=1 shows that at least some of the genes in the module are annotated to the term, but not enough to report over-representation. The last column combines the FDR 5% significant genes from the gene-level analysis. The table shows adjusted enrichment p-values. GO - Gene Ontology, KEGG - Kyoto Encyclopedia of Genes and Genomes Pathways, REAC - Reactome Pathways.

Although the ARHGEF3 trans-eQTL has been detected in multiple whole blood trans-eQTL studies (Mao et al., 2019; Nath et al., 2017; Võsa et al., 2018; Wheeler et al., 2019; Table 1), our analysis demonstrates that this association is highly specific to platelets and not detected in other major blood cell types (Figure 2B). Furthermore, even though ARHGEF3 is expressed in multiple cell types, the cis-eQTL effect is also only visible in platelets (Figure 2—figure supplement 1). Reassuringly, the trans-eQTL effect sizes in our small platelet sample (n = 226) are correlated (Pearson’s r = 0.68, p-value=5.1×10⁻¹²) with the effects from the largest whole blood trans-eQTL meta-analysis (Võsa et al., 2018) (n = 31,684) (Figure 2—figure supplement 2). The platelet specificity of the ARHGEF3 association is further supported by functional enrichment analysis with g:Profiler (Raudvere et al., 2019), which found that both the PLIER module X6.WIERENGA_STAT5A_TARGETS_DN and target genes from the gene-level analysis were strongly enriched for multiple terms related to platelet activation (Figure 2F; https://biit.cs.ut.ee/gplink/l/CY6ZukXhSq). Cluster_12953, however, was enriched for cellular response to iron ion, suggesting that ARHGEF3 might be involved in multiple independent processes (Mao et al., 2019; Serbanovic-Canic et al., 2011). Altogether, these results demonstrate how a trans-eQTL detected in whole blood can be driven by a strong signal present in only one cell type.

SLC39A8 locus is associated with zinc ion homeostasis in LPS-stimulated monocytes

One of the novel results in our analysis was a locus near the SLC39A8 gene that was associated (p-value=1.2×10⁻⁹) with a single co-expression module detected by funcExplorer (Cluster_10413) in monocytes stimulated with LPS for 24 hr (Figure 3A–C). The module consisted of five metallothionein genes (MT1A, MT1F, MT1G, MT1H, MT1M) all located in the same locus on chromosome 16 (Figure 3D). Although the trans-eQTL lead variant (rs75562818) was significantly associated with the expression of the SLC39A8 gene (Figure 3A and D), the two association signals did not colocalise and the credible sets did not overlap (Figure 3A; Figure 3—figure supplement 1), indicating that the cis-eQTL detected in naive and stimulated monocytes in our dataset is not the main effect driving the trans-eQTL signal. Furthermore, the expression of SLC39A8 was only moderately correlated with the eigengene value of Cluster_10413 (Pearson’s r = 0.27). Since SLC39A8 is strongly upregulated (log₂fold-change = 3.53) in response to LPS already at 2 hr (Figure 4A), we speculated that there might be a transient eQTL earlier in the LPS response. To test this, we downloaded the cis-eQTL summary statistics from the Kim-Hellmuth et al., 2017 study that had mapped eQTLs in monocytes stimulated with LPS for 90 min and 6 hr (Kim-Hellmuth et al., 2017). Indeed, we found that the cis-eQTL 90 min after LPS stimulation colocalised with our trans-eQTL (Figure 3A) and this signal disappeared by 6 hr after stimulation (Figure 3—figure supplement 2).

Figure 3 (with 3 supplements)

Transient *cis*-eQTL for *SLC39A8* is associated with the expression of seven metallothionein genes in *trans* in monocytes stimulated with LPS for 24 hr.

(A) Regional plots comparing association signals between naive (rs11097779) and transiently induced *cis*-eQTLs (rs75562818) for *SLC39A8* and *trans*-eQTL (rs75562818) for a module of five co-expressed metallothionein genes. LPS-induced *cis*-eQTL summary statistics 90 min post stimulation (n = 134) were obtained from Kim-Hellmuth et al., 2017. (B) Graph showing that the association between the module and *SLC39A8* locus is stimulation specific. As this module was detected by a cell-type-specific clustering, only a single value from the corresponding cell type is available. (C) Association between *trans*-eQTL (rs75562818) and eigengene of funcExplorer module Cluster_10413 in monocytes after 24 hr of LPS stimulation. (D) Manhattan plot of gene-level eQTL analysis for rs75562818. Dark blue points highlight the genes in module Cluster_10413. Light blue points show significantly associated genes (variant-level Benjamini-Hochberg FDR 5%) not included in the module. (E) Functional enrichment analysis of the *SLC39A8* associated module (see https://biit.cs.ut.ee/gplink/l/aohV4uKeT1 for full results). The last column combines the FDR 5% significant genes from the gene-level analysis. The table shows adjusted enrichment p-values. MTF1 - metal transcription factor 1. GO - Gene Ontology, WP - WikiPathways, REAC - Reactome Pathways, TF - transcription factor binding sites from TRANSFAC.

Figure 4

Molecular mechanisms underlying the *SLC39A8 trans*-eQTL locus.

(A) *SLC39A8* gene expression values (log₂ intensities) across naive and stimulated monocytes. (B) Overview of the known regulatory interactions underlying the *cis* and *trans* eQTL effects at the *SLC39A8* locus. Figure adapted from Liu et al., 2013. (C) Pairwise LD (r² within 1000 Genomes European populations) between the *SLC39A8* variants highlighting missense variant (rs13107325), *trans*-eQTL (rs75562818), red blood cell distribution width (RBCDW) associated SNP (rs7692921) in our credible set and the *cis* lead variant from naive monocytes (rs11097779). LD was calculated using the LDlinkR (v.1.0.2) R package (Myers et al., 2020).

To understand the function of the SLC39A8 locus, we turned to the target genes. Gene-level analysis identified two more metallothionein genes (MT1E and MT1X) from the same locus as likely target genes (Figure 3D). Enrichment analysis with g:Profiler revealed that these genes were enriched for multiple Gene Ontology terms and pathways related to zinc ion homeostasis (Figure 3E, full results at https://biit.cs.ut.ee/gplink/l/aohV4uKeT1). Furthermore, the promoter regions of the seven genes were also enriched for the binding motif of the metal transcription factor 1 (MTF1) transcription factor (p-value=2.1×10⁻⁴, Figure 3E). Taken together, these results suggest that a transient eQTL of the SLC39A8 gene 90 min after stimulation regulates the expression of seven genes encoding zinc-binding proteins 24 hr later. Multiple lines of literature evidence support this model (Figure 4B). First, the ZIP8 protein coded by the SLC39A8 gene is a manganese and zinc ion influx transporter (Nebert and Liu, 2019). Second, SLC39A8 is upregulated by the NF-κB transcription factor in macrophages and monocytes in response to LPS and this upregulation leads to increased intracellular Zn²⁺ concentration (Liu et al., 2013). Third, Zn²⁺ influx increases the transcriptional activity of MTF1 (Kim et al., 2014) and metallothioneins, which act as Zn²⁺-storage proteins, are well known target genes of the MTF1 transcription factor (Laity and Andrews, 2007). Finally, SLC39A8 knockdown in mice leads to decreased expression of the metallothionein 1 (MT1) gene (Liu et al., 2013).

To see if the SLC39A8 trans-eQTL might be associated with any higher level phenotypes, we queried the GWAS Catalog database (Buniello et al., 2019) with the ten variants from the trans-eQTL 95% credible set. We found that a lead variant for red blood cell distribution width (rs7692921) was one of the variants in our credible set and in high LD (r² = 0.991) with the trans-eQTL lead variant (Figure 4C; Kichaev et al., 2019). However, neither of the eQTL variants was in LD with a known missense variant (rs13107325) in the SLC39A8 gene that has been associated with schizophrenia, Parkinson’s disease and other traits (Figure 4C; Pickrell et al., 2016).

Mediation analysis

For three of the four trans-eQTL loci discussed above (LYZ, ARHGEF3 and SLC39A8), we also detected an overlapping cis-eQTL effect on one or more cis genes. To test if the cis-eQTL effect might mediate the observed trans effect on the co-expression modules, we used mediation analysis. In all three cases, we detected a statistically significant mediation effect between the cis and trans associations (Figure 2—figure supplement 3, Figure 3—figure supplement 3, Supplementary file 4). However, in all cases, the mediation explained only a small fraction of the total genotype effect on the co-expression module. There could be multiple reasons for this. First, since co-expression module eigengene values go through multiple transformations, this might introduce additional noise and thus reduce the observed mediation effect (Pierce et al., 2014). Second, if there is a temporal delay between the cis and trans effects (as observed for SLC39A8), then we would not necessarily expect to detect mediation at the same time point, even if the cis-eQTL is causal for the trans-eQTL effect. Finally, multiple independent causal variants in the region that are in LD with each other could bias the mediation estimates (Figure 3A).
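The notion of a "fraction of the total genotype effect" explained by mediation can be illustrated with a generic difference-of-coefficients calculation: the total effect is the slope of the eigengene on genotype, the direct effect is the genotype coefficient once the cis gene is added as a covariate, and the proportion mediated is their relative difference. This is a minimal sketch of that textbook approach, not necessarily the procedure used in the study (see ‘Materials and methods’); the function name `mediation_proportion` and the toy data are hypothetical.

```python
from statistics import mean


def _centered(x):
    mu = mean(x)
    return [v - mu for v in x]


def _dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def mediation_proportion(genotype, cis_expr, trans_eigengene):
    """Return (total effect, direct effect, proportion mediated).

    total:  slope of trans_eigengene ~ genotype
    direct: genotype coefficient in trans_eigengene ~ genotype + cis_expr
    """
    g = _centered(genotype)
    m = _centered(cis_expr)
    y = _centered(trans_eigengene)

    sgg, smm, sgm = _dot(g, g), _dot(m, m), _dot(g, m)
    sgy, smy = _dot(g, y), _dot(m, y)

    total = sgy / sgg
    # Solve the 2x2 normal equations for the genotype coefficient.
    det = sgg * smm - sgm ** 2
    if det == 0:
        raise ValueError("genotype and mediator are perfectly collinear")
    direct = (smm * sgy - sgm * smy) / det
    return total, direct, (total - direct) / total


# Toy data: cis expression tracks genotype, and the eigengene depends on both.
genotype = [0, 1, 2, 0, 1, 2]
noise = [0.1, -0.2, 0.1, -0.1, 0.2, -0.1]
cis_expr = [g + e for g, e in zip(genotype, noise)]
trans_eigengene = [0.5 * g + 1.0 * c for g, c in zip(genotype, cis_expr)]

total, direct, proportion = mediation_proportion(genotype, cis_expr, trans_eigengene)
```

In this noise-free construction two thirds of the genotype effect passes through the cis gene; measurement noise in either expression trait, or a time lag between the cis and trans effects, would pull the estimated proportion towards zero, in line with the explanations given above.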

Replication of associations in independent datasets

We first performed a literature-based replication to measure the overlap between the modules that map to the loci near IFNB1, LYZ and ARHGEF3 with the genes reported by previous studies (Table 1, Supplementary file 5). All the modules associated with the IFNB1 locus in monocytes stimulated with LPS for 24 hr (12 in total) had a significant overlap (one-sided Fisher’s exact test, Bonferroni adjusted p-value<0.05) with the trans genes reported by Quach et al., 2016. At the LYZ locus, we compared the 30 modules detected in unstimulated monocytes with the trans genes reported by Rotival et al., 2011 and Rakitsch and Stegle, 2016. In the case of Rotival et al., 23 out of the 30 modules from our study had significant overlap with the 33 trans genes reported by Rotival et al. In contrast, only two of our modules had a significant overlap with the genes reported by Rakitsch and Stegle. Interestingly, only one trans-associated gene was shared between Rotival et al., 2011 and Rakitsch and Stegle, 2016. We also evaluated the overlap for the three modules associated with the ARHGEF3 locus and the 840 genes reported in the eQTLGen study (Võsa et al., 2018). Only one module, IC68, did not have a significant overlap but this could be due to its large size and the fuzzy definition of the ICA module membership. For ARHGEF3 we also compared the modules with the 163 trans genes reported by Nath et al., 2017 where only one module (X6.WIERENGA_STAT5A_TARGETS_DN) had a significant overlap.

To further assess the replication of identified trans-eQTLs (after filtering for Benjamini-Yekutieli FDR<10%), we compared associated modules in unstimulated monocytes, neutrophils and T cells to matched cell types from three independent studies for which we had access to individual-level data: BLUEPRINT (Chen et al., 2016), ImmVar (Raj et al., 2014) and Quach et al., 2016. We analysed 9 of the 38 trans-eQTLs that were associated with 40 different modules. We compared the overlap of gene modules and corresponding significant gene-level results (variant-level FDR <5%) from these three independent studies. Unfortunately, we were not able to replicate any additional associations. Interestingly, even though the LYZ and YEATS4 cis-eQTL effect was present in all three studies, the trans-eQTL did not replicate in any of them. Since this trans-eQTL was previously detected by Rotival et al., 2011, this suggests that in addition to small sample sizes of the replication studies, there might be biological differences in how the samples were collected.
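For readers less familiar with the Benjamini-Yekutieli filter mentioned above, the sketch below shows the step-up procedure alongside a Bonferroni cut-off on a toy set of p-values. The function names and the p-values are hypothetical, and the thresholds (10% for BY, 5% for Bonferroni) are used here purely as illustrative defaults; how the correction was applied across modules in the study is described in ‘Materials and methods’.

```python
def benjamini_yekutieli(pvalues, alpha=0.10):
    """Benjamini-Yekutieli step-up procedure.

    Returns a list of booleans (same order as the input) that is True
    where the hypothesis is rejected at FDR level alpha. The harmonic
    sum makes the procedure valid under arbitrary dependence.
    """
    m = len(pvalues)
    if m == 0:
        return []
    harmonic = sum(1.0 / i for i in range(1, m + 1))
    order = sorted(range(m), key=lambda i: pvalues[i])

    # Largest rank k such that p_(k) <= alpha * k / (m * c(m)).
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= alpha * rank / (m * harmonic):
            cutoff_rank = rank

    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


def bonferroni(pvalues, alpha=0.05):
    """Reject hypotheses with p <= alpha / number of tests."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


toy_pvalues = [1e-6, 0.004, 0.03, 0.5]
by_rejected = benjamini_yekutieli(toy_pvalues)
bonf_rejected = bonferroni(toy_pvalues)
```

On this toy input the BY procedure retains one more association than Bonferroni, mirroring the qualitative difference between the two thresholds reported in the Results.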

Discussion

Given that trans-eQTLs have been more difficult to replicate between studies and false positive associations can easily occur due to technical issues (Dahl et al., 2019; Saha and Battle, 2018), it is increasingly important to effectively summarise and prioritise associations for follow-up analyses and experiments. We found that aggregation of credible sets of eigengene profiles from multiple co-expression methods (Figure 1—figure supplement 2) successfully reduced the number of independent associations, but this still retained 243 loci that we needed to evaluate. To further prioritise associations, we used gene set and transcription factor motif enrichment analysis of the trans-eQTL target genes. Although motif analysis is often underpowered, it can provide directly testable hypotheses about the trans-eQTL mechanism such as the MTF1 transcription factor that we identified at the SLC39A8 locus. Similar approaches have also been successfully used to characterise trans-eQTLs involving IRF1 and IRF2 transcription factors (Brandt et al., 2020; Fairfax et al., 2014).

A major limitation of the co-expression-based approach to trans-eQTL mapping is that many true co-expression modules can remain undetected by any given co-expression analysis method (Way et al., 2020). We sought to overcome this by aggregating results across five complementary co-expression methods. We found that while all methods were able to discover strong co-expression module trans-eQTLs such as those underlying the IFNB1 (Figure 1—figure supplement 3) and LYZ (Figure 1—figure supplement 4) associations, most co-expression module trans-eQTLs were only detected by a subset of the analysis methods. For example, the ARHGEF3 association was detected by three of the five methods (Figure 2B), and the SLC39A8 co-expression module was found only by funcExplorer and only when samples from LPS-stimulated monocytes were analysed separately (Figure 3B). Since this module consisted of only seven strongly co-expressed genes, other methods were probably not well tuned to find it. Moreover, if the trans-eQTL locus controls a single gene or a small number of genes, then co-expression-based approaches are probably not well suited to detect such associations and gene-level analysis is still required.

To maximise module discovery, we aggregated results from five co-expression analysis methods and two partitions of the same underlying data (integrated versus separate). While this reduced the number of tests compared to a standard gene-level analysis, it introduced an additional layer of complexity, because the same gene expression values contributed to multiple different co-expression modules and analytical settings. As a result, it is unclear how well calibrated our false discovery rate estimates are. Thus, we decided to first use a relaxed nominal significance threshold of p-value<5×10⁻⁸, assuming that most of those associations were likely to be false positives. In our subsequent follow-up analyses, we only focused on four loci that we could either replicate in independent datasets (IFNB1, LYZ, ARHGEF3) or find significant support from the literature (SLC39A8).

Since eQTL datasets from purified cell types are still relatively small and single-cell eQTL datasets are even smaller (van der Wijst et al., 2018), it is tempting to perform trans-eQTL analysis on whole tissue datasets such as the brain or whole blood (Võsa et al., 2018). However, it remains unclear what fraction of cell type and condition-specific trans-eQTLs can be detected in whole tissue datasets collected from healthy donors. Although we were able to replicate the ARHGEF3 association in the eQTLGen whole blood meta-analysis, because our fine-mapped lead variant happened to be one of the 10,317 variants tested in eQTLGen, systematic replication requires genome-wide summary statistics that are currently lacking for trans-eQTL analyses. In addition, tissue datasets can be biased by cell type composition effects. These can lead to spurious trans-eQTL signals, because genetic variants associated with cell type composition changes would appear as trans-eQTLs for cell-type-specific genes (Võsa et al., 2018). Furthermore, multiple studies have demonstrated that the co-expression signals in tissues are also largely driven by cell type composition effects (Farahbod and Pavlidis, 2019; Parsana et al., 2019; Schubert et al., 2020). Thus, even though PLIER detected the ARHGEF3 trans-eQTL in whole blood, this could have been at least partially driven by the change in platelet proportion between individuals (Mao et al., 2019). Our analysis in purified cell types enabled us to verify that this was a truly platelet-specific genetic association.

Although we detected significant mediation between the expression level of the cis gene and the observed trans-eQTL effect for both ARHGEF3 and SLC39A8, it explained only a small proportion of the total trans effect. Furthermore, there was only a modest correlation (Pearson’s r between 0.07 and 0.33) between the cis gene expression and the corresponding trans co-expression module expression. In the case of SLC39A8, there seemed to be a temporal delay, with the cis-eQTL being active early in the LPS response and the trans-eQTL appearing much later, after the proposed accumulation of the ZIP8 protein and increase in intracellular zinc concentration. Temporal delay has similarly been reported for the trans-eQTLs at the IFNB1 (Fairfax et al., 2014) and IRF1 (Brandt et al., 2020) loci. This suggests that if cis and trans effects are separated from each other either in time (early versus late response) or space (different cell types that interact with each other), then this might limit the power of methods that rely on genetically predicted gene expression levels to identify regulatory interactions (Liu et al., 2018; Luijk et al., 2018; Wheeler et al., 2019) and infer causal models. This can also have a negative impact on mediation analysis (Battle et al., 2014; Chick et al., 2016; Yang et al., 2019), which seeks to estimate the proportion of trans-eQTL variance explained by the expression level of the cis gene. Altogether, our results indicate that limiting trans-eQTL analysis to missense variants and to variants that have been detected as cis-eQTLs in the same cell type might miss some true associations, because the cis effect might be active in some other, yet unprofiled, context.
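To make the mediation reasoning above concrete, the sketch below uses a simple difference-of-coefficients approach: the total genotype effect on the trans module is compared with the direct effect that remains after adjusting for cis gene expression. This is an assumed, simplified formulation for illustration only; it is not necessarily the mediation method used in the study, and all function names and toy data are hypothetical.

```python
def _center(values):
    """Subtract the mean so that regressions need no intercept term."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    xc, yc = _center(x), _center(y)
    return _dot(xc, yc) / (_dot(xc, xc) * _dot(yc, yc)) ** 0.5


def proportion_mediated(genotype, cis_expr, trans_expr):
    """Share of the total genotype effect on the trans module that
    disappears after adjusting for cis gene expression.

    total  : slope of trans_expr ~ genotype
    direct : genotype slope of trans_expr ~ genotype + cis_expr
    """
    g, m, y = _center(genotype), _center(cis_expr), _center(trans_expr)
    gg, gm, mm = _dot(g, g), _dot(g, m), _dot(m, m)
    gy, my = _dot(g, y), _dot(m, y)
    total = gy / gg
    # Solve the 2x2 normal equations for the genotype coefficient.
    det = gg * mm - gm * gm
    direct = (gy * mm - my * gm) / det
    return (total - direct) / total


if __name__ == "__main__":
    # Toy data: allele dosage, cis gene expression, trans module eigengene.
    genotype = [0, 1, 2, 0, 1, 2]
    cis_expr = [1, 0, 2, -1, 2, 2]
    trans_expr = [0.5, 0.9, 2.4, 0.1, 1.2, 2.0]
    print(pearson_r(cis_expr, trans_expr))
    print(proportion_mediated(genotype, cis_expr, trans_expr))
```

If the cis and trans measurements come from different time points, as proposed for SLC39A8, the cis expression measured at the late time point can be a poor proxy for the causal early signal, so an estimate of this kind would be expected to understate the true mediated share.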

We have performed a large-scale trans-eQTL analysis in six blood cell types and three stimulated conditions. We demonstrate that co-expression module detection combined with gene set enrichment analysis can help to identify interpretable trans-eQTLs, but these results depend on which co-expression method is chosen for analysis and how the input data are partitioned beforehand. We performed in-depth characterisation of two cell-type-specific trans-eQTL loci: a platelet-specific trans-eQTL near the ARHGEF3 gene and monocyte-specific associations near the SLC39A8 locus. In both cases, the co-expression modules were enriched for clearly interpretable Gene Ontology terms and pathways, which directly guided literature review and more detailed analyses. We believe that applying co-expression and gene set enrichment-based approaches to larger eQTL datasets has the power to detect many additional associations while simultaneously helping to prioritise trans-eQTLs for detailed experimental or computational characterisation. A particularly promising avenue would be treating co-expression modules as complex traits for which multiple independent genetic associations could be mapped. These associations could subsequently be used in Mendelian randomisation analyses to infer causal intermediate phenotypes for complex diseases (Evans and Davey Smith, 2015).

Cell type	Fairfax_2012	Fairfax_2014	Naranbhai_2015	Kasela_2017	CEDAR
B cell	281	-	-	-	266
T cell CD4+	-	-	-	279	294
T cell CD8+	-	-	-	267	281
Neutrophil	-	-	93	-	291
Platelet	-	-	-	-	226
Monocyte naive	-	420	-	-	290
Monocyte LPS 2 hr	-	255	-	-	-
Monocyte LPS 24 hr	-	325	-	-	-
Monocyte IFNγ 24 hr	-	370	-	-	-

Data, analysis workflow and results.

Literature-based replication of trans-eQTL loci near IFNB1, LYZ and ARHGEF3 genes.

Platelet-specific trans-eQTL at the ARHGEF3 locus.

Transient cis-eQTL for SLC39A8 is associated with the expression of seven metallothionein genes in trans in monocytes stimulated with LPS for 24 hr.

Molecular mechanisms underlying the SLC39A8 trans-eQTL locus.

Number of samples included in the analysis from each study and each cell type.

Author details

Liis Kolberg (for correspondence)

Nurlan Kerimov

Hedi Peterson

Kaur Alasoo (for correspondence)
