Main

The use of menopausal hormone therapy (MHT) has been consistently associated with a reduced risk of developing colorectal cancer (CRC; Calle et al, 1995; Fernandez et al, 1998; Grodstein et al, 1998, 1999; Anderson et al, 2004; Campbell et al, 2007; Johnson et al, 2009; Rennert et al, 2009; Green et al, 2012; Lin et al, 2012). According to a recently published meta-analysis, the relative risk (RR) of CRC was 0.74 (95% confidence interval (CI) 0.68–0.81) for ever use of oestrogen plus progestogen (E+P) therapy and 0.79 (95% CI 0.69–0.91) for ever use of oestrogen-only (E-only; Lin et al, 2012). Compared to placebo, menopausal women randomised to combined E+P hormone therapy in the Women’s Health Initiative (WHI) also had a lower risk of CRC (Chlebowski et al, 2004), although their cancers tended to be of higher stage with poorer prognosis (Simon et al, 2012). However, randomisation to conjugated E-only in the WHI trial was not associated with risk of CRC (Anderson et al, 2004).

The underlying mechanisms of how MHT use influences colon carcinogenesis are largely unknown. Insight into potential biological pathways could be gained by investigating genetic modifiers of CRC risk associated with use of MHT. Furthermore, certain loci associated with susceptibility for CRC may only be evident in presence or absence of a specific environmental factor such as use of MHT. Thus, studies that specifically examine the association of MHT use with CRC risk in the context of varying genetic backgrounds are needed. Previous studies that have investigated gene–environment (G × E) interactions between single nucleotide polymorphisms (SNPs), MHT use and CRC risk have been largely based on candidate genes, which encompass only limited genetic variance (Lin et al, 2010, 2011; Rudolph et al, 2011; Slattery et al, 2011). A genome-wide scan to examine G × E interactions is a crucial next step to comprehensively examine additional variants and identify novel interactions with MHT. We therefore carried out a genome-wide association analysis to assess G × E interactions with use of MHT, using recently developed statistical methods and data from several studies comprising in total 5419 CRC cases and 5416 controls.

Materials and Methods

Study participants

Our overall genome-wide association study (GWAS) design has been described previously (Hutter et al, 2012; Peters et al, 2013). In brief, this analysis is based on 10 studies (a case–control study from the Colon Cancer Family Registry (CCFR), and nine studies from the Genetics and Epidemiology of Colorectal Cancer Consortium (GECCO)). Study-specific details are described in Supplementary Methods. All cases were defined as colorectal adenocarcinoma, and confirmed by medical records, pathology reports or death certificate. All participants provided written informed consent, and studies were approved by their respective institutional review boards.

Harmonisation of environmental data

Information on basic demographics and environmental risk factors was collected by using in-person interviews and/or structured questionnaires, as detailed previously (The Women’s Health Initiative Study Group, 1998; Colditz and Hankinson, 2005; Newcomb et al, 2007; Hoffmeister et al, 2009). Postmenopausal status was defined by using either (i) study-derived menopausal status, if available; (ii) self-reported menopausal status, if study derived was not available; or (iii) age 55, if study derived and self-report were not available. MHT use was considered either as any MHT use, E-only use or E+P use at reference time. Non-users (of any MHT) at reference time were used as reference group. The reference time for nested case–control studies was time of enrolment into the cohort (VITAL, WHI and PLCO) or blood draw (NHS). For case–control studies, the reference time was at diagnosis and 2 years prior to diagnosis for CCFR and DALS. The harmonisation procedure is described in more detail in the Supplementary Methods.

Genotyping, quality assurance/quality control and imputation

All analyses were based on genotyped data generated from genome-wide association scans and imputation to HapMap II. Genotyped SNPs were excluded if they were triallelic, not assigned an rs number, or were reported or observed as not performing consistently across platforms. Furthermore, genotyped SNPs were excluded based on call rate (<98%), lack of Hardy-Weinberg equilibrium in controls (HWE, P<1 × 10−4) and minor allele frequency (MAF<5%). Further details on DNA extraction, genotyping and quality assurance/quality control for each of the involved studies can be found in the Supplementary Methods and in Peters et al (2012).

All autosomal SNPs of all studies were imputed to the CEU population in HapMap II release 24, with the exception of OFCCR, which was imputed to HapMap II release 22. CCFR set 1 was imputed using IMPUTE (Marchini et al, 2007), OFCCR was imputed using BEAGLE (Browning and Browning, 2007), and all other studies were imputed using MACH (Li et al, 2010). Imputed data were merged with genotype data such that genotype data were used if a SNP had both types of data, unless there was a difference in terms of reference allele frequency (>0.1) or position (>100 base pairs), in which case imputed data were used. SNPs were restricted based on per study MAF 5% of samples and per study imputation accuracy (r2>0.3). After imputation and QC analyses, a total of about 2.7 M SNPs were used in the analysis. The inflation factor λ (a measurement of the overdispersion of the test statistics from the marginal association tests) obtained from the fixed-effect meta-analysis of GWAS scans was 1.029, 1.027 and 1.027 for the samples used for any MHT use, E+P and E-only analyses, respectively, indicating little evidence of population substructure (Quantile–quantile (Q–Q) plots in Supplementary Figure 1).

Statistical Methods

Models were adjusted for age at reference time, centre and the first three principal components from EIGENSTRAT (Price et al, 2006) to account for population substructure. Each directly genotyped SNP was coded as 0, 1 or 2 copies of the variant allele. For imputed SNPs, we used the expected number of copies of the variant allele (the ‘dosage’), which has been shown to give unbiased test statistics (Jiao et al, 2011). SNPs are treated as continuous variables (i.e., assuming log-additive effects). Each study was analysed separately using logistic regression, and study-specific results were combined using fixed-effect meta-analysis to obtain summary odds ratios (ORs) and 95% CIs across studies. We calculated the heterogeneity P-values by Woolf’s test (Woolf, 1955). Q–Q plots were assessed to determine whether the distribution of the P-values was consistent with the null distribution (except for the extreme tail). To test for multiplicative interactions between SNPs and environmental risk factors, we primarily used the empirical Bayes (EB) test (Mukherjee and Chatterjee, 2008; Mukherjee et al, 2008, 2010) given that this test can be more powerful than the conventional case–control logistic regression analysis while maintaining the desired type I error. It calculates the interaction log OR that corresponds to a weighted average of the case-only and case–control estimators. Thus, the method makes use of the greater precision of the case-only estimator by simultaneously reducing the chance of generating biased estimates due to violations of the assumption of gene–environment independence in controls (Mukherjee and Chatterjee, 2008). We modelled the SNP by environment (G × E) interaction by the product of the SNP and the dichotomised environmental variable. A two-sided P-value <5 × 10−8 was considered genome-wide significant, yielding a genome-wide significance level of 0.05 (Risch and Merikangas, 1996; International HapMap Consortium, 2005; Wellcome Trust Case Control Consortium, 2007; Dudbridge and Gusnanto, 2008; Hoggart et al, 2008; Pe’er et al, 2008). We secondarily used the multiplicative Cocktail method (Hsu et al, 2012), to evaluate how robust the findings were under the use of an alternative test to evaluate a multiplicative interaction (Supplementary Methods).

To estimate the effects of the environment variable stratified by genotype, we fit the following model: logit(d)=b0+b1e+c1p1+c2p2+β1p1e+β2p2e+covariates, where p1 and p2 are the imputation posterior probabilities for genotype A/B, B/B. Then the stratified effects of environment variable can be estimated as , and for genotypes A/A, A/B and B/B, respectively. The s.e. are estimated by using the standard formula for a linear combination of two parameters based on the covariance matrix of these parameters.

We estimated the CRC incidence rates associated with E+P use among individuals with each genotype of SNPs, which would provide more direct interpretation for G × E interaction effects on public health. We based the estimation of the incidence rate on the Surveillance, Epidemiology, and End Results (SEER) age-adjusted CRC incidence rate (denoted by ‘I’) 1992–2010 among the White population, which is 59.5 per 100 000 women per year. By using I, we estimated the reference incidence rate of CRC (denoted by ‘Iref’) using the population attributable risk (PAR), which is estimated by one minus the average of the inverse of estimated risk score exp(−X β) in cases (Bruzzi et al, 1985). Specifically, the formula for computing the PAR estimator is:

where Yj=1 for case and 0 for control; Xj=covariates; β=estimated regression coefficients in the logistic regression analysis. We can then estimate the reference incidence rate of CRC for X=0 by Iref =(1−PAR) × I (Gail et al, 1989). On the basis of this reference incidence rate of CRC (i.e., Iref), we further calculated the CRC incidence rate for each subgroup defined by genotypes of the SNP according to E+P use or non-use by multiplying the Iref with each corresponding OR estimate. We calculated the 95% CIs using a resampling technique with 1000 weighted bootstraps. Since SEER incidence rates are based on a large number of individuals, the uncertainty of ‘I’ is negligible compared to the uncertainty from the PAR estimate, and hence was not considered in the calculation of 95% CIs.

Methods used for functional follow-up on promising loci are described in the Supplementary Methods.

Results

Our study population comprised 10 835 menopausal women: 5419 cases and 5416 controls. For all 10 835 women information on use of any MHT was available and information on the type of MHT preparation was available for 9004 participants. At reference time, 3384 women used any MHT (31.2%), 1283 (11.8%) used E+P and 1606 (14.8%) used E-only (Supplementary Table 1). MHT use was inversely associated with the risk of CRC. Compared to non-users of any MHT at reference time, the OR for CRC was 0.70 (95% CI 0.62–0.79, P=1.9 × 10−9, P for heterogeneity (phet)=0.14) for women using any MHT preparation, 0.76 (95% CI 0.64–0.90, P=0.0015, phet=0.049) for women who used E+P and 0.71 (95% CI 0.61–0.84, P=7.1 × 10−5, phet=0.017) for women who used E-only.

The EB test identified a significant interaction of the variant rs964293 with E+P, with OR=0.61 (95% CI 0.52–0.72, P=4.8 × 10−9). This interaction showed borderline evidence of heterogeneity across studies (phet=0.044; Figure 1A). The same variant (rs964293) showed an interaction with E+P use on CRC risk in our secondary analysis, using the Cocktail test (OR=0.64, 95% CI 0.52–0.78, P=1.2 × 10−5, alpha threshold for significance in the group where the variant is assigned according to its rank=3.1 × 10−4). Table 1 summarises the results of the interaction analyses of rs964293 with MHT use. When testing for interaction between rs964293 and use of any MHT or E-only, no significant results were observed (Table 1).

Figure 1
figure 1

Forest plot for meta-analysis of the interaction between SNP and oestrogen plus progestogen use, using the empirical Bayes (A and C) and case–control logistic regression method (B and D) for rs964293 (A and B) and rs6023015 (C and D). DALS and PLCO studies are not plotted because they do not have information on the type of hormone compound.

Table 1 Result of the interaction tests for any MHT use, E+P use and E-only use and rs964293

Table 2 presents the association of the different strata of MHT by rs964293 with CRC risk. The OR of CRC for women taking E+P compared with women not using MHT is 0.96 (95% CI 0.61–1.50, P=0.84), 0.61 (95% CI 0.39–0.95, P=0.03) and 0.40 (95% CI 0.22–0.73, P=0.0026) for women with C/C, A/C and A/A genotype of rs964293, respectively (Table 2). The per study ORs of E+P on CRC stratified by rs964293 genotype are shown in Figure 2A–C. No significant heterogeneity between study wise estimates was observed (phet=0.3, 0.66 and 0.23, respectively).

Table 2 Associations with colorectal cancer risk stratified by E+P use and genotypes of rs964293
Figure 2
figure 2

Forest plot for meta-analysis of the marginal association of oestrogen plus progestogen with colorectal cancer risk in strata defined by zero, one or two minor alleles of rs964293 (A, B and C, respectively) and rs6023015 (D, E and F, respectively).

The variant rs964293 is located in an intergenic region 28 kb upstream of CYP24A1 on chromosome 20q13.2. Supplementary Figure 2 shows the interaction P-value (EB test) of rs964293 as well as SNPs surrounding rs964293 in region 20q13.2. The variant rs964293 was directly genotyped in five of the study sets included in the analysis and imputation accuracy was high in the remaining six study sets (r2 ranging from 0.82 to 0.99). The MAF ranged from 0.34 to 0.38 in the 11 study sets.

The SNP rs964293 is in moderate LD (r2=0.61 in 1000 Genomes pilot CEU) with a strong functional candidate, rs6023015. Our in silico functional analysis demonstrates that this SNP is located in a region with strong DNase hypersensitivity and histone methylation patterns consistent with enhancer activity. Supplementary Figure 3 shows how these two patterns of DNase hypersensitivity and histone methylation are stronger in some cancerous cell lines (including the CRC cell lines CACO2 and HCT-116) relative to non-cancerous cell lines. ChIP-seq experiments indicate that several transcription factors bind to this region (Supplementary Figure 4). In data from 169 tissue samples of the transverse colon available through the GTEx portal (Ardlie et al, 2015), both variants, rs964293 and rs6023015, are expression quantitative trait loci (eQTL) for CYP24A1 expression (P=0.037 and 0.018, respectively). Expression of CYP24A1 varied slightly stronger by genotypes of another SNP rs2256649 in LD with rs964293 (r2=0.60 in 1000 Genomes pilot CEU), with P=0.0069. Compared to homozygous carriers of the wild-type allele, expression of CYP24A1 was higher in homozygous carriers of the minor allele for all three SNPs. Supplementary Figure 5 displays the distribution of the rank normalised gene expression of CYP24A1 by genotypes of rs964293, rs6023015 and rs2256649.

We estimated absolute risks for CRC according to the use of E+P and genotypes of rs964293 and rs6023015 (Table 3). Compared to not-using E+P and carrying the wild-type CC genotype of rs964293, using E+P was associated with 30.6 fewer cases of CRC within a year among carriers of the AA genotype. Likewise, carrying the CC genotype of rs6023015 and using E+P was associated with 32.4 fewer cases of CRC within a year when compared to carrying the GG genotype of rs6023015 and not-using E+P.

Table 3 Estimated incidence rates (95% CI) per 100 000 individuals stratified by genotypes of rs964293 and rs6023015 and use of E+P

Aside from results reported above, we did not observe any additional genome-wide significant G × E interactions with use of E+P and none for any MHT and E-only use.

Discussion

In this study, including over 10 000 women, we evaluated genome-wide G × E interactions with the use of any MHT preparation as well as separately with use of E+P and of E-only. Employing the powerful yet robust EB method, we found evidence of an interaction between the variant rs964293 at 20q13.2/CYP24A1 and use of E+P.

Although most prior studies of G × E interactions in CRC have utilised a candidate gene approach (Slattery et al, 2010; Zhong et al, 2013, Figueiredo et al (2011), examined G × E interactions using a genome-wide scan and 14 environmental variables, including MHT use, and did not observe any genome-wide significant associations within the CCFR and OFCCR study. However, that study had limited power to detect any significant G × MHT interactions since it included only 572 highly selected CRC cases of menopausal women (age <50 years or with a family history) and did not assess the possible differential interaction with use of MHT according to type of preparation.

Our finding of a genome-wide significant interaction between rs964293 and use of E+P has convincing biologic plausibility. This SNP is located in an intergenic region 28 kb upstream of CYP24A1. As it is not in strong LD with any coding variants, we hypothesised that the underlying causal variant(s) exerts its effect through a regulatory mechanism. We found that rs964293 is in moderate LD with rs6023015 (r2>0.61 in CEU), which lies in a putative enhancer region for CYP24A1. The rs6023015 SNP was imputed in all 11 studies with high accuracy (mean r2=0.95, range 0.87–1.00), and interaction and association of E+P across strata defined by number of minor alleles of rs6023015 paralleled that of rs964293 (Supplementary Table 4 and Figure 2D–F). Also the estimated ORs for interaction of rs6023015 with use of E+P were comparable to those found for rs964293 (Figure 1C and D), but P-values observed with the EB test were less extreme (P=2.8 × 10−6). Both SNPs are eQTL for CYP24A1 expression in 169 normal tissue samples of the transverse colon. CYP24A1 expression was increased in homozygous carriers of the minor allele compared to homozygous carriers of the wild-type allele (Supplementary Figure 5). Details of the in silico functional analyses of rs6023015 are provided in the Supplementary Methods.

CYP24A1 codes for a protein that belongs to the cytochrome P450 family. These mitochondrial proteins are monooxygenases that catalyse several reactions involved in drug metabolism and synthesis of cholesterol, steroids, sex hormones and other lipids. Specifically, CYP24A1 plays a key role in the metabolism of the steroid hormone vitamin D by degrading its active form. CYP24A1 is highly expressed in malignant colon tumours as compared with healthy colonic epithelium at both the mRNA (Bareis et al, 2001) and protein level (Matusiak and Benya, 2007), and some variants in CYP24A1 have been associated with CRC risk in candidate gene studies (Dong et al, 2009). Vitamin D refers to a group of lipid soluble cholesterol-based compounds that, in contrast with other vitamins, can be synthesised endogenously. The active form of vitamin D exerts several functions relevant to regulation of tumour pathogenesis and progression, such as the activation of apoptotic pathways, anti-proliferative effects and angiogenesis inhibition (Deeb et al, 2007). Substantial experimental and epidemiological data support an inverse association with risk of CRC (Chan and Giovannucci, 2010). Moreover, there is evidence linking the vitamin D pathway and sex hormones in CRC aetiology. A secondary analysis of data from the WHI suggested that the association between low-dose vitamin D plus calcium supplementation and CRC risk is modified by MHT (E-only and E+P; Ding et al, 2008). Women randomised to receive calcium and 400 international units (IU) vitamin D3 supplementation in the placebo arms of the factorial oestrogen therapy trials were at suggestively decreased risk of developing CRC (HR=0.71, 95% CI 0.46–1.09) compared to women randomised to receive calcium, vitamin D3 and concurrent MHT (HR=1.30 (95% CI 0.83–2.03), P for interaction=0.05; Ding and Giovannucci, 2009). Furthermore, the difference in the association of calcium and vitamin D supplementation with CRC between users and non-users of MHT appeared to be more evident for E+P than for E-only.

The results observed here suggest that CYP24A1 may be regulated by sex hormones exposure and that the modifying effect observed with rs964293 may be caused by a disruption of the increased CYP24A1 expression induced by sex hormones exposure. Alternatively, given that we found an interaction only with E+P exposure and not with E-only intake, CYP24A1 may be a metabolising enzyme for progestogens but not oestrogen. Taken together, this evidence provides a strong rationale for additional functional studies to determine the role of rs964293/rs6023015 in CYP24A1, particularly in relation to exposure to exogenous E+P in colorectal carcinogenesis.

Our study has several strengths. First, our large sample size facilitated the detection of genome-wide G × E interactions even accounting for the stringent threshold for significance necessary for control of type I error and the generally small magnitude of effect modification that can be reasonably expected. We followed a ‘whole set’ approach rather than a ‘discovery-validation’ strategy to maximise efficiency (Skol et al, 2006). Second, as previously described (Hutter et al, 2012), we have carefully harmonised data on a range of environmental variables, including MHT use across 10 studies. Third, two different methods (EB and Cocktail) to evaluate G × E interaction identified the same variant as having a significant G × E interaction, providing greater confidence that this association is not a false positive finding (although the EB test can be part of the Cocktail test itself, this was not the case for rs964293). The magnitude of the interaction between rs964293 and the use of E+P yielded by the traditional case–control logistic regression analysis was similar to the one found by the EB test (OR=0.64 and 0.61, respectively; Figure 1 and Table 1), though it did not reach the threshold of genome-wide significance (P=1.2 × 10−5). We observed some degree of heterogeneity among studies for the interaction of rs964293 with E+P (phet=0.044). Figure 1 shows that the OR for interaction is 1 for all the studies but the VITAL, which constitutes about 2.5% of the total sample size. We do not consider this heterogeneity as a strong limitation to the results of the study.

The use of E+P for the purpose of chemoprevention is not routinely recommended for the general population due to concerns about the potential adverse consequences of long-term exposure. However, our results suggest the possibility that the benefit of E+P may be enhanced in women carrying genetic variants at rs964293 (Table 3). In conjunction with other strategies for risk-stratification and after its evaluation across other relevant clinical outcomes, this finding could be exploited to more specifically identify individuals for whom the benefits of chemoprevention may outweigh potential risks (Collins, 2015).

In summary, we have identified a CYP24A1-related variant as effect modifier of CRC risk associated with use of E+P using a genome-wide approach. This finding offers important insight into the role of E+P and its downstream pathways, including its potential interaction with vitamin D in the etiopathogenesis of CRC and supports the need for further studies to confirm the involvement of CYP24A1 in modulating CRC risk.