Introduction

Most colorectal cancers (CRCs) probably arise from adenomatous polyps. There is a clinically important subset of patients who are found to have multiple (five or more) colorectal adenomas, a phenotype that is suggestive of an inherited genetic predisposition. Identifying the genetic basis of these patients’ tumours is clinically informative in terms of determining the natural history of their disease, the cancer risks in relatives and the optimal means of surveillance and early treatment.

Some patients with ≥5 adenomas have one of the rare Mendelian polyposis syndromes, such as classical or attenuated familial adenomatous polyposis (FAP),1 MUTYH-associated polyposis (MAP)2 or polymerase proofreading-associated polyposis (PPAP).3 The number of adenomas is highly variable in each syndrome, and for FAP, the underlying causes of this phenotypic variation have been studied in some detail. FAP is caused by germline APC variants and the position of the variant explains some of the variation in polyp numbers. For example, although FAP patients usually develop hundreds or thousands of adenomas, a few individuals with variants in proximal, distal or alternatively spliced regions of the gene have so-called attenuated disease, with tens or fewer tumours.4 The position of the germline APC variants also influences the ‘classical’ (>100 adenomas) FAP phenotype: for example, severe colonic polyposis is associated with germline variants near codon 1309.5 However, several studies have also addressed the possibility that modifier genes unlinked to APC influence the FAP phenotype.6, 7, 8

Only a minority of individuals with 5–20 colorectal adenomas test positive for any pathogenic germline variant9 and the set of patients with multiple colorectal adenomas is genetically heterogeneous. Recently, genome-wide association studies (GWAS) have identified haplotype-tagging single-nucleotide polymorphisms (SNPs) that are associated with CRC risk in the general population.10 Some of these variants are close to loci (eg, GREM1, BMP2, BMP4, POLD3 and MYC) that are functionally related to the genes involved in the Mendelian polyposis syndromes. Each SNP has a modest effect size (typically 10–20% increased risk per allele). By testing individuals with small numbers of colorectal adenomas (median=1, interquartile range=1–2), but no history of CRC, we previously showed that some SNPs predispose to CRC through the development of adenomas.11

In this study, we have examined whether the multiple adenoma phenotype can be explained in some cases by common risk alleles of individually modest effects. We have also assessed whether the common CRC predisposition SNPs influence the severity of colonic polyposis in FAP.

Materials and Methods

Genomic DNA was extracted from peripheral blood of 178 unrelated individuals with multiple (5–100) adenomatous polyps (median=10, IQR=7–17) at first colonoscopy based on routine histopathological reports (Supplementary Figure 1). Seventy-one patients had CRC at presentation or subsequently (Table 1), or had a family history of CRC. No case had any polyp of hamartomatous morphology. A number had several serrated polyps, but in all cases, the classical adenoma was the majority morphology. Patients were from throughout the United Kingdom and all were of northern European ancestry. Patients were not selected on the basis of age, adenoma size or specific histology. Median age at presentation was 55 years (IQR=48–64) and 34% of patients were female. Adenoma number was not associated with increased patient age (P=0.97) or with gender (P=0.32). FAP, MAP and PPAP had been excluded in each case using direct sequencing of (i) the regions of APC associated with attenuated disease (and the entire gene for those with close to 100 adenomas), (ii) the common northern European MUTYH variants p.Tyr179Cys and p.Gly396Asp,12 and (iii) the exonuclease domains of POLE and POLD1.3 One hundred and forty-two patients (79 families) with FAP and pathogenic germline APC variants were also studied, as were 30 cases with a classical FAP phenotype and no identified disease-causing variant in APC, MUTYH, POLE or POLD1. Data on age, sex and number of polyps at colectomy from the FAP patients were obtained from the St Mark’s Hospital Polyposis Registry, Harrow, UK.

Table 1 Summary risk score statistics for each group

For the multiple adenoma and FAP patients, we used KASPar assays (KBiosciences, Hertfordshire, UK) to genotype 18 published CRC SNPs (rs6691170 chr1.hg19:g.220112069G>T, rs6687758 chr1.hg19:g.220231571A>G, rs10936599 chr3.hg19:g.170974795C>T, rs16892766 chr8.hg19:g.117699864A>C, rs6983267 chr8.hg19:g.128482487G>T, rs10795668 chr10.hg19:g.8741225G>A, rs3802842 chr11.hg19:g.110676919A>C, rs7136702 chr12.hg19:g.49166483C>T, rs11169552 chr12.hg19:g.49441930C>T, rs4444235 chr14.hg19:g.53480669T>C, rs1957636 chr14.hg19:g.53629768G>A, rs4779584 chr15.hg19:g.30782048C>T, rs9929218 chr16.hg19:g.67378447G>A, rs4939827 chr18.hg19:g.44707461T>C, rs10411210 chr19.hg19:g.38224140C>T, rs961253 chr20.hg19:g.6352281C>A, rs4813802 chr20.hg19:g.6647595T>G and rs4925386 chr20.hg19:g.60354439C>T). All of these assays had previously been validated as showing >98% genotype concordance with Illumina SNP array genotypes as part of the CRC GWAS. Eight samples were excluded for having individual SNP genotype call rates of <94%. The overall genotyping call rate was >99.5%. None of the markers showed significant deviation from Hardy–Weinberg equilibrium (P>0.05).
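The Hardy–Weinberg check mentioned above can be illustrated with a minimal Python sketch. It assumes a one-degree-of-freedom chi-square goodness-of-fit test; the text does not state which test was used, and an exact test would be an equally plausible choice. The function name `hwe_chi_square` and the genotype counts are hypothetical and are not taken from the study data.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) for deviation from Hardy-Weinberg equilibrium.

    Arguments are observed counts of the three genotypes at one biallelic SNP.
    Returns (chi-square statistic, P value).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("at least one genotype is required")
    # Allele frequency of A estimated from the genotype counts.
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    # Skip classes with zero expectation (monomorphic marker).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square distribution with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts for illustration only.
chi2, p_value = hwe_chi_square(25, 50, 25)
print(f"chi2={chi2:.3f}, P={p_value:.3f}")
```

Under this sketch, a marker would be flagged when the returned P value falls below the chosen threshold (0.05 in the text).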

As controls, to avoid overlap with the published CRC GWAS, we extracted SNP genotypes for the 18 SNPs from cancer-free individuals within the publicly available Colorectal Cancer Family Registry (CFR) (http://coloncfr.org/) and CGEMS (http://dceg.cancer.gov/research/how-we-study/genomic-studies/cgems-summary) data sets that had been genotyped using Illumina genome-wide tagSNP arrays. To control for population stratification, we conducted principal component analysis to ensure that the CFR samples used clustered with UK population individuals of northern European ancestry (Supplementary Figure 2).

We calculated for each individual a SNP risk score defined by

$$\text{risk score} = \sum_{i=1}^{18} \beta_i n_i$$

where β_i is the ln(odds ratio (OR)) (>0) for SNP i derived from unconditional logistic regression analysis in CFR cases and CGEMS controls (Supplementary Tables 1 and 2 and Figure 1), and n_i is the number of risk alleles (0–2) of SNP i carried by that individual. This resulted in a theoretical range of risk scores between 0 and 4.78, where the minimum and maximum score would represent individuals homozygous for all the protective and risk alleles respectively.
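As a minimal sketch of this calculation, the Python below assumes the score is the sum over SNPs of ln(OR) multiplied by the number of risk alleles carried, with every OR oriented to the risk allele so that ln(OR) ≥ 0. The function names and the three odds ratios are hypothetical placeholders; the per-SNP values used in the study are those in the Supplementary Tables and are not reproduced here.

```python
import math


def snp_risk_score(risk_allele_counts, odds_ratios):
    """Weighted score: sum over SNPs of ln(OR) * number of risk alleles."""
    if len(risk_allele_counts) != len(odds_ratios):
        raise ValueError("one odds ratio is needed per SNP")
    score = 0.0
    for n, odds_ratio in zip(risk_allele_counts, odds_ratios):
        if n not in (0, 1, 2):
            raise ValueError("risk allele count must be 0, 1 or 2")
        if odds_ratio < 1.0:
            raise ValueError("orient each OR to the risk allele (OR >= 1)")
        score += math.log(odds_ratio) * n
    return score


def risk_allele_count(risk_allele_counts):
    """Unweighted alternative: total number of risk alleles carried."""
    return sum(risk_allele_counts)


# Hypothetical per-allele odds ratios for three SNPs (illustration only).
ILLUSTRATIVE_ORS = [1.10, 1.20, 1.15]
genotypes = [1, 0, 2]  # risk alleles carried at each SNP

print(snp_risk_score(genotypes, ILLUSTRATIVE_ORS))
print(risk_allele_count(genotypes))
```

With 18 SNPs, the unweighted count ranges from 0 to 36, and the weighted score reaches its maximum when every count is 2.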

Figure 1

Histograms comparing the risk score distribution in multiple adenoma cases (1) and controls (0). A simple count of an individual’s total number of high-risk CRC alleles, without taking differing SNP effect sizes into account, yielded similar results.

Results and discussion

The SNP risk score was approximately normally distributed in both adenoma cases and controls (Shapiro–Wilk test, P>0.13). We initially wondered whether any of the cases had an outlying number of CRC SNP risk alleles. The maximum number of risk alleles carried by any patient was 26/36, equivalent to a risk score of 3.53. Four cases and three controls had risk scores of over 3.40. These data suggested that the risk score had limited use as a predictor of multiple adenomas on an individual basis. This result was not unexpected, given that the known CRC SNPs account for only a minority of adenoma risk.11

As a more general test of the hypothesis that common risk alleles contribute to the multiple adenoma phenotype, we compared the SNP risk score distributions in multiple adenoma cases and controls. There was a significantly higher risk score (P=5.8 × 10⁻⁷, t-test; Table 1 and Figure 1) in multiple adenoma cases (mean=2.44, SD=0.40) than controls (mean=2.27, SD=0.42). The association remained present (P=0.0011) when the analysis was restricted to the 107 cases with no known personal or family history of CRC (mean score=2.41, SD=0.40). Of the 178 multiple adenoma cases, 103 had ≥10 adenomas and 75 had 5–9 adenomas. Both case groups individually had significantly higher risk scores than controls (Table 1 and Figure 1), but in the group with ≥10 adenomas, the mean risk score was higher (2.48) than that in the patients with 5–9 adenomas (2.39). Ordered logistic regression analysis on the three groups (10+ adenomas vs 5–9 adenomas vs population controls) showed that the risk score was correlated with adenoma numbers (P=9.5 × 10⁻⁷).
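To put the difference in mean risk score on a common scale, the sketch below computes a standardized mean difference from the summary statistics quoted above. This quantity is not reported in the paper. Because the control sample size is not given in this section, the sketch averages the two variances without weighting by group size, which is an assumption; the function name is hypothetical.

```python
import math


def standardized_mean_difference(mean_a, sd_a, mean_b, sd_b):
    """Difference in means divided by the root of the averaged variances.

    The variances are averaged with equal weight, so group sizes are ignored.
    """
    if sd_a <= 0 or sd_b <= 0:
        raise ValueError("standard deviations must be positive")
    averaged_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_a - mean_b) / averaged_sd


# Summary statistics quoted in the text: cases (2.44, 0.40), controls (2.27, 0.42).
d = standardized_mean_difference(2.44, 0.40, 2.27, 0.42)
print(f"standardized difference: {d:.2f}")
```

On these inputs the difference comes to roughly 0.4 standard deviations, which fits the description of a significant group-level shift with extensive overlap between individual cases and controls.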

The 18 CRC SNPs explained 4.3% of the variance in the risk of multiple adenomas. Using multivariate logistic regression to assess specific polymorphisms, we found that three individual SNPs, rs6983267 (OR=1.54, 95% CI: 1.21–1.97, P=3.74 × 10⁻⁴), rs10795668 (OR=0.64, 95% CI: 0.49–0.84, P=0.00148) and rs3802842 (OR=1.31, 95% CI: 1.03–1.69, P=0.0278) were nominally significantly associated with the multiple adenoma phenotype. These three SNPs alone explained 3% of the variance in multiple adenoma risk.

Of note, Hes et al13 recently reported associations between individual CRC SNPs and adenoma risk in a similar data set to ours, although no risk score was calculated. Hes et al13 found rs3802842 to be the SNP most significantly associated with adenoma risk, with additional evidence for associations with rs6983267 and with a further SNP, rs4779584. In our previous analysis of patients with smaller numbers of adenomas, rs3802842 and rs6983267 again showed strong associations with risk, although rs4779584 did not. For rs10795668, the evidence for an association was only moderate in the Hes et al data13 and our own previous study,11 but the direction and magnitude of effect were consistent in all three studies. Our other five previously reported adenoma SNPs (rs10936599, rs4444235, rs1957636, rs4939827, rs961253)11 were not as well supported by this study or by Hes et al13 (details not shown), although this might have resulted from the relatively small sizes and low power of the two multiple adenoma case collections.

Our set of 142 FAP patients with germline APC variants had a very similar risk score distribution to the controls, as did the 30 cases with classical FAP and no identified pathogenic variant in the polyposis genes (P=0.53 and 0.42 respectively; Table 1). Given the overwhelming effect of the germline APC variant on the FAP phenotype, the former result was not unexpected, and provided reassurance that the controls used were representative of the general UK population.

The severity of colorectal polyposis in our FAP cases (number of colorectal adenomas at prophylactic colectomy) was known for 64 patients from 30 families with pathogenic germline APC variants. Including sex, age and the position of the germline APC variant (codons 1265–1389 vs regions associated with attenuated FAP vs other)14 as covariates in a linear regression analysis, we tested whether the CRC SNPs were associated with FAP severity. Although APC variant position was strongly associated with polyp number (details not shown), there was no association with any other variable. In particular, SNP risk score showed very little evidence of a positive association with polyp number (OR=0.94, 95% CI: 0.87–1.01, P=0.09), and no individual SNP was nominally associated with polyp count.

In general, the phenotypic overlap between Mendelian and ‘sporadic’ disease is particularly important for the common cancers, in terms of the clinical management of cancer families and/or those with multiple tumours. Our results probably underestimate the effects of the CRC SNPs on the multiple adenoma phenotype, because the publicly available population controls were not known to be adenoma-free; furthermore, there almost certainly exist undiscovered, common CRC risk variants. A further consideration is that a small number of our multiple adenoma cases may have carried germline variants in the mismatch repair genes (MSH2, MLH1, MSH6, PMS2) or the juvenile polyposis genes (SMAD4, BMPR1A). We were a little surprised to find no evidence that the CRC SNPs affected the severity of the colorectal phenotype in FAP cases, because adenoma pathogenesis is often thought to be similar in FAP and sporadic lesions. Genotyping of modifier genes could influence the management of FAP patients – for example, in choosing between ileorectal anastomosis and pouch formation. However, our data suggest that the known CRC SNPs cannot be used for this purpose.

In conclusion, although unidentified Mendelian predisposition genes for multiple adenomas may exist, we have shown that common CRC risk variants are likely to contribute to the multiple adenoma phenotype. It is highly plausible that some of the multiple adenoma cases carry additional, unknown susceptibility variants with moderate or small effects. Multiple adenoma cases in whom the known Mendelian syndromes have been excluded are often not clinically distinguishable from the Mendelian conditions of attenuated FAP, MAP or PPAP. However, some of the multiple adenoma patients will have ‘polygenic’ rather than monogenic disease, with an accompanying lower risk of CRC in family members. Although polygenic multiple adenoma cases cannot currently be identified positively by genetic testing, the existence of non-Mendelian genetic adenoma aetiology should be recognised when counselling the families of multiple adenoma patients and monitoring the screening regimens of at-risk relatives.