Introduction

Haematopoiesis, the process by which blood cells are generated, begins in embryogenesis and continues throughout an individual’s lifespan1. Haematopoietic stem cells (HSCs) are responsible for the creation of all mature blood cells, including red blood cells, platelets, and the numerous myeloid cells (such as monocytes and neutrophils) and lymphoid cells (such as T cells and B cells) that comprise the innate and adaptive immune systems. Roughly 50,000–200,000 HSCs in an adult human2 produce an estimated 10¹⁰–10¹² progeny blood cells every day3. Over the course of repeated cell divisions during a lifetime, HSCs accumulate unique patterns of acquired DNA mutations. Each HSC gains approximately one new exonic variant per decade4, although somatic changes can and do occur throughout the non-coding genome as well. Although the majority of these acquired mutations involve genetic loci that do not lead to phenotypic consequences, mutations can occur in portions of the genome that may confer a relative fitness advantage to affected HSCs. Such a fitness advantage can take several forms, including an increased proliferative drive, a more durable capacity for self-renewal that counteracts ageing-related drop-out from the HSC pool, or an improved ability to evade death from cellular damage5. Over time, the relative fitness advantage of these mutated HSCs can result in the clonal production of a large number of progeny that all bear the same somatic alterations.

This expansion of haematopoietic cells with the same acquired mutation is referred to as clonal haematopoiesis (CH). The first experimental evidence suggestive of widespread age-related clonality in the blood dates back to the mid-1990s6, but the genetic characterization of the acquired clonal mutations has only been possible over the past several years owing to three parallel developments. First, advances in next-generation sequencing technologies have enabled the identification of mutations with high resolution (that is, single base-pair changes) even when these lesions are present in just a fraction of sampled cells. Second, bioinformatic innovations in analyses of big data have allowed for the detection of mutations in meaningfully large datasets. Third, many simultaneous efforts to build institutional and national cohorts consisting of tens to hundreds of thousands of individuals have begun to come to fruition, providing ample substrate in which to look for acquired mutations as well as their associations with inherited variation and clinical phenotypes. The convergence of these trends has led to the identification of several distinct types of CH that are common and hold important implications for human health. Specifically, it is now known that CH is linked to a heightened risk of mortality and multiple common diseases of ageing, including blood cancers and cardiovascular disease (CVD). Moreover, this recent work has shown that germline variation influences the risk of developing CH and the type of acquired mutation that a clone will have.

In this Review, we aim to provide a complete synthesis of the available research on how inherited genetic variation influences the incidence of CH. We detail the current evidence from twin studies and large-scale genetic association studies regarding the heritable risk of CH and consider how this genetic architecture intersects with the biology of ageing.

Clonal haematopoiesis

Somatic variation giving rise to CH

Here, we use ‘CH’ as an umbrella term that refers to the presence of an expanded mutant clone of any sort within the blood, excluding the reactive expansion of immune cells within lymphoid organs and frank malignancy. CH is a common phenomenon among the general population and its prevalence increases significantly with age7,8,9,10,11. The blood is not unique in the accumulation of mutations with age, a trend which has also been observed in the solid organs12, although CH involves a distinct set of recurrently mutated genes and comes from a readily available tissue source. To date, the literature has largely classified CH by the type of somatic variation that can be observed within the clone: gain, loss and copy-neutral loss of heterozygosity (CN-LOH) events involving a large portion of a chromosome or single-nucleotide variation and short insertions/deletions (indels).

By far the most common genetic lesion seen in CH is mosaic loss of the Y chromosome (mLOY) in men11,13,14,15,16. Additionally, mosaic chromosomal alterations (mCAs)9,17,18, single-nucleotide variants (SNVs) and indels in genes associated with myeloid malignancies, and putative incidental mutations/genomic drift (CH with unknown drivers)8,10 have all been documented. In the absence of a haematological malignancy, these SNVs/indels are known as CH of indeterminate potential (CHIP) when the mutations are present at ≥2% variant allele fraction (VAF)7,8. This classification scheme is largely a by-product of how CH is identified in existing studies. Chromosomal abnormalities, including mLOY and mCAs, can be interrogated using genome-wide genotyping arrays (such as those used for genome-wide association studies (GWAS)), whereas whole-exome or whole-genome sequencing (WGS) data can identify SNVs or indels but is suboptimal for identifying mCA events.
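The ≥2% VAF criterion can be made concrete with a short calculation. The following is a minimal sketch, assuming that VAF is estimated as the share of sequencing reads at a site that carry the variant allele; the function names and read counts are hypothetical and chosen only for illustration. The conversion from VAF to clone size further assumes a heterozygous mutation in a diploid region, in which case roughly twice the VAF of sampled cells carry the mutation.

```python
CHIP_VAF_THRESHOLD = 0.02  # the >=2% VAF cut-off quoted in the text


def variant_allele_fraction(alt_reads: int, total_reads: int) -> float:
    """Fraction of reads covering a site that carry the variant allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= alt_reads <= total_reads:
        raise ValueError("alt_reads must lie between 0 and total_reads")
    return alt_reads / total_reads


def meets_chip_vaf_threshold(alt_reads: int, total_reads: int,
                             threshold: float = CHIP_VAF_THRESHOLD) -> bool:
    """True if the observed VAF reaches the threshold (hypothetical helper)."""
    return variant_allele_fraction(alt_reads, total_reads) >= threshold


def approx_clone_cell_fraction(vaf: float) -> float:
    """Approximate fraction of cells in the clone.

    Assumption: the mutation is heterozygous in a diploid region, so each
    mutant cell contributes one variant allele out of two.
    """
    return min(1.0, 2.0 * vaf)


if __name__ == "__main__":
    # Illustrative counts only: 10 variant reads out of 400 total reads.
    vaf = variant_allele_fraction(10, 400)
    print(vaf, meets_chip_vaf_threshold(10, 400), approx_clone_cell_fraction(vaf))
```

Under these assumptions, a clone sitting exactly at the 2% VAF threshold would correspond to roughly 4% of sampled cells.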

Of these subtypes of CH, the vast majority of studies of inherited risk have examined mLOY, mCAs or CHIP; therefore, the remainder of this Review focuses on these three entities (Fig. 1).

Fig. 1: Types of clonal haematopoiesis.

Clonal haematopoiesis refers to a clonal expansion of blood cells that are often identified based on shared genetic mutations. a | An especially common type of clonal haematopoiesis is the mosaic loss of the Y chromosome (mLOY), an entity that is often studied separately from other large chromosomal events. b | When large segments of one or more chromosomes are gained, lost or recombine resulting in the loss of heterozygosity, this may result in clonal haematopoiesis with mosaic chromosomal alterations (mCAs). c | Clonal haematopoiesis may also occur through mutations in myeloid-associated genes, termed clonal haematopoiesis of indeterminate potential (CHIP).

Epidemiology of CH

All types of CH are strongly age associated. It has been postulated that all adults have some CH mutations at extremely low clonal fractions19,20 but prevalence estimates for CH in the population are typically based on the identification of clones with VAF of at least ~2%, which is approximately the limit of detection for many commonly used assays. An estimated 1.7–20% of men have some amount of mLOY11,13,14,15,16,21, with the prevalence increasing to >40% of individuals by age 70 in the largest epidemiological study to date11. The X chromosome seems to have a lower rate of mCA acquisition. Approximately 8% of women over the age of 65 years have detectable X chromosome mosaicism9, whereas men rarely have X chromosome mosaicism in the blood at any age22. Autosomal mCA events are the least commonly observed type of CH, affecting ~1–5% of the population older than 70 years of age9,17,23,24,25,26, whereas CHIP is estimated to affect >10% of individuals older than 70 years of age7,8,27. An individual can have both mCAs and CHIP simultaneously, which occurs frequently with point mutations in JAK2 (a tyrosine kinase involved in multiple cytokine signalling pathways28) and mCA events at the same locus17,29. However, aside from the JAK2 locus, the co-occurrence of CHIP and mCAs as identified by bulk sequencing/genotyping from the same individual appears to be a rare event18,27, although the prevalence is higher among patients treated for solid tumours30.

The prevalence of CH varies across several demographic features. There is a sex bias for specific mCA lesions, with most of these having greater prevalence in men17,26. Although some studies have suggested a male-bias for CHIP7, other studies have found this association does not persist after controlling for potential confounders27. Groups with different ancestries also have different prevalence. For instance, mLOY is less commonly observed in individuals of African ancestry than of European ancestry (0.4% versus 1.8%)15. Meanwhile, CHIP mutations are less frequently observed in individuals identifying as Hispanic7,27 or East Asian27.

There are substantial differences across age in the distribution of mutated CHIP genes27,31. In particular, mutations in the de novo DNA methyltransferase DNMT3A and in JAK2 can be observed with some regularity beginning in the third and fourth decade of life, whereas clones carrying mutations in spliceosome genes are generally detected no earlier than the fifth and sixth decades of life27,31. The extent to which this distribution is shaped by differences in DNA sequence mutability20, relative fitness advantage20 or interactions with an ageing microenvironment31 is still an area of active investigation.

Environmental exposures that increase somatic variant acquisition are significantly correlated with CH prevalence. In particular, smoking is robustly associated with CHIP8,27,32 and mLOY11,13,14,15,16,33,34. In the case of cytotoxic chemotherapy and radiation therapy, the mutational spectrum exhibits a marked enrichment of mutations in DNA damage response pathway genes32,35,36. The outgrowth of CH clones following anticancer therapy is partly due to the expansion of pre-existing clones with a selective advantage35 but may also be from the introduction of new mutations by the anticancer agents themselves37 or due to stochastic effects from a bottleneck event for HSCs.

Health consequences of CH

Although most individuals with CH have normal haematological parameters, CH is associated with significant health consequences. With respect to larger chromosomal abnormalities, epidemiological studies have demonstrated associations between mLOY and a broad range of health outcomes in men, including all-cause mortality15,21, numerous types of cancer11,14,21,33,38,39,40, cardiovascular events41, Alzheimer disease42, schizophrenia43, autoimmune disease44,45, diabetes15 and age-related macular degeneration46. Autosomal mCA events have been associated with an increased risk of haematological malignancies17,18,26,30 as well as with all-cause mortality only partially explained by excess cancer deaths9. Additionally, even as mCAs are independently associated with a heightened risk of myeloid malignancies, a retrospective analysis of patients with solid tumours found relatively increased rates of haematological malignancies in patients with both mCAs and CHIP compared to those with either alone30. Whether the presence of dual mCAs/CHIP is an indicator of individuals with particularly unstable genomes or whether the combination of these lesions cooperatively leads to malignancy risk remains to be determined. The heightened risk of infection and serious infectious complications may account for a portion of the excess mortality seen in patients with mCAs: a recent multinational study found that mCA events are moderately associated with risk for a wide range of infections (odds ratio (OR) = 1.06), including the risk of hospitalization for COVID-19 (OR = 1.6)47. Somatic mutations in recurrently mutated CHIP genes have been studied in both natural epidemiological contexts and in experimental models, which have revealed strong associations with mortality, malignancy and CVD.
All-cause mortality is greater in individuals with CHIP compared to without CHIP7,10; this is partly due to an increased risk of haematological malignancies, which has been observed across many studies7,8,10,48,49. However, individuals with CHIP mutations have excess mortality compared with those who do not harbour such mutations even after controlling for blood cancer deaths7,10. This may be partly explained by an association between CHIP and CVD. On a population level, CHIP has been linked to a greater burden of atherosclerotic vessel disease and a heightened risk of myocardial infarction50,51,52 as well as to higher blood levels of the inflammatory marker C-reactive protein53. Mouse models of CHIP have demonstrated mechanistic ties between certain common CHIP mutations and accelerated atherosclerosis50,54,55 as well as heart failure56,57.

Despite the fact that numerous genes affected by somatic CHIP mutations have been associated with increased cancer and CVD risk, there are early indications of important functional differences in how each mutant gene might contribute to that risk. For instance, mutations in splicing factor U2AF1 are associated with a higher risk of acute myeloid leukaemia (AML) and with a shorter latency to disease than mutations in DNMT3A48,58. Somatic mutations in TET2, encoding a dioxygenase that opposes the action of DNMT3A by promoting DNA demethylation58, and in JAK2 are associated with coronary artery disease50 but may differ in how they contribute to blood cell dysfunction. In mouse models, mutations in Tet2, whose gene product recruits HDAC2 for the resolution of IL-6-mediated inflammation59, are associated with increased expression of Il1b, Il6, Cxcl1, Cxcl2 and Cxcl3 (refs50,54). While mutations in Jak2 also lead to higher Il1b expression, they additionally lead to plaque-promoting erythrophagocytosis55, secretion of arterial spasm-inducing erythrocyte-derived microvesicles60 and thrombotic neutrophil extracellular traps61. Yet, much remains to be learned about the relative risk of disease outcomes with specific CHIP genes, let alone how disease risk might vary across different protein-altering variants within each gene. The curation of large CHIP cohorts with inherited genotype and deep phenotype data will enable further investigation of how germline variation affects CHIP-to-disease risk.

There is accumulating evidence that CHIP mutations may interact with human illnesses beyond cancer and CVD. Somatic CHIP mutations have been associated with several diseases in which inflammation features prominently, including chronic obstructive pulmonary disease18,62, adult-onset haemophagocytic lymphohistiocytosis63 and anti-neutrophil cytoplasmic antibody-associated vasculitis64. CHIP also appears to be associated with several types of infections and with potentially severe disease manifestations among those infected with SARS-CoV-2 (ref.65), perhaps as a result of CHIP-exacerbated inflammatory signalling66. Several recent analyses have also found high rates of somatic mutations in CHIP genes in people with immunodeficiency from HIV, which might be a consequence of a pro-inflammatory disease state but might equally well be due to the impaired clearance of CH clones by T cells67,68. Furthermore, CHIP may have dynamic interactions with certain therapeutic interventions. Mutations in CHIP genes involved in the DNA damage response pathway, such as TP53 and PPM1D, are highly enriched following radiation treatment or treatment with a select few cytotoxic chemotherapies32,35,69,70,71,72. Additionally, CHIP has been associated with significantly increased mortality following transcatheter aortic valve implantation73, which is the first indication that CHIP might have an impact on surgical/procedural outcomes. Emerging research suggests that CHIP may impact patient outcomes following HSC transplantation (HSCT). Transplanted HSCs face several sizable and unique challenges, including the high replicative demand in order to reconstitute the entire population of blood cells as well as the exposure to immunosuppressive and cytotoxic therapies. 
The current evidence (nicely summarized in refs74,75) suggests that donor-derived CHIP is not uncommon in both allogeneic HSCT and autologous HSCT recipients and may increase risks of graft-versus-host disease, donor-derived leukaemia and overall mortality, although the interactions appear to be complex and may depend on both patient characteristics and the CHIP gene in question.

The disparate genomic lesions seen in CH and their associations with a broad range of consequential health outcomes have spurred research into how germline genetics influences the acquisition and outgrowth of specific somatic changes. In the next section, we discuss the associations between inherited variants and mLOY, mCAs and CHIP that have been described to date (Fig. 2).

Fig. 2: CH subtypes have shared and unique risk variants.

Many germline risk loci have been linked to the development of clonal haematopoiesis (CH). The three subtypes of CH that have received the greatest scrutiny in this area are mosaic loss of the Y chromosome (mLOY), mosaic chromosomal alterations (mCAs) and clonal haematopoiesis of indeterminate potential (CHIP). These three subtypes are enriched for several of the same germline variants such as those affecting DNA damage response genes CHEK2 and ATM, proliferation factor TCL1A, and telomerase component TERT. However, each of these entities also retains risk loci unique to it alone. Considering the spectrum of variants, one notable pattern is a rarity of mitosis-specific genes in CHIP compared to their relative abundance in mLOY and mCAs. Another broad theme is the high prevalence of previously identified associations between these germline loci and diseases of ageing, including malignancies, cardiovascular disease and dementia. The degree to which CH is involved in these known links to disease remains to be determined. Note: over 150 loci have been associated with mLOY, only a small number of which are depicted in this figure.

Early evidence for inherited risk of CH

Although much of the knowledge about germline risk of CH has come from recent large-scale genetic association studies, some of the foundational insights in the field came from smaller studies relying on shared lineage to identify inherited risk factors. Starting even further back, research into inherited risk for haematological cancers provided signals that have informed the thinking around germline risk for CH, highlighting both the commonalities and differences between CH and malignancy.

Insights from haematological cancers

As CHIP mutations are also found in myeloid neoplasia, work on the genetic predispositions to haematological malignancies provided key initial insights linking germline variation to the expansion of somatic haematopoietic mutations. CHIP and myeloid malignancies such as myeloproliferative neoplasms (MPNs), myelodysplastic syndromes (MDS) and AML arise from similar origins in haematopoietic stem and progenitor cells (HSPCs)76,77,78. Despite the shared origins and patterns of acquired mutations, the vast majority of individuals with CHIP never develop a myeloid malignancy. Indeed, individuals with CHIP have normal counts of normal-appearing cells, whereas those with malignancy have abnormal numbers of blood cells and/or visibly dysmorphic cells. Given that CHIP is a potential precursor state to haematological cancer, many of the known germline risk factors for myeloid disease may also predispose to CHIP in a similar manner. Future study of the differences between the set of germline variants predisposing more to CHIP and the set predisposing more to malignancy may prove informative as to why only a minority of individuals ever progress from one to the other.

Many of the same germline variants predisposing to JAK2-mutated malignancies have also been associated with JAK2-CH79. JAK2 is the most commonly mutated gene in MPNs29 and the JAK2 p.Val617Phe mutation (JAK2V617F) is a characteristic feature of MPNs incorporated into the World Health Organization diagnostic criteria for over a decade80,81. Consequently, some studies looking to define MPN germline risk have used cohorts formed exclusively of diagnosed myeloid disease/MPNs82,83,84,85,86,87,88,89, whereas other studies have augmented cohorts of diagnosed MPNs with the addition of any individuals with a molecularly detectable JAK2V617F mutation78,79 (which may include undiagnosed MPNs as well as JAK2V617F-CH).

The first inherited variation linked to JAK2V617F-mutated MPNs was the 46/1 or GGCC haplotype, a collection of single-nucleotide polymorphisms (SNPs) stretching across several hundred kilobases of DNA that includes the JAK2 gene itself82,83,84,85,86,87,88. In several studies of patients with MPNs, the JAK2V617F somatic variant was identified in cis with the inherited 46/1 risk haplotype more often than would be predicted by chance82,83,84, which might suggest that the haplotype provides a hypermutable substrate for somatic alterations. Furthermore, this haplotype may increase the rate of JAK2V617F clonal expansion. In a study of clonal dynamics preceding MPN diagnosis in 12 patients, homozygosity for 46/1 was enriched in those patients with the highest average clonal growth rate90, an intriguing finding that should be followed up in larger cohorts. Inherited polymorphisms in the telomerase reverse transcriptase (TERT) locus have also been linked to all varieties of MPNs in several studies27,78,87,89 and to JAK2V617F-mutated disease in several others79,86,88. While most tissues in the human body lack the expression of TERT (a key enzyme in telomere maintenance), haematopoietic stem cells constitutively express this protein91,92. The precise mechanism of how telomere regulation might influence the expansion of JAK2V617F or other CH clones is just beginning to be understood (see Overlap with biomarkers of ageing, below).

Two dozen additional loci imparting potential risk of MPNs have recently been identified78,79,87. Similar to JAK2 (ref.93) and TERT94, many of these are genes implicated in the functional regulation of HSCs (including SH2B3 (ref.95), TET2 (ref.96), ATM97, GFI1B98 and RUNX1 (ref.99), among several others) although some of the loci with strong signals, such as PINT, have no known role in HSC biology. While the location of lead SNPs in or near key HSC regulators is strongly suggestive of mechanisms that disrupt normal HSC biology, variant-to-function analyses have provided added evidence in the case of GFI1B and CHEK2. In the first case, the lead SNP was located in a putative enhancer region downstream of GFI1B and was experimentally determined to lead to lower GFI1B expression, which, in turn, was shown to increase HSPC self-renewal78. The second case involves a rare missense variant in CHEK2 and similarly demonstrated increased HSPC self-renewal following the knockdown of gene expression78.

The genetic associations identified for JAK2-mutated malignancy and JAK2-CH are not completely overlapping. The examination of a JAK2V617F-CH cohort replicated associations with the 46/1 haplotype, TERT, SH2B3 and TET2 with nominally significant signals for CHEK2, ATM, PINT and GFI1B79; additionally, KPNA4 has been associated with a greater risk of all CHIP, inclusive of JAK2-CH27. The present lack of replication of other MPN-associated loci in CHIP cohorts leads to the question of whether and how inherited variation might shape the convergent somatic mutational landscapes yet differ in the magnitude or type of attendant phenotypic risk (Box 1).

Compared to the literature on JAK2, studies of haematological malignancies have been less revealing with respect to what germline factors may increase the risk of somatic mutation in other CHIP genes. Family-based studies of inherited risk of MDS and AML have noted a high prevalence of non-disease CHIP in carriers of rare inherited variants affecting RUNX1, a member of the core binding factor family of transcription factors and a key regulator of definitive haematopoiesis100,101. Aside from RUNX1, there are several other germline variants recognized to predispose to myeloid, lymphoid or plasma-cell neoplasms that could presumably also predispose to asymptomatic CHIP102. Genetic association studies of CHIP-only cohorts (that is, only individuals without haematological disease; discussed in detail in the ‘Results from genetic association studies’ section below) have seen a significant signal with just one of these genes: TERT27. Although the remainder are strong candidates for genes likely to predispose to CHIP, concrete evidence of this in asymptomatic individuals is currently lacking.

Evidence from sibling studies

Although no groups have conducted sibling studies of chromosomal mosaicism, several have looked at CHIP in siblings. The first study to examine the heritability of CH mutations using siblings looked only at the two most commonly mutated CHIP genes, DNMT3A and TET2 (ref.62). The authors looked at the risk-recurrence ratio (λs) for mutations within these genes among a set of 391 female sib-ships of French-Canadian ancestry and found no familial risk for DNMT3A mutations but a significantly increased risk for TET2 (λs = 2.24 for those ≥55 years of age, λs = 2.65 for those ≥65 years of age)62. One sib-ship consisting of seven sisters was notable for having TET2 mutations in 4/7 and a DNMT3A mutation in 1/7 sisters, raising the provocative but unanswered question of whether germline genetics or common environmental exposures did more to shape such a pedigree62.
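For readers unfamiliar with the sibling risk-recurrence ratio, the quantity is conventionally defined as the risk of the trait in siblings of affected individuals divided by its prevalence in the general population, so that a value of 1 indicates no familial aggregation. The following is a minimal sketch of that ratio; the function name and the counts are hypothetical, are not taken from the cited study, and the study's own estimation procedure may differ.

```python
def sibling_recurrence_risk_ratio(affected_sibs_of_probands: int,
                                  total_sibs_of_probands: int,
                                  population_prevalence: float) -> float:
    """Sibling recurrence-risk ratio: risk in siblings of affected
    individuals divided by the population prevalence of the trait."""
    if total_sibs_of_probands <= 0:
        raise ValueError("total_sibs_of_probands must be positive")
    if not 0 < population_prevalence <= 1:
        raise ValueError("population_prevalence must be in (0, 1]")
    sibling_risk = affected_sibs_of_probands / total_sibs_of_probands
    return sibling_risk / population_prevalence


if __name__ == "__main__":
    # Illustrative numbers only: 10 of 40 siblings of mutation carriers are
    # themselves carriers, against a population prevalence of 12.5%.
    print(sibling_recurrence_risk_ratio(10, 40, 0.125))  # 2.0
```

In this toy example, siblings of carriers are twice as likely to carry a mutation as the population at large. As the text notes, a raised ratio of this kind cannot by itself separate germline effects from shared environmental exposures.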

The heritability of CHIP has also been examined in twin pairs in two recently published studies103,104. One study consisted of 299 twin pairs from Denmark104, whereas the other comprised 79 twin pairs from the UK103. Neither study found a higher concordance for the incidence of CHIP among monozygotic (MZ) twins than among dizygotic pairs. The larger of the two studies additionally found no increased concordance among MZ pairs for CHIP mutations specifically in DNMT3A or TET2 (ref.104). Of note, these studies each identified sets of MZ twins that shared identical CH mutations (KDM6A p.Q692X and DNMT3A p.R598X in the UK cohort103 and SRSF2 p.P95H and c.912_916delCTGGT in DNMT3A in the Denmark cohort104), suggesting these mutations occurred in utero103; several subsequent studies of patients with MPNs have identified JAK2V617F and DNMT3A mutations that similarly arose during embryogenesis or childhood105,106. Taken together, these twin studies provide no evidence for common, strong germline effects on the development of CHIP in the populations studied. However, the moderate power afforded by the size of the study cohorts precludes the detection of more modest effects. Additional twin studies on diverse populations, with the potential for subsequent meta-analysis, could supplement the existing work in this area. Future twin studies would also be warranted for mLOY and mCAs, the present lack of which is a notable gap in the field.
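The comparison underlying such twin analyses can be illustrated with two standard concordance measures. The following is a minimal sketch, assuming pairs are tallied as concordant (both twins affected) or discordant (one twin affected); the counts are hypothetical and are not taken from the cited cohorts, and the function names are illustrative.

```python
def pairwise_concordance(concordant_pairs: int, discordant_pairs: int) -> float:
    """Share of pairs with at least one affected twin in which both are affected."""
    total = concordant_pairs + discordant_pairs
    if total <= 0:
        raise ValueError("need at least one pair with an affected twin")
    return concordant_pairs / total


def probandwise_concordance(concordant_pairs: int, discordant_pairs: int) -> float:
    """Probability that the co-twin of an affected twin is also affected."""
    denominator = 2 * concordant_pairs + discordant_pairs
    if denominator <= 0:
        raise ValueError("need at least one pair with an affected twin")
    return 2 * concordant_pairs / denominator


if __name__ == "__main__":
    # Illustrative counts only: (concordant, discordant) pairs by zygosity.
    mz_pairs = (5, 15)
    dz_pairs = (6, 18)
    print("MZ pairwise:", pairwise_concordance(*mz_pairs))
    print("DZ pairwise:", pairwise_concordance(*dz_pairs))
```

A clearly higher concordance in MZ than in dizygotic pairs would point towards a heritable component. In this toy example the two values are equal, which mirrors the null result described above.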

Results from genetic association studies

The bulk of the data regarding the inherited risk for CH comes from genetic association studies. These studies identify the enrichment of genetic variants in people with CH across large, unrelated and (more-or-less) diverse samples. Such analyses are well suited to finding common germline variants with modest effects that are noticeable in the aggregate. The sheer size of newly usable national cohorts (on the scale of 100,000–500,000 individuals) has further enabled the detection of effects from rare inherited variants present in a tiny fraction of the overall population.
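The odds ratios quoted throughout this Review summarize such enrichment. The following is a minimal sketch of an unadjusted odds ratio with a Wald 95% confidence interval, computed from a 2×2 table of carrier status against CH status; the counts and the function name are hypothetical. Published association studies typically fit regression models that adjust for age, sex and ancestry, which this simplification does not attempt.

```python
import math
from typing import Tuple


def odds_ratio_with_ci(carrier_cases: int, noncarrier_cases: int,
                       carrier_controls: int, noncarrier_controls: int,
                       z: float = 1.96) -> Tuple[float, float, float]:
    """Unadjusted odds ratio and Wald confidence interval from a 2x2 table.

    Returns (odds_ratio, lower_bound, upper_bound); z = 1.96 gives ~95%.
    """
    counts = (carrier_cases, noncarrier_cases,
              carrier_controls, noncarrier_controls)
    if any(count <= 0 for count in counts):
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (carrier_cases * noncarrier_controls) / (
        noncarrier_cases * carrier_controls)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log_or = math.sqrt(sum(1.0 / count for count in counts))
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Illustrative counts only: 30/100 cases and 20/100 controls carry the allele.
    print(odds_ratio_with_ci(30, 70, 20, 80))
```

With only 100 cases and 100 controls the interval in this example is wide, which illustrates why cohorts of hundreds of thousands of individuals are needed to resolve the modest odds ratios typical of common variants.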

Mosaic loss of Y

A substantial fraction of risk for mLOY appears to be genetically determined, with estimates of mLOY heritability ranging from 9% to 34%11,13,107. The first germline association with mLOY to be uncovered was with a common SNP (rs2887399) near the 5′ end of TCL1A, which encodes the protein T cell leukaemia/lymphoma 1A (TCL1A)14. The TCL1A protein is a co-activator of AKT and it participates in B and T cell malignancies108, largely through chromosomal rearrangements that place TCL1A near TCR-A (the gene for the T cell antigen receptor)109. This strong association between rs2887399 and mLOY has been replicated in subsequent studies with larger cohorts11,13; notably, single-cell RNA-sequencing of B lymphocytes has demonstrated that TCL1A gene expression is significantly higher in the setting of mLOY, suggesting that such clonal outgrowth in mLOY could be partly driven by supra-normal TCL1A expression11. GWAS projects have found over 150 additional loci significantly associated with mLOY, many of which functionally regulate various aspects of the cell cycle, including the formation of mitotic structures (for example, SPDL1, CENPU and CENPN, MAD1L1 and MAD2L1, and PMF1), the replication and stability of DNA (for example, ATM and NPAT), and cell arrest and apoptosis (for example, TP53, BCL2 and BAX)11,13,107. The implicated genes highlight three complementary processes influencing mLOY: increasing rates of functional mistakes during mitosis, a lack of ability to detect such DNA abnormalities and escape from normal apoptotic regulation in the face of recognized DNA damage.

Autosomal and X chromosome variation

As with mLOY, autosomal and X chromosome mCAs are associated with germline variants that increase risk of mutagenesis. Unlike mLOY, which only involves the unpaired Y chromosome, these mCAs may also be associated with variants that provide a strong selection pressure towards CN-LOH events9,17. Studies conducted in population-scale biobanks in the UK (UK Biobank (UKB))9,17,110 and Japan (BioBank Japan (BBJ))26,111 have demonstrated significant germline associations with mCAs. These associations occur both in cis and in trans with the inherited variant. In both populations, the trans associations involve common alleles with modest odds ratios. Variants in TERT and the related TERC (encoding telomerase RNA component91) as well as variants in SP140 (encoding a lymphoid-restricted nuclear body protein involved in B cell antigen response112) are associated with mCAs occurring anywhere in the genome17, whereas the remaining inherited variants have only been associated with trans mCAs on a particular chromosome9,17,26 (Table 1). Apart from common variation in TCL1A and DLK1 (a negative regulator of HSPC differentiation113) that is linked to 14q CN-LOH17 and a known association between the JAK2 46/1 haplotype and 9p CN-LOH9,17,82,83,84,114, the identified cis mCA associations are predominantly rare variants. Many of these rare germline variants are missense or nonsense mutations predicted to damage protein function. The cis mCA lesions associated with these disruptive variants demonstrate a strong preferential CN-LOH duplication of either the risk or the non-risk allele. In ATM, NBN and MRE11, all of which are genes involved in maintaining genomic integrity, it is their damaged germline allele that is more commonly propagated9,17. Conversely, the presence of damaging germline variants in the MPL gene, which encodes the thrombopoietin receptor important for HSC self-renewal, is associated with the duplication of the non-damaged allele9,17.
Preferential CN-LOH duplication arising from germline alleles that confer a relative fitness advantage may also extend to polygenic risk. When the group studying the UKB cohort constructed blood cell-proliferation polygenic risk scores consisting of signals within individual chromosomal arms, they found that these are often associated with CN-LOH events on the same arm17. This finding raises the possibility that a main driver of these common CN-LOH somatic events is the replacement of inherited DNA segments with homologous segments that impart a greater fitness advantage17.
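The idea of an arm-level polygenic score can be sketched as a weighted sum of risk-allele dosages restricted to variants on one chromosomal arm. The following is a minimal illustration of that general construction only; the variant identifiers, effect sizes, arm annotations and function names are hypothetical, and the actual scoring procedure used in the UKB analysis may differ.

```python
from typing import Dict, Iterable, NamedTuple


class Variant(NamedTuple):
    variant_id: str     # hypothetical identifier
    chrom_arm: str      # e.g. "9p"; hypothetical annotation
    effect_size: float  # per-allele weight, e.g. a GWAS effect estimate


def arm_level_prs(variants: Iterable[Variant],
                  dosages: Dict[str, float],
                  chrom_arm: str) -> float:
    """Sum of effect_size * dosage over variants on the given arm.

    Dosage is the number of copies (0-2) of the effect allele carried by one
    haplotype pair; variants missing from `dosages` are treated as dosage 0,
    which is a simplifying assumption of this sketch.
    """
    return sum(variant.effect_size * dosages.get(variant.variant_id, 0.0)
               for variant in variants
               if variant.chrom_arm == chrom_arm)


if __name__ == "__main__":
    # Illustrative weights and genotypes only.
    panel = [Variant("v1", "9p", 0.5),
             Variant("v2", "9p", 0.25),
             Variant("v3", "14q", 1.0)]
    genotype = {"v1": 2, "v2": 1, "v3": 1}
    print(arm_level_prs(panel, genotype, "9p"))   # 1.25
    print(arm_level_prs(panel, genotype, "14q"))  # 1.0
```

Computing such a score separately for each of the two inherited homologues of an arm would, under these assumptions, indicate which homologue carries the higher proliferative score and might therefore be favoured for duplication in a CN-LOH event.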

Table 1 Cis-acting and trans-acting risk variants for mCAs

The specific inherited variants associated with mCAs and the spectrum of mCAs themselves may differ significantly across populations. Several of the rare variants associated with cis mCAs in the UKB cohort (ATM, MPL, FRA10B and TM2D3–TARSL2) were absent in the BBJ cohort, whereas variants in several other genes (MRE11, NBN, NEDD8–TINF2 and CTU2) were present at higher frequencies26. These population-specific differences may shape not only the relative frequencies of observed mCAs but also patterns of downstream disease. For example, the incidences of chromosome 12 gain, 13q loss and 13q CN-LOH are twofold to sixfold lower in the BBJ cohort26; these mCAs are often seen in chronic lymphocytic leukaemia115,116, a malignancy that is four to five times more common among Europeans than among Japanese individuals117. Collectively, these studies highlight the importance of including diverse populations in genomics research118.

Small variants: SNPs and indels

Several recent large genomic studies have focused on CH identified with WGS data, using short-read sequencing to simultaneously identify germline and somatic SNPs and indels10,18,27. Mirroring one of the main signals found with JAK2, one study using the deCODE cohort from Iceland found that variation in the TERT locus (lead SNP rs34002450) was associated with CH (OR = 1.37, minor allele frequency (MAF) = 0.41) as defined by an outlier status on WGS10. Meanwhile, in the same study, individuals with CH were found to have a shorter average telomere length than individuals without CH10. An analysis of the NHLBI Trans-Omics for Precision Medicine (TOPMed)119 cohort in the USA recapitulated the association between rs34002450 and CHIP (OR = 1.3), although this study identified a different lead SNP (rs7705526; MAF = 0.29; r2 = 0.55 with rs34002450) as well as a second SNP in TERT that was independently associated with CHIP (rs13167280; OR = 1.3; MAF = 0.11; r2 = 0.2 with rs7705526)27. Additionally, an analysis of the UKB similarly identified associations with CHIP for an SNP in linkage disequilibrium with rs34002450 (rs7726159; OR = 1.33; MAF = 0.33; r2 = 0.70 with rs34002450) and for a second independent SNP in TERT (rs2853677; OR = 1.32; MAF = 0.42)18. Within the TOPMed cohort, two additional SNPs achieved genome-wide significant associations with CHIP. One variant (rs1210060191) is quite common (risk allele frequency = 0.54) and lies in the intronic region of TRIM59 but has a relatively weaker association with CHIP (OR = 1.16) than the TERT SNPs27. The second is a variant in an intergenic region near TET2 (rs144418061), which is specific to individuals with African ancestry (MAF = 0.035 in African ancestry, not present in samples without African ancestry), that is strongly associated with CHIP (OR = 2.4)27. A subsequent variant-to-function analysis of this second locus revealed a variant (rs79901204) that is predicted to disrupt a GATA/E-box in an enhancer element. 
The risk allele for this variant indeed reduced luciferase activation fourfold in an in vitro experiment and had a dose-dependent association with decreased TET2 gene expression in whole-blood samples from patients. Thus, it appears this variant increases the self-renewal and proliferation capacity of haematopoietic stem cells via reduced TET2 expression, which might create a selective pressure for CHIP clone expansion in one of several ways27. Increased rates of cell division may increase DNA replication strain and thereby the likelihood of acquiring a lesion in a CHIP gene in the first place. Alternatively, or in addition, the germline TET2 SNP might act synergistically with any subsequent incidental CHIP mutations to increase the relative fitness of the HSC.

The TOPMed study was also powered to investigate germline associations specifically in DNMT3A-CH and TET2-CH. Although there were no significant associations with TET2, there was a significant association for DNMT3A with variant rs2887399 (OR = 1.23; MAF = 0.23)27. Of note, this is the same variant near TCL1A that is associated with mLOY11,14. Each of these germline SNPs associated with CHIP is noted in Fig. 3.

Fig. 3: CHIP has polygenic risk.

Genetic association studies have demonstrated that the inherited risk landscape for clonal haematopoiesis of indeterminate potential (CHIP) is characterized by numerous common variants with modest effect sizes and several rare variants associated with strong effects. Associations with the TERT locus have been replicated among numerous studies of individuals with CHIP as well as clonal haematopoiesis (CH) identified by high somatic mutational burden in whole-genome sequencing (WGS-outlier). Furthermore, CH by WGS-outlier is strongly correlated with CHIP and mosaic loss of the Y chromosome, highlighting a robust association between telomere biology and CHIP. Where studies have examined single genes affected by a somatic CHIP mutation, the results point to heterogeneity in their germline associations, both in terms of associated variants (for example, germline TCL1A variation is associated with DNMT3A-CH but not JAK2-CH) and the degree of association (for example, a stronger association of germline TERT variants with JAK2-CH than with CH overall).

Overlap with processes of ageing

Inherited and somatic variation at many CH risk loci has also been linked to diseases and processes of ageing (Table 2). Here, we specifically focus on the overlap of CH germline risk loci with germline variants associated with malignancy, CVD and several biomarkers of ageing. We also consider the challenges in distinguishing whether such overlap is independent or partially mediated by the presence of CH itself.

Table 2 Select germline risk loci associated with CH and diseases of ageing

Overlap with malignancy

In addition to the associations with MPNs described above, many of the inherited risk variants for CH also predispose to haematological and non-haematological cancers. This is particularly true of the genes involved in the DNA damage response: CHEK2, TP53, NBN, MRE11 and ATM. Inherited putative loss-of-function variants in CHEK2 (refs120,121,122,123,124) and TP53 (refs125,126,127) have long been known to be a cause of autosomal dominant familial cancer syndromes, while mutations in NBN (causing the autosomal recessive Nijmegen Breakage Syndrome)128,129 and MRE11 (refs130,131) confer an increased susceptibility to the development of a malignancy. Similarly, mutations in ATM, the aetiological agent of the autosomal recessive ataxia telangiectasia syndrome132, are associated with an increased risk of numerous types of cancer, including leukaemia and lymphoma133,134, breast cancer135,136, and prostate cancer137,138, among many others. Germline mutations in NPAT (nuclear protein, ataxia telangiectasia locus), whose gene product has been implicated in the transcriptional regulation of histone genes as well as ATM139, have been reported as a risk factor for Hodgkin lymphoma140. The most plausible mechanism by which these inherited variants contribute to CH is similar to their role in cancer: establishing a cellular context that is permissive of DNA mutation, rather than exerting direct effects on clonal proliferation. By contrast, other inherited variants may directly influence proliferation or augment the rapidity of proliferation by later CH mutations. Mutation or experimental deletion of TET2, which is often mutated in familial myeloid and lymphoid malignancies141,142,143, leads to increased HSC proliferation96 and secretion of pro-inflammatory cytokines50,54,56,66.
Lastly, variants in SH2B3, a negative regulator of the pro-proliferative JAK–STAT signalling pathway in haematopoietic cells144, are associated with malignancies145, including in the blood144, breast78,146, lung146 and colon78,146.

Overlap with CVD

CVD is a major source of morbidity in ageing. Early epidemiological and functional studies of CHIP identified strong links between CHIP mutations and CVD50,51,54,56,147, raising the question of whether these entities exhibit shared germline predispositions. In addition to the risk of a haematological malignancy, germline TET2 mutations have also been associated with pulmonary arterial hypertension, which is a lethal vasculopathy148. In contrast to the role of this epigenetic regulator in tumorigenesis, which is thought to rest on increased HSPC self-renewal96,149,150,151, lineage skewing96,149,150,151 and an increased tendency towards mutation152, the contribution of mutant TET2 to pulmonary arterial hypertension may stem from overproduction of inflammatory cytokines (for example, IL-1β) in differentiated immune cells148. Meanwhile, genetic variation in the gene SH2B3 has been linked to numerous aspects of cardiovascular dysfunction, including hypertension153,154, aortic dissection155, atherosclerosis156 and stroke157. However, for at least one well-studied variant, there appears to be a trade-off between CVD risk and cancer risk: the C allele of rs3184504, which encodes SH2B3 p.R262W, is associated with a reduced risk of CVD (OR = 0.95) but a heightened risk of cancer (OR = 1.03). If such risk trade-offs persist more generally for SH2B3, this could limit the utility of targeting the gene itself for disease prevention, although future work may find distinct downstream effectors that could be targeted to limit either CVD risk or cancer risk.

Overlap with biomarkers of ageing

The links between telomere biology and CH are robust but complicated. Across tissues, telomere length is inversely correlated with age158. Inherited genetic variation influences telomere length, which is also tightly linked to the somatic expression of telomerase genes158. The risk variants associated with CH have substantial overlap with multiple portions of the cellular machinery responsible for telomere maintenance: TERT has been implicated in the risk for all CH subtypes, while TERC and TINF2 (encoding the TIN2 protein, part of the shelterin complex159) are associated with mCAs. However, even though CH is strongly associated with ageing, the germline variation in telomere genes that predisposes to CH tends to associate with longer, not shorter, telomeres. The TERT intron 2 SNPs rs7705526 (ref.160) (the lead variant for increased risk for CHIP27 and global mCA events17) and rs2853677 (ref.161) (associated with 14q CN-LOH26) associate with longer telomeres. Moreover, greater telomere length, as predicted by germline variation, is positively associated with mCA events162. Recent work using Mendelian randomization has suggested that the telomere–CH relationship is actually bidirectional: longer telomeres may partly cause CHIP (perhaps through an increased propensity for mutation) whereas CHIP, once acquired, may contribute to telomere shortening (possibly via increased rates of cell cycling)163. Importantly, genetic association studies of telomere maintenance genes have revealed links to a broad spectrum of diseases, including strong ties to cancer164 and CVD165,166,167,168. Future work investigating links between telomere length and these diseases (or CH and these diseases) may need to account for mediating effects through the telomere–CH axis.

DNA methylation at CpG sites is a promising biomarker that has been used to generate highly accurate estimates of chronological age, such as with the Horvath epigenetic clock169. Many diseases that disproportionately affect the elderly, such as cancer and dementia, are linked to accelerated epigenetic ageing, in which an individual’s DNA methylation profile suggests an age older than their true chronological age169. Likewise, individuals with CHIP (also an age-associated feature) have epigenetic age acceleration in blood cells170. The rate of epigenetic ageing has a heritable component169, including variation at loci associated with CH risk: TERT, TET2, TRIM59 and KPNA4 (ref.171). Paradoxically, faster epigenetic ageing is linked to TERT variants associated with longer telomeres172, matching the directionality of the CH risk variants at this locus. It is also worth noting that several of the genes that are most often affected by somatic CHIP mutations are epigenetic regulators whose (impaired) performance could plausibly shape an individual’s rate of epigenetic ageing. The top CHIP genes DNMT3A and TET2 directly modulate CpG methylation and dictate global methylation patterns within HSPCs173. Less common CHIP mutations in IDH1 and IDH2 lead to the production of the metabolite 2-hydroxyglutarate, which interferes with the function of TET2 (ref.174). Interestingly, many of the CpG sites used in the Horvath epigenetic clock are near target genes of Polycomb repressive complex 2 (PRC2)169, a protein complex whose function is impaired by CHIP mutations in ASXL1 (ref.174). Yet, the extent to which CHIP, or CH more broadly, might cause alterations in epigenetic ageing remains to be determined.
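One common way to put a number on epigenetic age acceleration is to take the residual from a regression of methylation-predicted age on chronological age across a cohort, so that positive values mark individuals who look epigenetically older than expected. The sketch below assumes that definition and uses ordinary least squares with hypothetical ages; it is not the Horvath clock itself, which first predicts age from CpG methylation values.

```python
def age_acceleration(chronological, epigenetic):
    """Residuals from a least-squares fit of epigenetic age on chronological age."""
    n = len(chronological)
    mean_x = sum(chronological) / n
    mean_y = sum(epigenetic) / n
    sxx = sum((x - mean_x) ** 2 for x in chronological)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(chronological, epigenetic))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Positive residual: epigenetic age is higher than the cohort trend predicts.
    return [y - (intercept + slope * x)
            for x, y in zip(chronological, epigenetic)]


# Hypothetical cohort of four individuals (ages in years).
chronological_ages = [40, 50, 60, 70]
epigenetic_ages = [42, 50, 58, 74]
acceleration = age_acceleration(chronological_ages, epigenetic_ages)
```

Because the residuals are defined relative to the cohort trend, they sum to zero by construction; comparing their distribution between CHIP carriers and non-carriers is one simple way such an association could be examined.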

Determining causality: MR approaches

As described above, CH is associated with many diseases of ageing, which naturally raises the question: does CH contribute to these phenotypes? Although potential CH–phenotype relationships will be studied by conducting future natural or laboratory experiments like those that have demonstrated ties between CH and haematological malignancies or heart disease, there is a wealth of already generated genetic and phenotypic data that may provide insights on a shorter horizon. Mendelian randomization (MR) is a statistical technique that utilizes inherited variation to test causation between an exposure (here, CH) and outcome (disease of interest)175. MR relies on a quasi-experimental setup in which individuals have a higher or lower probability of experiencing the exposure based on the alleles they were randomly assigned at birth. Using this random germline-determined variation in exposure allows for the estimation of a causal relationship between the exposure and outcome (Fig. 4a). MR analysis has already been used to demonstrate that a higher risk of prostate, testicular, breast, glioma cell and renal cell cancers is predicted by the inherited risk for mLOY11. However, attempts to characterize the contribution of CH to diseases of ageing via MR approaches are likely to face several hurdles. The first is that, as described here, many germline variants associated with CH have also been previously associated with the diseases of ageing, leading to issues of horizontal pleiotropy that confound the estimation of a causal effect by MR176 (Fig. 4b). The risk of this can be minimized but not eliminated by only using variants with no described relation to the disease being studied. A second challenge is that data on CH is obtained through the sequencing of blood cells, which is rarely done in routine clinical practice and, when performed for research studies, is often done only once.
As a result, data on CH is often a cross-sectional snapshot lacking information on the evolution and temporal duration of a clone. This will present difficulties for effect size estimation but may be ameliorated by future longitudinal studies and as sequencing costs drop and this test becomes more widely deployed in clinical settings. These potential methodological challenges aside, we anticipate that, with the increased power derived from larger CH GWAS sample sizes, MR will become an increasingly useful tool in answering questions about the health consequences of CH.
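To make the MR logic concrete, the sketch below computes a per-variant Wald ratio (the variant's effect on the outcome divided by its effect on the exposure) and combines several variants with fixed-effect inverse-variance weighting. It is a minimal sketch, assuming summary statistics on the log-odds scale and valid instruments with no horizontal pleiotropy; the function names and all effect sizes are hypothetical and do not come from any study cited here.

```python
from math import sqrt


def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Causal estimate from one variant, with a first-order standard error."""
    estimate = beta_outcome / beta_exposure
    se = se_outcome / abs(beta_exposure)
    return estimate, se


def ivw_estimate(variants):
    """Fixed-effect inverse-variance-weighted mean of per-variant Wald ratios.

    variants: iterable of (beta_exposure, beta_outcome, se_outcome) tuples.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for beta_x, beta_y, se_y in variants:
        ratio, se = wald_ratio(beta_x, beta_y, se_y)
        weight = 1.0 / se ** 2
        weighted_sum += weight * ratio
        total_weight += weight
    return weighted_sum / total_weight, sqrt(1.0 / total_weight)


# Hypothetical summary statistics for two CH-associated variants:
# (effect on CH, effect on outcome, standard error of the outcome effect).
example_variants = [(0.2, 0.04, 0.01), (0.1, 0.02, 0.01)]
causal_estimate, causal_se = ivw_estimate(example_variants)
```

If a variant also affects the outcome through a pathway that bypasses CH, its Wald ratio absorbs that direct effect, which is the horizontal pleiotropy problem described above; this simple estimator has no safeguard against it.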

Fig. 4: Using germline variation to study causal associations of CH.

Certain situations in which germline variants affect the risk of developing clonal haematopoiesis (CH) can be used to determine whether CH has a causal contribution to a given phenotype. A Mendelian randomization (MR) approach takes advantage of the fact that individuals acquire their germline allelic composition by chance. Any change in CH frequency or clone size due to germline variation can then be treated as the result of a random genetic assignment and used to estimate a causal association between CH and the phenotype. Current evidence suggests that some germline variants may affect CH but do not independently influence the associated phenotypes; such variants are the most likely to be appropriate for MR analyses. For example, germline variation at TCL1A is known to increase the risk of DNMT3A-CH but is not known to have direct effects on cardiovascular disease (CVD) (part a). However, there is substantial overlap in the genetic architecture of CH and diseases of ageing, so some inherited variants may affect both CH and observed phenotypes. In such cases, MR estimates of causality are confounded by the direct association of the germline variant with the outcome. A good example of this is germline variation in SH2B3, which is associated with both the risk of JAK2-CH and CVD (part b). Unless this latter association can be properly accounted for, a typical MR approach would overestimate the effect of JAK2-CH on CVD.

Conclusions and perspectives

How germline genetics contributes to CH risk is an emerging field with a rapidly growing body of work. By simultaneously analysing germline and somatic genetic variation on a population scale, research in this area in just the past 5 years has made dramatic contributions to our understanding of HSC biology and disease risk.

To date, the patterns of germline susceptibility to mLOY, mCAs and CHIP have largely been studied in isolation from one another. However, the comparison of the inherited risk landscape for each of these phenomena reveals that these entities share many genetic signals (Fig. 2). In particular, the DNA damage response and telomere maintenance pathway genes are commonly implicated in genetic association studies with these CH subtypes. The substantial overlap in germline risk suggests that there may be common mechanisms that predispose individuals to mLOY, mCAs and CHIP. Therefore, there is likely to be a benefit to studying these phenomena jointly. Additionally, the existence of shared risk loci raises the important question of what additional factors may influence the likelihood that an HSC will acquire one type of CH over another (Box 2). It also remains to be fully explored whether and to what degree inherited variants contribute to the co-occurrence (or co-interaction, if one somatic change influences the next) of acquired CH mutations of different varieties, especially CHIP mutations and focal deletions or loss-of-heterozygosity events. These questions are important for our understanding of how HSCs adapt to the stresses of ageing and to improve our ability to assess the risk of disease for individuals carrying predisposing germline variants.

Moving forward, we expect studies in this field to focus not just on how inherited variation influences the risk of somatic mutations but also on how inherited variation interacts with these acquired mutations to influence disease phenotypes and biological ageing. For example, although CHIP mutations are associated with an increased risk of leukaemia and myocardial infarction, these outcomes are observed only in a minority of CHIP carriers7,8,50,51. Recent work has identified an inherited polymorphism in the IL-6 receptor that reduces the likelihood of heart disease in individuals with CHIP51; however, the full extent to which germline factors mitigate or contribute to disease manifestations in individuals with CHIP is still to be explored.

As we understand more about how inherited germline genetic variation interacts with CH, there will be increasing motivation to develop and deploy precision medicine applications that incorporate knowledge of the germline genome to precisely estimate the risk for CH and for developing associated disease sequelae. Given that most individuals with CH do not display overt symptoms of the condition, in time, these approaches may enable more precise CH screening regimens. In the more immediate future, the recent creation of specialty CH clinics177, well suited to capturing CH carriers in populations whose at-risk status warrants more extensive screenings (such as cancer patients), may afford opportunities for the rapid translation of new research insights in this space into impactful patient care.