Introduction

Coffin–Siris syndrome (CSS, MIM#135900) is an autosomal dominant developmental disorder characterized by coarse facial features, hypoplasia of the fifth digit/nails, hypotonia, hypertrichosis, and sparse scalp hair [1]. In 2012, pathogenic CSS variants were reported in genes encoding subunits of the BAF (BRG1-associated factor) chromatin-remodeling complex [2, 3]. Since then, more than 150 CSS individuals have been molecularly diagnosed [4,5,6,7,8,9,10]. Nine genes have so far been found to be involved in the pathogenesis of CSS: ARID1A (MIM# 614607), ARID1B (MIM# 135900), ARID2 (MIM# 617808), SMARCA4 (MIM# 614609), SMARCB1 (MIM# 614608), SMARCE1 (MIM# 616938), SOX11 (MIM# 600898), PHF6 (MIM# 300414), and DPF2 (MIM# 618027). These genes encode components of the BAF complex, except for SOX11 and PHF6. SOX11 encodes a transcription factor that binds to SMARCA4 and plays a role in embryonic neurogenesis [5, 11], and PHF6 regulates transcription by interacting with another ATP-dependent chromatin remodeler, the nucleosome remodeling and deacetylation complex [4, 12].

The BAF complex is a mammalian SWI/SNF (SWItch/Sucrose Non-Fermentable) complex that plays an important role in chromatin remodeling [13]. The BAF complex is required for normal mammalian organ development, and mutations in genes encoding BAF complex subunits cause multiple malformations and developmental disorders, including CSS [14,15,16].

Presently, more than 150 CSS individuals with a pathogenic variant are described in the literature [17]. We previously reported 71 CSS individuals, in 39 of whom pathogenic variants were found [2, 18]. Since then, we have recruited 182 additional CSS-suspected individuals. We analyzed these patients together with the 32 individuals for whom no molecular diagnosis had been achieved, and here we present the results of a comprehensive genetic analysis.

Materials and methods

Recruitment of subjects

We newly recruited 182 individuals clinically suspected of CSS and analyzed them together with 32 CSS individuals for whom no molecular diagnosis had been determined [2, 18]. The diagnostic criteria have not yet been completely established, but according to the clinicians who saw the patients, most shared several key phenotypes, including developmental delay, hypoplastic digits/nails, body hypertrichosis, and coarse face [19, 20]. Differential diagnoses include Nicolaides–Baraitser syndrome (NCBRS) and deafness, onychodystrophy, osteodystrophy, mental retardation, and seizures syndrome, among others, but the clinicians sometimes found it difficult to completely differentiate these conditions from CSS [4, 21, 22]. This study was approved by the institutional review board of the Yokohama City University Faculty of Medicine.

DNA preparation

Peripheral blood leukocytes or saliva of individuals and their family members were collected after obtaining informed consent and genomic DNA was extracted by standard methods.

Whole exome sequencing

Whole exome sequencing (WES) was performed as previously reported [23]. Coding regions within the genomic DNA were enriched with the SureSelect Human All Exon V4, V5, or V6 kit (Agilent Technologies) and sequenced on a HiSeq 2000 or 2500 platform (Illumina, San Diego, CA) [24]. After sequencing, raw FASTQ files were processed to Variant Call Format (VCF) files through several steps, with the following tools: Novoalign, SAMtools, Picard, and Genome Analysis Toolkit. VCF files were annotated by ANNOVAR. All variants [including single nucleotide variants (SNVs) and short insertions/deletions (indels)] were filtered by allele frequency using public databases, namely the Exome Aggregation Consortium (ExAC), Exome Sequencing Project v.6500 (ESP6500), and the Human Genetic Variation Database, as well as our in-house database (n = 575). In parallel, BAM files were used for copy number variation (CNV) analysis using the Nord method [25] and eXome-Hidden Markov Model [26], as previously reported [27, 28]. Candidate SNVs were validated by Sanger sequencing using an ABI capillary sequencer, 3130xL or 3500xL (Thermo Fisher Scientific). Candidate CNVs were validated by quantitative polymerase chain reaction (qPCR) using a Rotor-Gene Q with Rotor-Gene 6000 Series Software 1.7 (Qiagen) or a LightCycler 480 with LightCycler 480 Software, Version 1.5 (Roche).
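The allele-frequency filtering step can be illustrated with a minimal sketch. It is not the pipeline used in this study: the Variant class, the function names, the example variants, and in particular the cutoff value are all assumptions introduced for illustration, since the text does not state the threshold that was applied.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Variant:
    label: str
    # Allele frequency per database name; a database with no record is omitted.
    allele_freqs: Dict[str, float] = field(default_factory=dict)


def is_rare(variant: Variant, max_af: float) -> bool:
    """Return True if no database reports a frequency above max_af."""
    return all(af <= max_af for af in variant.allele_freqs.values())


def filter_rare(variants: List[Variant], max_af: float) -> List[Variant]:
    """Keep only variants that are rare (or absent) in every database."""
    return [v for v in variants if is_rare(v, max_af)]


# Made-up variants for illustration only; none come from the study.
candidates = [
    Variant("variant_1", {}),                                   # absent everywhere
    Variant("variant_2", {"ExAC": 0.00001, "in_house": 0.0}),   # very rare
    Variant("variant_3", {"ExAC": 0.02, "ESP6500": 0.015}),     # common
]

ILLUSTRATIVE_MAX_AF = 0.001  # assumed cutoff, not taken from the study
rare = filter_rare(candidates, ILLUSTRATIVE_MAX_AF)
print([v.label for v in rare])
```

A variant absent from every database passes the filter, which matches the expectation that de novo variants causing a rare dominant disorder are usually not seen in population data.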

Mutation load score (MLS)

To evaluate the mutation load of a single exon as a function of its size, we collected the missense variants in each gene that are registered in gnomAD (as of June 2019). Next, we counted the missense variants within each exon (except for SOX11, which consists of only one exon), divided the count by the length of the exon, and multiplied by 100, following the mutation load score (MLS) proposed by Bögershausen et al. [29]. MLS in this study means the number of missense variants per 100 bp in each exon.
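As a worked illustration of the MLS definition given above (missense variants per 100 bp of exon), the following minimal sketch computes the score per exon. The function names and the exon counts and lengths are hypothetical and are not data from this study.

```python
from typing import Dict, Tuple


def mutation_load_score(missense_count: int, exon_length_bp: int) -> float:
    """Number of missense variants per 100 bp of exon."""
    if exon_length_bp <= 0:
        raise ValueError("exon length must be positive")
    return missense_count / exon_length_bp * 100


def mls_per_exon(exons: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """Map exon label -> MLS, given (missense_count, exon_length_bp) per exon."""
    return {
        label: mutation_load_score(count, length)
        for label, (count, length) in exons.items()
    }


# Hypothetical exons: (missense variants in gnomAD, exon length in bp).
example_exons = {
    "exon_A": (30, 600),
    "exon_B": (12, 150),
}
scores = mls_per_exon(example_exons)
print(scores)
```

In this made-up example the longer exon carries more variants in absolute terms but has the lower score, which is why the length correction matters when comparing exons of very different sizes.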

Reverse transcription PCR (RT-PCR)

cDNA was examined to validate effects of a truncating variant and a deletion in patients CSS235 and CSS076, respectively. Total RNA was extracted from their lymphoblastoid cells with an RNeasy Plus Mini Kit (QIAGEN). Subsequently, cDNA was synthesized with the SuperScript III First-Strand Synthesis System (Thermo Fisher Scientific) using oligo dT primers, or with the PrimeScript first strand cDNA Synthesis Kit (Takara Bio) using random hexamer primers, according to the manufacturers’ instructions. PCR and Sanger sequencing were performed using specific primers for each abnormality using the 3130xl or 3500xl capillary sequencer. Primer information is available on request.

Results

Overview of pathogenic gene variants

A total of 214 individuals with suspected CSS (182 newly recruited and 32 for whom no molecular diagnosis was determined in a previous study [2, 18]) were analyzed by WES. Among these, 57 individuals were excluded from the study because they had pathogenic variants in genes that cause different diseases such as NCBRS (seven individuals), Wiedemann–Steiner syndrome (five individuals), and KBG syndrome (two individuals) (Fig. 1a). In the remaining 157 individuals, 78 had pathogenic variants (49.6%) (71 SNVs and 7 CNVs) (Fig. 1a, b). All variants found in this study are shown in Table 1. Four SNVs (three variants in SOX11 and one variant in SMARCE1) and two CNVs (two duplications involving the entire SMARCA2 locus) have been previously described by our group [5, 11, 30, 31], but are included here to give an overview of the pathogenic variants in this CSS cohort. Among 71 pathogenic SNVs, 55 occurred de novo, 15 could not be confirmed as de novo because parental samples were unavailable, and only one, in PHF6 (X-linked), was inherited from the patient's healthy mother. Among seven pathogenic CNVs, three occurred de novo, two could not be confirmed because of a lack of parental samples, and two were derived from balanced chromosomal translocations inherited from the patients' healthy mothers (Table 1).

Fig. 1

Flow chart of the analysis in this study. a In addition to 71 previously reported CSS patients [18], we recruited another 182 CSS patients. All the new patients were analyzed by WES and the 32 undefined patients from the previous study were subjected to re-analysis of existing WES data. Among 214 patients, 57 had pathogenic variants for other diseases and were therefore excluded from the study. b Pie chart of mutated genes in 157 individuals in this study. c Number of pathogenic variants in each gene. Pathogenic variants already registered in HGMD were counted as ‘SNV_reported by others (recurrent)’ or ‘SNV_reported by our group’ and those which were not registered were categorized as ‘SNV_novel’. Pathogenic CNVs were simply counted by their size and position, as they are basically private. Two SNVs in SOX11 were reported by Tsurusaki et al. [5], one SNV in SOX11 was reported by Okamoto et al. [11], one SNV in SMARCE1 was reported by Zarate et al. [31], and two CNVs involving SMARCA2 were reported by Miyake et al. [30]

Table 1 The pathogenic variants and variations found in this study

ARID1B

Forty-eight ARID1B variants were found in our CSS cohort (45 SNVs and 3 CNVs), accounting for 61.5% of genetically resolved cases (48/78) (Table 1 and Fig. 1b). Twenty-eight were novel. Various types of variants included stop-gain, frameshift insertion/deletion, splice site changes and CNVs, clearly indicating that loss-of-function (LoF) changes in ARID1B cause CSS. Of note, one missense variant, c.6257T>G:p.(Leu2286Arg) was found in the BAF250c domain. Only two pathogenic missense variants in ARID1B were registered in HGMD: c.5998G>T: p.Asp2000Tyr in an individual with short stature (not associated with CSS) [32], and c.6092T>C:p.Ile2031Thr in an individual with corpus callosum abnormalities (also not associated with CSS) [33].

Causative variants appear to be preferentially located in exons 1 and 20, but after exon-size correction no such exonic accumulation was seen (Fig. 2a). We are aware that missense variants and LoF variants in healthy populations are registered in gnomAD. Only 10 variants that passed quality control were registered as LoF: five frameshift variants and five variants located in canonical splice sites. Frameshift variants were found in exon 1 (n = 1) and exon 20 (last exon, n = 4). Canonical splice variants were also found in the middle of ARID1B: in the splice acceptor sites of exons 5, 8, and 11, in the splice donor site of exon 12, and deep in intron 11 (c.3135+729insG); importantly, the variant deep in intron 11 could be a canonical splice variant for another transcript (NM_001346813.1:c.3097-2insG). In gnomAD, additional missense variants in healthy populations were clustered in exons 2, 11, and 12 (Fig. 2a), but in the current study the CSS-causative missense variant, c.6257T>G:p.(Leu2286Arg), occurred in exon 20.

Fig. 2

Graphical presentation of pathogenic variants corresponding to exons in ARID1B and ARID1A. Pathogenic variant count is shown as a bar above each exon. White bars show the number of pathogenic variants registered in the HGMD database, and black bars show the number of pathogenic variants found in this study. While exon size (box) reflects original physical length, introns (line) are shown as the same length. Functional domains are shown as black boxes. The mutation load scores (MLS: number of mutations per 100 bp) of variants registered in gnomAD are shown by gray bars. a Variants of ARID1B (NM_020762.3). Pathogenic variants are seen in all exons. Longer exons harbor more pathogenic variants. b Variants of ARID1A (NM_006015.4). Longer exons contain more pathogenic variants

ARID1A

Six ARID1A variants were found in six individuals: two frameshift, two canonical splice-site, one stop-gain, and one missense, indicating that ARID1A variants are LoF (Table 1 and Figs. 1b, 2b). All were novel. Interestingly, c.6251T>G: p.(Val2084Gly) is located within the BAF250c domain, similar to the ARID1B missense variant. In the HGMD database, only one pathogenic ARID1A missense variant, c.6232G>A: p.(Glu2078Lys), was registered as causing CSS and is located within the BAF250 domain [34]. Pathogenic variants in ARID1A were distributed throughout the gene (Fig. 2b) and were not accumulated in any exons after exon-size standardization (data not shown). In contrast, missense variants in gnomAD were accumulated in exons 5, 16, and 19. Only four LoF variants were registered: two frameshift variants in the first exon and two canonical splicing variants, similar to ARID1B.

SMARCA4

Six missense variants in SMARCA4 were found in seven CSS individuals (one recurrent variant found in two independent cases) (Table 1 and Figs. 1b, 3a). All six variants occurred de novo and four were novel. All variants were located within functional domains. Interestingly, pathogenic variants were located in variant-poor exons in gnomAD (Fig. 3a).

Fig. 3

Graphical presentation of pathogenic variant count corresponding to exons of SMARCA4, SMARCB1, SMARCE1, SOX11, and PHF6. Pathogenic variant count is shown by a bar above each exon. White bars show the number of pathogenic variants registered in the HGMD database, and black bars show the number of pathogenic variants found in this study. While exon size (box) reflects original physical length, introns (line) are shown as the same length. Functional domains are shown as black boxes. The mutation load scores (MLS: number of mutations per 100 bp) of variants registered in gnomAD are shown by gray bars. a Variants of SMARCA4 (NM_001128849.1), b SMARCB1 (NM_003073.4), c SMARCE1 (NM_003079.4), d SOX11 (NM_003108.3), and e PHF6 (NM_032458.2). SOX11, a single exon gene, is displayed with every 100 bp indicated

SMARCB1

Seven pathogenic SNVs and one CNV were found in eight CSS patients in this cohort (Table 1 and Fig. 3b). Six variants were novel. Seven out of eight pathogenic variants occurred de novo; the variant found in patient CSS174 could not be confirmed because parental samples were unavailable. The most common pathogenic variant, c.1091_1093delAGA, was found in two patients of this cohort and has been found in other cohorts [2, 9]. In patient CSS235, the frameshift insertion, c.1052dup, might not be subject to nonsense-mediated mRNA decay as it was 66 bp from the 3′-end of exon 8, the penultimate exon. RT-PCR using lymphoblastoid cells derived from the patient showed consistent aberrant mRNA expression regardless of cycloheximide treatment (Supplementary Fig. 1). Furthermore, individual CSS076 had a 9001-bp deletion from intron 8 through the end of the gene, extending to the neighboring DERL3 gene (Fig. 4). RT-PCR using lymphoblastoid cells derived from CSS076 showed two aberrant transcripts regardless of cycloheximide treatment. One transcript involved exon 8, intron 8, and genomic sequence after the telomeric deletion breakpoint. Another, shorter transcript contained exon 8 and 112 bp of genomic sequence after the telomeric deletion breakpoint. Unfortunately, we could not determine full-length cDNA sequences because of many repeated sequences, including SINEs and poly A repeats, in the vicinity of the 3′-end of both transcripts. However, we could confirm the stop codon in both transcripts and consequently could predict the amino acid sequences, p.Arg374Tyrfs*48 and p.Arg374Aspfs*110. Both predicted proteins have longer amino acid sequences (with an almost preserved SNF5 domain) than the wild-type protein. Interestingly, the two aberrant shorter transcripts may lead to longer aberrant protein products than the wild-type protein produced by the wild-type transcript. In addition, the predicted 48-amino acid sequence produced by the longer aberrant transcript does not have any functional domains, but the predicted 110-amino acid sequence produced by the shorter aberrant transcript has a presumed SERPIN domain (Supplementary Fig. 2). Interestingly, the amino acid sequence of the SNF5 domain in p.Arg374Tyrfs*48, p.Arg374Aspfs*110, and the wild-type protein is the same except for the last amino acid. The deletion involving intron 8 and beyond led to aberrant proteins associated with CSS. Thus, this may not be a LoF variant but rather a gain-of-function variant.

Fig. 4

Analysis of SMARCB1 cDNA in CSS076 with a 9-kb deletion involving SMARCB1. a A physical map around a 9.0-kb deletion involving SMARCB1 and DERL3 in CSS076. Positions of primer pairs for quantitative PCR are shown as rectangles a–d, which correspond to a–d in c. b Sequence of breakpoint PCR product. The deletion was 9001 bp in size. c Quantitative PCR using primer pairs b and c confirmed the heterozygous deletion. d RT-PCR analysis of SMARCB1 cDNA. Upper and lower panels show product amplified with primer pair 1 and 2, respectively. RT-PCR with primer pair 1 amplified cDNA of the patient and a control. RT-PCR with primer pair 2 amplified only the patient’s cDNA. RT-PCR with primer pair 2 showed two cDNA products of ~500 and 1000 bp. Arrowhead indicates an ~500-bp band. e SMARCB1 cDNA structure (NM_003073.4). White boxes show protein coding exons, while dark gray boxes show new exonic regions created by the deletion that are not present in the wild-type protein. Dotted lines show SMARCB1 introns. Arched lines above and below break lines between boxes show connections of cDNA sequences. RT-PCR using primer pair 1 amplified only a wild-type fragment. RT-PCR with primer pair 2 produced longer and shorter products. The longer product had wild-type sequence from the end of exon 8, through intron 8 until the centromeric deletion breakpoint and sequence after the telomeric deletion breakpoint. The shorter product skipped intron 8 and extended beyond the telomeric deletion breakpoint. The longer product had a stop codon in intron 8, and the shorter product had a stop codon after the telomeric deletion breakpoint

SMARCE1

Only one previously reported missense variant (by our group) was found [31] (Table 1 and Fig. 3c). This variant was located in the high mobility group box domain (HMG domain, IPR009071) [31]. Missense variants in healthy populations of gnomAD were clustered in exons 10 and 11 where no functional domain was recognized.

ARID2 and DPF2

We found no variants of these genes in our cohort.

SOX11 and PHF6

We found four pathogenic variants in SOX11 in four patients, and one pathogenic PHF6 variant in one patient (Table 1 and Fig. 3d, e). One SOX11 variant and the PHF6 variant were novel. All SOX11 pathogenic variants found in this and other studies were missense variants within the high mobility group box domain (HMG domain, IPR009071) [5, 11]. The pathogenic missense variant found in PHF6 was located in the extended PHD domain (ePHD domain, IPR034732).

The other genes

Three CNVs involving SMARCA2 were found in our cohort. Two of them were reported previously [30]. A new case (CSS154) with SMARCA2 duplication is derived from balanced translocations in the mother. Her karyotype is presumed to be der(4)t(4;9)(q35.1;p21.3) (Table 1).

Discussion

Genetic variants in CSS were detected by next generation sequencing. To date, nine genes have been registered as CSS disease genes, and over 150 patients with pathogenic variants were reportedly diagnosed with CSS. In 182 newly recruited and 32 undefined patients from our previous cohort, 57 patients with pathogenic variants in various genes that cause other diseases were excluded from further analysis (Fig. 1a). In the remaining 131 newly recruited patients, 73 had pathogenic variants in CSS genes (55.7%). This detection rate is similar to that of previous studies (54.9–71%) [4, 18, 35]. In the 32 previously undefined patients [18], we identified three pathogenic variants in ARID1B (one SNV in CSS033, and two CNVs in CSS039 and CSS072, see Table 1), two in SOX11 in CSS026 and CSS043, which have already been described elsewhere [5], and six in genes causing other diseases. In total we detected 78 patients (73 newly recruited and five previously undefined) with pathogenic variations. Forty-eight ARID1B, six ARID1A, seven SMARCA4, eight SMARCB1, three SMARCA2, one SMARCE1, four SOX11, and one PHF6 variants were found. No pathogenic variant was identified in ARID2 or DPF2. As expected, the most common variants in this cohort were in ARID1B. Of note, several pathogenic missense variants in PHF6 have been reported in Börjeson–Forssman–Lehmann syndrome (BFLS: OMIM 301900) [36,37,38]. Patients with a pathogenic PHF6 variant who were initially diagnosed as CSS in early childhood [4] were later diagnosed as BFLS [6]. As our patient was diagnosed before the age of 1 year, it remains to be seen how his phenotype will evolve and whether his diagnosis will remain CSS or be changed to BFLS later.
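The cohort arithmetic in this paragraph can be checked with a short tally. This is only a bookkeeping sketch: every number is taken from the text, while the variable names and the percent helper are introduced here for illustration.

```python
# Per-gene counts of individuals with pathogenic variants, as listed in the text.
variants_per_gene = {
    "ARID1B": 48,
    "ARID1A": 6,
    "SMARCA4": 7,
    "SMARCB1": 8,
    "SMARCA2": 3,
    "SMARCE1": 1,
    "SOX11": 4,
    "PHF6": 1,
}


def percent(numerator: int, denominator: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


total_solved = sum(variants_per_gene.values())

# 57 individuals were excluded in total; six of them were previously undefined.
excluded_total = 57
excluded_previously_undefined = 6
excluded_new = excluded_total - excluded_previously_undefined

new_recruits_remaining = 182 - excluded_new
cohort_remaining = (182 + 32) - excluded_total

new_detection_rate = percent(73, new_recruits_remaining)
arid1b_share = percent(variants_per_gene["ARID1B"], total_solved)

print(total_solved, new_recruits_remaining, cohort_remaining)
print(new_detection_rate, arid1b_share)
```

The per-gene counts sum to the 78 solved individuals, and the 131 remaining new recruits follow from removing the 51 excluded new recruits from 182.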

In contrast to the various types of variants found in ARID1B and ARID1A, such as missense, nonsense, splicing substitutions, small insertion/deletion variants, and gross changes, SMARCA4, SMARCB1, and SMARCE1 variants maintain the overall protein sequence (almost all were missense variants or an in-frame 3-bp deletion). As most missense variants in these genes are positioned within functional domains, they are expected to alter the wild-type protein function. For example, all missense variants in SMARCB1 were located within the SNF5 domain, which is a highly conserved and core component of this protein [39]. Interestingly, nonsense, truncating, or frameshift variants in SMARCB1 have only been found in preliminary stage tumors, including in malignancy or tumor predisposition syndrome [40], but never in CSS. Of note, we identified one frameshift variant, c.1052dup, p.Leu352Thrfs*9, and a deletion involving intron 8 and exon 9 (the last exon) of SMARCB1 and its downstream gene (der1-like domain family, member 3 (DERL3)) in two CSS patients (CSS235 and CSS076, respectively). Interestingly, neither change led to nonsense-mediated mRNA decay, because of newly created stop codons in the downstream sequence. Therefore, these two exceptional changes in SMARCB1 may lead to altered protein function. The genomic deletion in CSS076 also involved DERL3, which encodes a Derlin family protein that localizes to the endoplasmic reticulum (ER) and plays a role in the degradation of misfolded glycoproteins in the ER [41]. Many LoF DERL3 variants are registered in gnomAD with pLI scores = 0.00; therefore, DERL3 is unlikely to be a haploinsufficient gene. Thus, we concluded that DERL3 deletion did not contribute to the CSS phenotype.

Pathogenic SMARCB1 variants in CSS were clustered in the last two exons (exons 8 and 9), while those in schwannoma occur mostly in the first three and last three exons. In contrast, truncating variants throughout the entire gene and gross deletions in SMARCB1 are associated with malignancy, as in rhabdoid tumor [40]. SMARCB1 deletion, even a partial deletion, has never been described in CSS [40]. In this study, we highlighted two novel SMARCB1 variants, both of which affect the SNF5 domain. Despite many studies of SMARCB1 deletions that result in incomplete recruitment of SWI/SNF subunits [39, 42, 43], no functional studies specifically targeting the SNF5 domain or C terminus of SMARCB1 have been reported. Together with the other pathogenic variants found in CSS that are confined to exons 8 and 9, these two novel variants highlight the importance of the SNF5 domain and C terminus in the pathogenicity of CSS.

In conclusion, we report a comprehensive genetic analysis of the largest CSS cohort ever assembled. We found pathogenic variants in 55.7% of newly recruited individuals and three CNVs in initially ‘unsolved’ individuals. Most of the pathogenic variants identified replicated previous findings of other cohorts, although many novel variants and two unique variants in SMARCB1 were also detected. However, over 40% of individuals with CSS remain genetically uncharacterized. Several studies of other Mendelian diseases achieved diagnostic success by using a combination of approaches, such as RNA sequencing [44] or whole genome sequencing [45, 46]. Our study can, therefore, be improved by further extensive analyses using other methods.