Introduction

Hereditary spastic paraplegias (HSP) are a clinically and genetically heterogeneous group of rare neurodegenerative disorders. HSP have a variable age at onset and are mainly characterized by progressive lower limb spasticity and weakness. These hallmarks are caused by degeneration of the motor axons that mostly affects the distal ends of the long central nervous system tracts.1 Their estimated prevalence ranges from 1–10/100 000 depending on the geographic localization2 and is 4.1/100 000 in Portugal.3

These disorders are historically divided on clinical grounds into pure and complex forms, according to the absence or presence of additional neurological and extraneurological features, such as neuropathy, cognitive impairment, cerebellar ataxia and epilepsy. Consequently, ancillary tests such as brain/spinal cord magnetic resonance imaging, electroneuromyography and others can show various abnormal patterns. Genetically, all classical modes of inheritance have been described, with 79 SPG-associated loci and more than 60 identified genes, a number that is constantly increasing.4, 5, 6, 7, 8, 9, 10 Moreover, at least 11 additional genes have also been associated with spastic paraplegia, some of them very recently.11, 12 Although a considerable number of genes have already been identified, most of the families worldwide have not been extensively screened so far and remain without a molecular diagnosis, as was the case for 68.9% of the families in Portugal before this study. Indeed, among the HSP a considerable number of genes and variants are apparently orphan, affecting single families.

Identification of the responsible genes has a great impact on patients and their families as it reveals the cause of the disease, and may also be important to better understand how these variants arise (founder effects and susceptibility chromosomes, among others) and how they lead to specific phenotypes. This study aimed to provide a molecular diagnosis for the Portuguese families identified through a population-based survey3 using targeted next generation sequencing (NGS) to screen these patients for the presence of variants in 70 genes already associated with HSP or candidate genes, covering all classical modes of inheritance.

Materials and methods

Patients

In Portugal, 193 families with hereditary spastic paraplegia were identified in a national, population-based survey.3, 13 After exclusion of the most common genes in most of the families and isolated cases (ATL1, SPAST and REEP1 in families with dominant inheritance, and SPG11, ZFYVE26 and CYP7B1 in the recessive ones), there were still 68.9% (133/193) of the families without a molecular diagnosis, 62% (98/159) if we consider only the families with DNA available for testing. Taking that into consideration, 98 unsolved families (ie, with DNA available) were selected to be screened for variants in known genes using one patient per family.

Among these 98 Portuguese families, 38 showed a probable recessive inheritance, 44 a dominant inheritance and 16 were isolated cases. Pure and complex forms were present and age at onset in the whole cohort ranged from less than 1 year to 65 years. Written informed consent for genetic testing was obtained from all tested individuals or their respective legal guardian and the genetic analyses were approved by the ethical committees of ICBAS (Portugal) and of the Paris-Necker Hospital (France).

HSP panels

Two sequential custom NGS panels were developed at ICM, France, to search for variants in HSP-related genes (Supplementary Table S1). This strategy was favored over exome sequencing because of ethical concerns regarding the risk of secondary or incidental findings in HSP diagnosis, in agreement with our local ethics committees, and because of its cost effectiveness given the number of genes involved in HSP that could carry a mutation. The first panel covered 34 genes, among which 31 were known to be responsible for dominant and recessive HSP forms and the other three (ALS2, SACS and SETX) were implicated in overlapping phenotypes.14, 15, 16 The 34 genes corresponded to 531 regions with a total length of 109 768 base pairs. In order to increase the chance of achieving a molecular diagnosis, a second panel covering 70 genes was later designed. This added 30 newly identified genes5 to the previous panel as well as three genes (FBXO7, GJA1 and SAMHD1) causing overlapping phenotypes17, 18, 19 and three candidate genes that encode partners of known HSP proteins (AP5B1, AP5M1 and AP5S1). This panel covered a total length of 210 363 bases corresponding to 1001 regions. Both panels targeted exonic regions and at least 20 intronic bases at the exon-intron boundaries.

Genetic analysis was performed using a customized Roche/Nimblegen capture followed by NGS in the MiSeq apparatus (Illumina, San Diego, CA, USA). The procedure consisted of four major steps: (1) library preparation, where we used the Illumina TruSeq DNA LT Sample Preparation kit v2-Set A (Illumina) according to the manufacturer’s protocol for the first panel and the Kapa HTP Library Prep Kit Illumina (Roche, Basel, Switzerland) with its corresponding protocol for the second panel; (2) double capture using the Roche NimbleGen SeqCap EZ Reagent Kit and SeqCap EZ Library (Roche) using the SeqCap EZ Library SR User’s Guide protocol; (3) massively parallel sequencing using the Illumina MiSeq Benchtop Sequencer; and (4) data analysis using the CLC Bio-Genomics Workbench 6.5.1 software (https://www.qiagenbioinformatics.com/). After alignment with the Homo sapiens (hg19) reference sequence, data were filtered so that the retained variants had to: (i) be present in the target regions, (ii) cause a change at the protein level (nonsense, amino acid change or a splicing effect), (iii) be present in the local database with a frequency below 15% and (iv) if known in the databases, have a minor allele frequency below 1% for autosomal recessive inheritance and below 0.2% for autosomal dominant transmission. All filtered variants were further analyzed using Alamut v.2.9.0 software (Interactive Biosoftware, La Rochelle, France) for functional effect prediction with SpliceSiteFinder, MaxEntScan, NNSPLICE, GeneSplicer, Human Splicing Finder, Polyphen-2, SIFT, MutationTaster, Align GVGD and UMD-Predictor.
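The four filtering criteria listed above can be expressed as a single predicate. The following is a minimal sketch in Python and not the CLC workflow used in the study: the `Variant` record, its field names and the consequence labels are hypothetical, and only the thresholds (15% local frequency, 1% and 0.2% minor allele frequency) come from the text. Frameshift is included among the protein-altering consequences as an assumption, since frameshift variants are reported in the results.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical consequence labels; the text names nonsense, amino acid
# change and splicing effects. "frameshift" is an added assumption.
PROTEIN_ALTERING = {"nonsense", "missense", "frameshift", "splice"}

# Thresholds taken from the text.
LOCAL_FREQ_CEILING = 0.15
MAF_CEILING = {"recessive": 0.01, "dominant": 0.002}


@dataclass
class Variant:
    gene: str
    in_target_region: bool
    consequence: str
    local_db_freq: float          # frequency in the in-house database
    public_maf: Optional[float]   # None when absent from public databases


def passes_filters(v: Variant, inheritance: str) -> bool:
    """Return True when the variant satisfies criteria (i) to (iv)."""
    if not v.in_target_region:                      # (i)
        return False
    if v.consequence not in PROTEIN_ALTERING:       # (ii)
        return False
    if v.local_db_freq >= LOCAL_FREQ_CEILING:       # (iii)
        return False
    # (iv) applies only to variants already listed in public databases
    if v.public_maf is not None and v.public_maf >= MAF_CEILING[inheritance]:
        return False
    return True
```

Under these assumptions, a missense variant with a public minor allele frequency of 0.5% would be retained for a recessive family but discarded for a dominant one.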

Twelve patients per experiment were sequenced using the first panel, while 24 were analyzed simultaneously using the second panel. The first 25 Portuguese patients tested were sequenced using the first panel and the remaining 73 patients were screened with the second panel.

In order to detect rearrangements, we analyzed the coverage of all the targeted regions with an in-house algorithm. In the genes where a suggestive alteration in dosage was detected, multiplex ligation-dependent probe amplification or quantitative real-time PCR was applied to confirm the deletion/duplication.
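The in-house algorithm itself is not described in detail. As an illustration only, one common way to screen for dosage changes from panel coverage is to normalize each region by the patient's own average depth and then compare it with the same region in the other patients of the run. The sketch below follows that idea; the function names, the use of the median as reference and the 0.7/1.3 thresholds are assumptions, not the authors' implementation.

```python
from statistics import mean, median


def dosage_ratios(coverage):
    """coverage: {sample: {region: mean read depth}}, at least two samples.

    Returns {sample: {region: ratio}}, where a ratio near 1 suggests normal
    dosage, near 0.5 a heterozygous deletion and near 0 a homozygous one.
    """
    # Step 1: normalize each region by the sample's own average depth.
    norm = {}
    for sample, regions in coverage.items():
        avg = mean(regions.values())
        norm[sample] = {r: depth / avg for r, depth in regions.items()}

    # Step 2: compare each region with the median of the other samples.
    ratios = {}
    for sample, regions in norm.items():
        ratios[sample] = {}
        for region, value in regions.items():
            ref = median(norm[o][region] for o in norm if o != sample)
            ratios[sample][region] = value / ref if ref > 0 else float("nan")
    return ratios


def flag_regions(ratios, low=0.7, high=1.3):
    """List (sample, region, ratio) outside the hypothetical thresholds."""
    flagged = []
    for sample, regions in ratios.items():
        for region, ratio in regions.items():
            if ratio < low or ratio > high:
                flagged.append((sample, region, ratio))
    return flagged
```

Any region flagged in this way is only a candidate; as stated above, confirmation requires an independent method such as multiplex ligation-dependent probe amplification or quantitative real-time PCR.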

All the variants that were not excluded after filtering and segregation analysis (whenever possible) were submitted to ClinVar (http://www.ncbi.nlm.nih.gov/clinvar/). ClinVar accession numbers are SCV000574442–SCV000574511.

Sanger sequencing

We confirmed the variants identified by NGS by Sanger sequencing of the region where each variant was present, including additional family members when DNA was available. Primers were designed using Primer3Plus software (http://www.bioinformatics.nl/cgi-bin/primer3plus/primer3plus.cgi). The PCR amplification was done using the DreamTaq DNA Polymerase Kit (ThermoFisher Scientific, Waltham, MA, USA) or, when necessary, AccuPrime GC-Rich DNA polymerase (Invitrogen, Carlsbad, CA, USA), followed by sequencing by GATC BioTech using BigDye chemistry in an ABI3730 sequencer (Applied Biosystems, Foster City, CA, USA). Sequence analysis was performed using SeqScape 2.6 software (Applied Biosystems).

Multiplex ligation-dependent probe amplification

In order to confirm the presence of large gene rearrangements in the SPG11 gene, we performed multiplex ligation-dependent probe amplification using the SALSA MLPA kit P306 (MRC Holland, Amsterdam, Netherlands) according to the manufacturer’s instructions and analyzed the resulting fragments on an ABI 3130xl Genetic Analyzer using 500-LIZ (Applied Biosystems) as a size standard and GeneMarker v1.90 (SoftGenetics, State College, PA, USA).

Results

Study design

A total of 98 index cases of Portuguese families without molecular diagnosis were screened for variants using either the first or the second custom sequencing panel. The mean coverage varied from 126 to 801 reads: 419 reads per base with 94% of the regions with a coverage ≥30 × with the first panel and 362 reads with 97% of the regions with at least 30 × of coverage with the second panel. All the variants considered relevant (rare, predicted deleterious, expected to have an impact given the suspected mode of inheritance) were confirmed by Sanger sequencing and segregation analysis was performed in all the available family members. The regions with less than 30-fold coverage were screened by Sanger sequencing in autosomal recessive cases in which only one heterozygous potentially disease-causing variant was identified. Relevant insufficiently covered exons in patients with a compatible phenotype were also sequenced by Sanger (ie, exon 1 of SPG7 in patients with cerebellar atrophy and/or optic atrophy). An in-house index of coverage was calculated for each exon and for each patient to search for genomic rearrangements.

Genetic characterization

A molecular diagnosis of HSP was confirmed in 20.4% (20/98) of the cases. The coverage analysis was responsible for the identification of one large homozygous deletion encompassing exons 12 to 14 in SPG11 that was confirmed by multiplex ligation-dependent probe amplification (Table 1, Supplementary Figure S1) and was considered to affect spatacsin function. Additionally, after filtering of the NGS data, we found 139 nucleotide variants present in 74 families with minor allele frequency below 1.5% that potentially affected protein function according to at least one prediction software. These variants were divided in: (i) Disease-causing variants (17 variants in 14 families), (ii) Likely disease-causing variants (6 variants in 5 families), (iii) variants of unknown significance – VUS (52 variants in 37 families) and (iv) Excluded variants (64 variants in 45 families) by several criteria detailed in Figure 1. In the remaining 24 families, no disease-causing variants were found after filtering.

Table 1 Disease-causing and likely disease-causing variants found in 20 families
Figure 1

Schematic illustration of the Portuguese cohort with the criteria applied for filtering and classification of the variants identified in the 98 families.

We considered 24 of the 140 variants (the 139 nucleotide variants plus the SPG11 deletion) to be affecting protein function in 20 families (Table 1, Figure 2 and Supplementary Figure S1). Most (n=18) were classified as disease-causing variants because they were already reported as disease-causing variants, caused an early stop codon (nonsense and frameshift) or were large gene rearrangements. The others (n=6) were considered likely disease-causing because we did not perform functional studies to prove their effect. All of them had a low frequency in databases and a predicted deleterious effect (Table 1). Additionally, in 16 of these families, including all the families with likely disease-causing variants, we could demonstrate their segregation with the disease (Figure 2 and Supplementary Figure S1).

Figure 2

Pedigree with segregation analysis, electropherogram and conservation of the seven novel missense variants. Black circles and squares indicate affected individuals with HSP; an asterisk in a symbol indicates family members with available DNA. Electropherogram with the position of the missense variant boxed. Conservation of the region of the altered amino acid (boxed).

Among the 52 VUS (Table 2), three are illustrative of the difficulties we faced in the interpretation of their biological relevance (Supplementary Figure S2): a homozygous missense variant in a gene causing a recessive form where family members were not available for segregation analysis (family SR26); a variant in a gene causing a dominant form, of interest only if we consider incomplete penetrance (family SR49) and a variant in the homozygous state in a gene causing a dominant form that segregated with the disease (family S88). In addition, in families SR65 and S86 we found compound heterozygous variants in two different genes known to interact at the protein level (Supplementary Figure S2). In both cases, the variants were found in two different members or interactors of the adaptor protein complex 5, but neither their cosegregation nor their effect at the cellular level could be analyzed, which would represent a prerequisite to prove these potential cases of digenism.

Table 2 Variants of unknown significance (VUS) found in 37 families that remained without a molecular diagnosis

In this study, we report 10 new disease-causing variants among the 24 variants affecting protein function, of which seven are new missense variants (Figure 2), one is a nonsense variant, one a frameshift variant and the last is the deletion of three consecutive exons in SPG11 (Table 1).

The yield of mutational results varied according to inheritance mode (Figure 3). We found the disease-causing variants in nine of the 38 recessive families (23.7%) and in six of the 44 families with dominant transmission (13.6%). Of note, one family with a homozygous SPG7 variant was initially considered as dominant (Family S26, Supplementary Figure S1) because of a suspected family history of the disease in the father, who however was not examined and not tested. Five of the 16 isolated cases were also explained at the molecular level (31.3%), including one confirmed de novo variant in KIF1A (Family SR98, Figure 2). Segregation analysis could not be performed in three families, but all of them had variants already described as disease-causing in the literature.
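The percentages above follow directly from the family counts. The short sketch below recomputes them; the counts are taken from the text, while the helper name `yield_pct` is illustrative. Half-up decimal rounding is used deliberately, because Python's built-in `round` rounds 31.25 to 31.2 rather than the 31.3 reported here.

```python
from decimal import Decimal, ROUND_HALF_UP

# (families with a molecular diagnosis, families tested), from the text
counts = {"recessive": (9, 38), "dominant": (6, 44), "isolated": (5, 16)}


def yield_pct(solved: int, total: int) -> Decimal:
    """Diagnostic yield as a percentage with one decimal, rounded half up."""
    pct = Decimal(100 * solved) / Decimal(total)
    return pct.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


yields = {mode: yield_pct(s, t) for mode, (s, t) in counts.items()}
overall = yield_pct(
    sum(s for s, _ in counts.values()),
    sum(t for _, t in counts.values()),
)

for mode, pct in yields.items():
    print(f"{mode}: {pct}%")
print(f"overall: {overall}%")
```

The three groups add up to the 20 diagnosed families among the 98 tested, which gives the overall yield of 20.4% reported in the genetic characterization.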

Figure 3

Distribution of the number of families in each class of variants identified. (a) Recessive families, (b) Dominant families, (c) Isolated cases.

Clinical characterization

The phenotype of the patients was mostly similar to that of previously reported series. A detailed phenotypic description of the families is presented in Table 3. For example, in the two families carrying SPG11 variants, the clinical picture included early age at onset and a complex phenotype with cognitive impairment in both families and neuropathy in one of them. Magnetic resonance imaging was not available to check for the presence of a thin corpus callosum. In another family with a homozygous ZFYVE26 variant and an SPG11-like phenotype, thin corpus callosum was overt. As in published cases, a pure form of the disease was observed in (i) the case with a de novo SPAST variant (SR4, Supplementary Figure S1), (ii) the patient presenting a variant in REEP1 (SR99), (iii) the two families with CYP7B1 variants (families S93 and SR97, Supplementary Figure S1) and (iv) the patient with an NIPA1 variant (S61, Supplementary Figure S1). On the contrary, in our KIF5A families (S41, S31, S27, S58, Figure 2 and Supplementary Figure S1), pure and complex forms were found, in agreement with Goizet and colleagues.28 We also identified three families with SPG7 variants (Supplementary Figure S1) with late-onset HSP and a complex phenotype: families SR75 and S26, also presenting ataxia, and the sporadic case CI2, who has minor cerebellar atrophy on brain magnetic resonance imaging, which confirms that the involvement of the cerebellum is part of the SPG7 core phenotype. In family SR84 (Supplementary Figure S1), presenting two known variants in the GBA2 gene, one affecting the catalytic site and one destabilizing the protein by disturbance of the charge balance,33 patients had an onset in the expected range (infancy or childhood) and a complex phenotype with cerebellar signs and ataxic gait (Table 3).

Table 3 Clinical characteristics of the 20 families with an established molecular diagnosis

Interestingly, we confirmed the higher occurrence of cases with heterozygous vs homozygous/compound heterozygous (SPG30) variants in KIF1A. One patient in the SR98 family (Figure 2) carried a novel heterozygous de novo KIF1A variant located in the motor domain and presented a complex phenotype with mental retardation and neuropathy and an early age at onset, in the first year of life.

On the other hand, the clinical features of some of our patients extended the phenotypic spectrum of some clinico-genetic entities. This was the case of the family with the SACS variants (SR6, Figure 2 and Table 3), in which patients presented with spasticity and peripheral neuropathy but without ataxia, as in another single case recently reported in an Italian family.15 In the family with a C19orf12 variant (family SR88, Supplementary Figure S1), the phenotype was similar to SPG43 but also presenting cerebellar ataxia, mental retardation and dementia. Interestingly, the same variant was described in patients with neurodegeneration with brain iron accumulation type 4 (NBIA4),20 extending the spectrum of phenotypes associated with this variant. Brain imaging data were not available for this patient, so we could not check for the presence of brain iron accumulation. Lastly, patients of the S75 family (Figure 2) showing ERLIN2 variants had an early onset, at 8 and 19 years old, however with a pure phenotype, contrary to the complex phenotype previously described.34

Discussion

Genetic diagnostic yield

The identification of the genetic cause for HSP in only 20.4% of this cohort shows that there is still a large set of genes responsible for spastic paraplegia to be uncovered or that novel inheritance modes, as in the case of KIF1A,35 should be taken into account. Although most of these families had been previously screened for SPG3, SPG4 and SPG31 in the families with dominant forms36 and for SPG11, SPG15 and SPG5 in the families with recessive transmission (unpublished data), we expected a higher frequency of diagnosed families since we were testing almost all the known genetic causes of HSP, including all the genes identified by Novarino and colleagues.37 This frequency is very similar to that found in a previous study covering far fewer genes in patients where only SPAST (SPG4) variants were excluded.38 On the other hand, our results are also very similar and comparable to a study where exome sequencing identified disease-causing variants in eight of 48 HSP families (16.6%) and potentially disease-causing variants in another eight families where previous screening for the most likely genes had been performed.39 Taking into account the full cohort, we have found the responsible gene in 42% of the Portuguese families (81/193), or in 51% if we only take into account the families with available DNA (81/159), a frequency similar to a Greek study where only 16 HSP genes were screened.40 The most frequently mutated genes are (Figure 4): SPAST (SPG4) with a frequency of 18% (28/159) and SPG11 with a frequency of 13% (20/159). Interestingly, we found a high frequency of KIF5A variants, present in four out of 44 families with dominant inheritance (9.1%) and in 4 of the 159 families (2.5%) becoming the third most frequently mutated gene in dominant forms. 
This high occurrence was also found in the Greek population where it was the second cause of disease in families with dominant transmission40 and in the study by Warrenburg et al where it was found as the most frequently mutated gene (after exclusion of the most likely genes).39 Most of the other genes tested in our cohort have accounted for single families until now, as is the case of REEP2 (SPG72)41 and in the case of a family that was negative in this screening but after exome analysis proved to segregate ALDH18A1 (SPG9) variants.7 In line with our results showing that most of the new HSP genes are rarely involved, the yield of positive results obtained with the two panels, the second containing 30 additional HSP genes, was very similar. We cannot exclude that a variant in the 30 additional genes added to the second panel may explain some of the 17 cases without diagnosis (of the 25) tested with the first panel. This is however very unlikely given (1) the low relative frequencies of these genes,5, 37 (2) their involvement in recessive forms while 10 of 17 of our patients were associated with dominant inheritance of the disease, (3) and because seven of these cases processed under exome sequencing more recently do not carry mutations in the genes present in the second panel (unpublished data).

Figure 4

Distribution of the disease-causing variants found in 81 families from the full cohort of Portuguese cases.# (a) Recessive families, (b) Dominant families, (c) Isolated cases. *Case described in this paper of a family classified with dominant inheritance but with a homozygous SPG7 variant. #Includes mutations found in previously screened genes.7, 36, 41

Our relatively high percentage of diagnosis among the isolated cases (5 out of 16) is also an interesting finding that shows how important it is to test these patients that frequently are not considered in genetic analyses due to the absence of additional family members to confirm diagnosis.

This study also demonstrates that a panel strategy is capable of detecting large rearrangements, as shown by the identification of a large deletion, although in the homozygous state, encompassing exons 12–14 in SPG11 in family SR17. This was possible through the analysis of the coverage, where we compared the mean coverage of all patients tested with each other, taking into account the average coverage of each one.

Towards an unbiased approach

Our results also highlight the importance of testing all families for the same set of genes, regardless of the presumed inheritance mode. This is important as sometimes the transmission mode is unclear in the pedigree because of its size or concealed consanguinity, and because there is an increasing number of HSP genes associated with different modes of transmission (eg, BICD2, KIF1C and KIF1A), sometimes associated with a different disease presentation (eg, SPG7). We have some examples in our cohort of families with an apparent recessive mode of transmission (including sporadic cases) in whom we found a variant in a gene associated with a dominant form. This can happen when the disease is caused by a ‘de novo’ variant, as in families SR4 (Supplementary Figure S1) and SR98 (Figure 2), and also in late-onset families in which the disease could have been missed in the older generations. The reverse is also true since we report one case with a homozygous variant in SPG7, usually associated with a recessive form, found in a family classified as dominant (S26, Supplementary Figure S1). This could likely be explained by concealed consanguinity or by the presence of older individuals reported as affected but with a different neurological condition. Family SR98, with a variant in the motor domain of KIF1A, is also an example of a gene that can have recessive or dominant inheritance depending on the localization of the variant. It was recently shown by Lee and colleagues that de novo heterozygous variants in the motor domain of KIF1A can be disease-causing35 and that this type of variant is more common than the homozygous ones, which was later confirmed in several studies39, 42, 43 including our own. 
In family SR99 (Figure 2), we found a new variant (p.Ala27Pro) in REEP1 that, despite being present in the unaffected father, is likely the disease-causing variant because incomplete penetrance was already described for variants in this gene and because the same amino acid was affected in another HSP case.44

Need for international collaborative efforts

Our results also highlight the difficulty of reaching a conclusion as to the causative nature of the variants in 37% of the cases. Large screening studies, like ours, result in a large quantity of data that has to be interpreted, a task that is not always easy even with all the available software. This is especially difficult in HSP since the disease can have different inheritance modes. An example of the difficulty in interpretation without functional data is illustrated by family SR6 (Figure 2), where we found two new variants, six bases apart, that segregate together, and we were not able to demonstrate which one was deleterious, although the predicted effect on protein function is stronger for one of them.

It is highly likely that, in our 37 families with VUS, some of these VUS are variants with new inheritance forms for already known genes. This could be the case in family SR88, where we found a KIF5A variant present in the homozygous state in patients in two generations of a family with no reported consanguinity (Supplementary Figure S2). This might represent the first autosomal recessive case due to KIF5A variants, but this cannot be ascertained in the absence of functional evidence or additional cases segregating in autosomal recessive pedigrees. We also found some variants in candidate genes selected by function (interactors) for which we were not able to conclude as to their causative effect due to the insufficient number of family members to validate their co-transmission (families SR65 and S86, Supplementary Figure S2), and the absence of cell lines to demonstrate the disruption of the AP5 complex by biochemical experiments.45 We observed a variant in a gene associated with dominant transmission, BSCL2, that is in a conserved region, is predicted to affect function by five software tools and is in the same domain as the two published variants (with reports of incomplete penetrance) but, since it is present in the non-affected mother and there are no other known carriers, we were not able to conclude as to the causative effect of this variant (case SR49, Supplementary Figure S2). Absence of family members to confirm segregation of a homozygous variant in WDR48 was also an issue in family SR26 (Supplementary Figure S2). This family presents a complex phenotype with motor neuropathy that is in accordance with the reported clinical features,37 although with a later age at onset (20 and 22 years old vs 1 year old), and only functional studies will allow its causative role to be proven. 
Therefore, the availability of relatives to test segregation still remains crucial to conclude on a causative variant, and diagnosis in HSP should not rely only on the clinical/genealogical presentation because the clinical phenotype and full mutational spectrum associated with each gene are continuously expanding.5

In conclusion, this comprehensive study of a large cohort of 98 families of homogeneous origin (only Portuguese families) allowed us to draw conclusions on the frequency of all the less common HSP genes. Also, it highlights the fact that although a high number of genes have already been identified in this condition, a large percentage of cases remain without molecular diagnosis, some of them due to the absence of extensive screening. A gene panel strategy is a cost-effective way to screen HSP families, and a small panel like ours with 34 genes could probably give a diagnosis to almost half of the families if used as a first approach. The use of an extensive panel of genes like our second panel, including the more recent and less frequently mutated genes such as VCP39, 46 would only slightly increase the number of diagnoses. This suggests that performing exome sequencing in the families without diagnosis in the most frequent genes (34) is probably a good strategy in HSP.