23andMe: Genome-wide association studies included participants from 23andMe, Inc. who self-reported stuttering status through a questionnaire. Cases included participants who answered “yes” (99,776 individuals) to the question: “Have you ever had a stammer or stutter?” Controls (1,023,243 individuals) included participants who answered “no” to this same question (see Demographic Table 1). As is a common standard in population-based studies investigating stuttering, our study relies on self-report (see Bloodstein and Ratner (2008) in which all but two of the reviewed papers were based on retrospective questionnaire or interview-style surveys).2,78 All individuals included in the analyses provided informed consent and answered surveys online according to 23andMe human subject protocol, which was reviewed and approved by Ethical & Independent Review Services, a private institutional review board (http://www.eandireview.com).
ISP: Polygenic model testing was performed using individuals with developmental stuttering acquired through the International Stuttering Project. Stuttering status in the ISP cohort was confirmed by speech-language pathologists with expertise in stuttering and fluency disorders. See Polikowsky et al. 2022 HGG Advances31 for a detailed description of this cohort, and genotyping information.
The National Longitudinal Study of Adolescent to Adult Health (Add Health): Polygenic model testing was performed using individuals who self-reported stuttering via an Add Health questionnaire. Add Health represents an ongoing, nationally representative, longitudinal study of the social, behavioral, and biological factors influencing health and developmental trajectories from early adolescence into adulthood.33 Add Health collected demographic and health survey data as well as in-home physical and biological data from participants. See Harris et al. 2019 Int J Epidemiol for genotyping information. For our study, self-reported stuttering cases were defined as participants who at one point answered ‘‘yes’’ to the following survey question: ‘‘Do you have a problem with stuttering or stammering?’’ All control individuals answered ‘‘no’’ to the above question. Self-reported race/ethnicity was used to group participants.
Statistical Analysis
Eight ancestries- and sex-specific genome-wide association analyses were performed to determine variant association with stuttering risk (Table S1). Each performed GWAS used a logistic regression that assumed an additive model for allelic effects:
Stuttering status ~ age + pc.0 + pc.1 + pc.2 + pc.3 + pc.4 + v2_platform + v3_0_platform + v3_1_platform + v4_platform + genotype
Reported p-values were calculated using a computed likelihood ratio test. Principal components for each logistic regression model were derived independently for each ancestries, using ~ 65,000 high quality genotyped variants present across all five genotyping platforms. Principal components were computed on a subset of participants randomly sampled across all the genotyping platforms (137K, 102K, 1000K, and 360K participants were used for AA, EAA, EA and AdA, respectively). PC scores for participants not included in the analysis were obtained by projection, combining the eigenvectors of the analysis and the SNP weights. Summary statistics were reported for imputed autosomal variants that were successfully imputed across all platforms (v2v3v4v5) and reached the following quality control thresholds: average rsq > 0.5, minimum rsq > 0.3, and batch check p-value > 1x10− 50.
We aggregated association summary statistics across ancestries-specific association studies using multi-ancestries meta-regression, as implemented in MR-MEGA.35 Analyses were performed for female-only GWAS, male-only GWAS, and a sex-combined meta-analysis. We included three axes of genetic variation as covariates in the sex-combined meta-analysis, and, due to the lower number of contributing analyses in the female- and male-only meta-analyses limiting the number of possible axes of genetic variation, included one axis as a covariate in the sex-specific analyses.
Annotation
The sentinel variant for each genome-wide significant locus was reported for each ancestries and sex-specific study. The genome-wide significance was defined as P < 5x10− 8 79 this threshold applies a Bonferroni correction where α = 0.05 and assumes there are approximately 1 million independent (i.e. not in linkage disequilibrium) common signals across the human genome. Annotated gene(s) for each locus included the predicted functional gene(s) for each loci (when available) according to Open Targets “Variant-to-gene (V2G) pipeline”, which integrates evidence from molecular quantitative trait loci, chromatin interactions, in silico functional predictions from Ensembl, and distance between the variant and gene canonical transcription start site.80,81 Loci were defined according to independent linkage disequilibrium blocks identified in 1000 Genomes reference using the matched ancestries reference. Reported sentinel variants represent the variant with the smallest p-value within each associated region. All reported positional coordinates (chromosome and base-pair locations) refer to human genome reference build 37. We also looked for replication of any genome-wide significant signal among the other independent ancestries- and sex-specific GWAS.
Credible Sets
95% credible sets were established to determine if genome-wide significant hits were unique across all sex- and ancestries- specific GWAS and trans-ancestries meta- and mega- GWAS. The potential causal variants for SNPs within significant regions was based on approximate Bayes factor82 assuming a prior variance of .1, and using the method from Maller et al.83 to define these sets. A hit was determined to be unique if there were no overlaps in SNPs with other credible sets from the same chromosome.
Variant-effect size concordance analysis
We compared summary statistics from each ancestries- and sex-specific to one another to determine whether the concordance rate between the two summary statistics was high (Table S6). The concordance rate was calculated by the proportion of overlapping LD blocks that had the same direction of effect over the total variants present in both GWAS analyses with p-values below 0.005 threshold. See Table S6 for details regarding the number of variants used in each concordance combination.
SNP Heritability and Partitioned Heritability
Genome-wide SNP based heritability (h2) was calculated using summary statistics resulting from the EA, AA, and AdA GWAS results using the LD Score regression software.37 We used LDSC to estimate liability scaled h2 assuming a 10% population prevalence. LD maps were estimated from the 1000 Genomes phase 3 European, Admixed American, and African populations for the respective ancestries. Sample size prevented h2 calculations in the East Asian cohort since estimates are likely to be unreliable (sample size below range of 5,000–10,000).84
To better understand the types of variation that contribute most to stuttering, we partitioned SNP heritability of EA-male, EA-female and EA-sex-combined (See Supplemental Methods for sex-combined meta-analysis details) stuttering using stratified LDSC.85 LD scores, regression weights, and allele frequencies from European ancestries were obtained from: https://alkesgroup.broadinstitute.org/LDSCORE. We performed 80 different tests, resulting in a p-value = 6.25 x 10− 4 Bonferroni-corrected significance level globally. Partitioning was performed for 52 baseline annotations as described by Finucane et al.85. Enrichment was considered significant if p-value < 9.6 x 10− 4, derived by Bonferroni correction (52 gene-sets).
Next, we estimated enrichments for cell-type-specific and tissue-specific heritability86 on EA-male, EA-female and EA-sex-combined stuttering, while controlling for the baseline models. Brain cell types used to estimate enrichment of heritability consisted of neurons, astrocytes, and oligodendrocytes using data from Cahoy et al.87. Enrichments were considered significant if p-value < 0.017, derived by Bonferroni correction (3 gene-sets). Gene expression data (computed from GTEx88 database) used to estimate enrichment of heritability consisted of eight brain regions with empirical evidence relating to stuttering.38–41, 43–45 Enrichments were considered significant if p-value < 6.25 x 103, derived by Bonferroni correction (8 gene-sets). Lastly, 20 chromatin annotations, derived from Roadmap Epigenomics consortium89 and EN-TEx86,90, with epigenetic marks of me1, me2, me3, and ac, from four brain regions previously associated with stuttering,38,40,41, 43–45,67 were used to estimate enrichment of heritability with stuttering. These marks were considered significant if p-value < 2.5 x 10− 3, Bonferroni adjusted (20 gene-sets).
Literature Replications
Gene replication analysis performed using methods detailed in Polikowsky et. al.31; however, previously calculated effective number of tests was then multiplied by eight, since we looked for replication of signals across our eight independent sex- and ancestries- specific GWAS. As such, the effective number of tests used for our Bonferroni correction represented the number of independent tag SNPs in each gene with pairwise r2 < 0.4 multiplied by eight. Gene replication results were Bonferroni corrected for the effective number of tests in each gene and the variant with the minimum p-value within each gene was reported.
SNP-based replications looked for replication of the top hits reported in Shaw et al.32 and Polikowsky et. al.31 across all eight sex- and ancestries-specific studies. Bonferroni multiple-test correction applied (p-value = 0.05/8 tests or significance: p-value < 6.25x10− 3).
Stuttering polygenic risk-score model development
Polygenic risk-score models were trained using GWAS results from each sex- and ancestries-stratified in PRScs,91 using a continuous shrinkage prior to adjust individual SNP weight for LD and variant significance. Default auto-phi parameters were used in both the male and female derived models and were not optimized to prevent overfitting, with LD reference panels constructed using 1KG phase 3 EUR reference for EA, and AFR for AA. Each model was applied to both the international stuttering project (ISP) sample31 and Add Health sample,33 matched according to ancestries and stratified by sex. The ISP testing set included 651 EA male stuttering cases and 4,264 sex-matched controls were included, as well as 242 EA female cases and 1,788 female-controls; 48 AA male stuttering cases and 308 sex-matched controls were included as well as 16 AA female stuttering cases and 90 sex-matched controls. The Add Health testing set included 352 EA male stuttering cases and 3,104 sex-matched controls; 236 EA female cases and 3,517 female-controls; 117 AA male stuttering cases and 847 sex-matched controls were included as well as 107 AA female stuttering cases and 1,101 sex-matched controls.
Genetic datasets were scored using PLINK 1v9.92 Genetic liability scores were z-score normalized. Liability score distributions between cases and controls were compared via student’s two-sample t-test.
Genetic Correlation
To assess common underlying genetic architecture between stuttering and various other comorbid traits, we performed a genetic correlation analysis, comparing the EA-male and EA-female GWAS results to sex-specific summary statistics from 17 traits with available sex-specific GWAS results obtained from http://www.nealelab.is/uk-biobank/ (see Table S14). Genetic correlations were also performed with EA-male, EA-female and EA-sex-combined with non-sex-stratified summary statistics of beat synchronization (see Table S14). These traits were selected to include phenotypes previously identified as comorbidities with stuttering.13, 46–51 Each binary trait needed a case size > 1000, due to these constraints genetic correlations were only performed in the European-specific GWAS. All genetic correlation estimates were calculated between the European stuttering GWAS results (male and female-specific for 17 sex-stratified traits, and male, female, and sex-combined for beat synchronization) through LDSC.36,37
Mendelian Randomization
We performed both Egger and weighted median Mendelian randomization analysis using the MendelianRandomization R package.93 We performed a sex-specific Mendelian randomization analysis for each of 18 traits with prior evidence of association with stuttering in the literature13, 46–51 (hearing loss, asthma, dermatitis/eczema, hayfever/allergic rhinitis, sleep duration, daytime sleepiness, chronotype, recent thoughts of suicide or self-harm/suicide ideation, depression, anxiety, epilepsy, Attention-Deficit/Hyperactivity Disorder, alcohol dependency, alcohol consumption frequency, walking pace, body mass index, testosterone, and beat synchronization obtained from http://www.nealelab.is/uk-biobank/, PGC + iPSYCH data,94 or 23andMe,Inc.56) included in our genetic correlation analysis to determine if the genetic risk for any trait was either causally related or pleotropic to stuttering in our EA-male and EA-female GWAS. In addition to EA-male and EA-female, we performed Mendelian randomization with our EA-sex-combined GWAS with our sex-combined trait: beat synchronization. We filtered input variants for each included trait to only include variants that 1) were included in both the trait and stuttering GWAS and 2) variant p-value in each GWAS < 5x10− 6, and the independent instrumental SNPs were selected based on linkage disequilibrium from 1000 Genome European (LD pruning with 1000 kb windows, 1 SNPs each step, and LD < 0.2 by PLINK). Analysis details and results annotated in Table S15.
Gene module Enrichment
We performed an enrichment test for gene modules using our top identified genes associated with variant signals in either the EA-male or EA-female GWAS to determine if any sets of highly correlated genes (gene modules) were associated with stuttering risk (See Supplemental Methods). Top associated genes were determined for all variants with a p-value < 5x10− 6 using Open Targets Genetics V2G pipeline.80,81 Gene co-expression networks comprised groups of functionally related genes or ‘‘modules’’ Gerring et al.95 identified from GTEx v.7 tissue gene expression data. Module enrichments were reported for any gene tissue-specific analysis with an FDR adjusted p-value < 0.05 among any of the 49 available GTEx tissues. We performed a competitive gene pathway analysis for reported module enrichments using g:Profiler and subsequently annotated the outputted biological pathways (Tables S20-24).
Colocalization
We performed a Bayesian colocalization analysis between our EA-male and EA-female genome-wide significant hits and tissue-specific eQTL signals from GTEx v.8 data88 using fast enrichment aided colocalization analysis (See Supplemental Methods).96,97 We looked for colocalization solely within regions with a variant identified as a top hit (Tables 1–3). Evaluated regions included all sentinel variants in either the EA-male, EA-female, AA-male, AdA-male, or AdA-female GWAS as well as any other variant found in the same LD block. LD blocks were defined according to European-based LD calculated from 1000 Genomes reference.98 Colocalization analyses were tissue-specific and included all tissues available in GTEx v.8. We reported the results of any colocalization signal with a regional colocalization probability (RCP) (i.e., the probability that one of two SNPs in an LD block is responsible for a genuine association) > 0.05 (Table S25).
GWAS Catalog
After filtering the GWAS Catalog99 (Release Date: 2022-21-12) to contain only genome-wide significant loci (p-value < 5.0 x 10− 8), 20 of 30 unique genes from sex-specific and meta- and mega- analyses were successfully found in the GWAS Catalog by querying the listed mapped genes. From these queried findings, GWAS Catalog associated traits were binned into 20 trait categories (Table S19a). Our genome-wide significant hits, and the associated GWAS Catalog findings can be found in (Table S19b). Number of unique genes per category can be seen within Fig. 5.