Main

Soil-transmitted helminths, including whipworm (Trichuris), hookworms (Necator and Ancylostoma) and the large roundworm (Ascaris), are among the most prevalent and devastating parasites of humans globally and predominate in impoverished nations1. Trichuris infects 1 billion people, and chronic, high-intensity infection can lead to typhlitis, colitis, chronic dysentery and malnutrition through malabsorption, as well as to impaired physical and cognitive development2. Consequently, trichuriasis, which disproportionately affects children, has an estimated global burden of 1 million–6.5 million disability-adjusted life years, exceeding that of schistosomiasis, trachomiasis, trypanosomiasis or leishmaniasis1. Despite this burden, the World Health Organization classifies Trichuris species as neglected parasites in urgent need of improved control3.

Contrasting with the substantial burden of trichuriasis and other neglected helminths is the observation that human populations in endemic countries tend to suffer from substantially fewer immunopathological diseases4, which are common and increasingly prevalent5 in countries where exposure to pathogens is limited. These observations inspired the 'hygiene hypothesis'6, which proposes that a lack of exposure to common pathogens impairs immune function in humans and leads to increased autoimmune disease. This hypothesis is supported by clinical data, with routine deworming positively7 and early-childhood helminth infection negatively8 correlated with autoimmune disorders. Recent studies have shown that porcine Trichuris (T. suis) administered to humans with inflammatory bowel disease (IBD; including Crohn's disease and ulcerative colitis) can reduce clinical symptoms9,10. Similar observations have been made in patients with multiple sclerosis11. Although helminths can alter immune responses in their hosts via a variety of excretory-secretory (ES) molecules12, the specific interactions between T. suis and its host remain unclear. By sequencing the T. suis genome and transcriptomes (mRNAs and small RNAs), we provide insights into the molecular biology of this parasite and its modulation of host immune responses. These data provide a solid basis for exploring human trichuriasis, developing new anti-parasitic drugs and elucidating how helminths suppress autoimmune disorders.

Results

Sequencing, assembly and synteny

We sequenced the genomes of a single adult female and a single adult male T. suis at 140-fold coverage, producing draft assemblies of 76 and 81 Mb, respectively (Table 1, Supplementary Figs. 1 and 2 and Supplementary Tables 1 and 2). Matches to conserved eukaryotic genes indicated that each assembly is 96% complete, with minimal redundancy (Supplementary Table 3). Alignment of the two assemblies showed high similarity, with 68 Mb aligning as direct one-to-one matches in blocks with a mean length of 2.5 kb. Overall sequence identity was 99.2% (38,854 SNPs). Although female and male Trichuris have been reported to have XX and XY karyotypes, respectively13, we found no evidence for a Y chromosome among the male-specific scaffolds. This suggests that the Y chromosome consists largely of repetitive sequences common to both sexes, which is consistent with karyotypic observations that the sex chromosomes of T. suis are the smallest chromosomal pair and are morphologically very similar in both sexes13. Repetitive sequences comprise 32% of the genome, including 8% DNA transposable elements, 2.9% long tandem repeats and 3.3% retrotransposons (Supplementary Table 4). Each genome encodes 1,000 transfer RNA (tRNA) genes, with copy numbers reflecting codon usage in protein-encoding regions (Supplementary Fig. 3 and Supplementary Tables 5 and 6).

Table 1 Features of the scaffolded assembly of the adult male and female T. suis genomes

Protein-encoding gene set

The female and male T. suis genomes encode at least 14,470 and 14,781 protein-encoding genes, respectively, which, including introns and exons, span 70% of each genome. We identified 14,356 female and 14,315 male genes with an ortholog or homolog in the opposite sex, of which 10,403 were defined as unambiguous one-to-one orthologs. Evidence for sex-specific genes was limited, with just one female-specific and 41 male-specific genes supported (see Supplementary Note). Only three of these sex-specific genes, all male, have a predicted function, on the basis of homology to C. elegans frk-1 (encoding a receptor tyrosine kinase), gpc-1 (encoding a G protein-coupled receptor (GPCR)) and his-66 (encoding a histone protein). The sex-specific genes show no clustering among scaffolds, providing little evidence for their association with the sex chromosomes. Most of the remaining differences between the gene sets of the two sexes relate to a higher copy number of some genes in the male. From both assemblies, we defined a unified set of 14,820 genes for T. suis (Table 1 and Supplementary Table 7), of which 12,910 (87.1%) are supported by high-throughput RNA sequencing (RNA-seq) data. The majority (59.8%) of these genes have homologs (BLASTp E-value cut-off: 1 × 10⁻⁵) in other nematodes, including 6,286 (42.4%), 6,340 (42.7%), 6,149 (41.5%) and 8,480 (57.2%) in Ascaris suum14, Brugia malayi15, Caenorhabditis elegans16 and Trichinella spiralis17, respectively (Fig. 1). Functions were assigned to 9,342 (63.0%) protein-encoding genes (Supplementary Tables 8, 9, 10, 11, 12, 13). Focusing on key functional or druggable proteins14, we predicted 653 peptidases and 288 peptidase inhibitors. Peptidase classes S1 (116) and S8 (42) are expanded in T. suis compared with those represented in other nematode genomes14,15,16,17. The T. suis genome also encodes 269 phosphatases and 232 kinases.
We identified a large complement of receptors and transporters18; these molecules include 228 GPCRs, as well as 1,962 channel, pore and transporter proteins. Among the last group are 133 peroxisomal protein importers, more than in A. suum (n = 74) (ref. 14), which suggests a greater importance of fatty-acid digestion and metabolism in T. suis.
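The homolog counts above rest on a simple rule: a gene counts as having a homolog in a species if it has at least one BLASTp hit at or below the E-value cut-off. A minimal Python sketch of that tally follows, assuming hits have already been parsed from tabular BLASTp output into (query, species, E-value) tuples; the function name and toy hits are hypothetical and are not the pipeline used in this study.

```python
from collections import defaultdict

E_CUTOFF = 1e-5  # BLASTp E-value cut-off quoted in the text


def homolog_fractions(hits, n_genes):
    """Fraction of the gene set with >= 1 hit at E <= E_CUTOFF, per species.

    hits: iterable of (query_gene, subject_species, evalue) tuples
    (hypothetical, pre-parsed BLASTp results).
    """
    genes_by_species = defaultdict(set)
    for gene, species, evalue in hits:
        if evalue <= E_CUTOFF:
            genes_by_species[species].add(gene)  # a set counts each gene once
    return {sp: len(genes) / n_genes for sp, genes in genes_by_species.items()}


# Toy example with four hypothetical genes.
toy_hits = [
    ("g1", "C. elegans", 1e-30),
    ("g1", "T. spiralis", 1e-50),
    ("g2", "T. spiralis", 2e-8),
    ("g3", "C. elegans", 0.01),  # above the cut-off, ignored
    ("g3", "C. elegans", 3e-6),  # a significant second hit for g3
]
fractions = homolog_fractions(toy_hits, n_genes=4)
```

Each gene is counted at most once per species, however many hits it has. With the real denominator, 6,286 of 14,820 genes corresponds to the 42.4% reported for A. suum.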

Figure 1: Homologs shared between T. suis (class Enoplea, order Trichocephalida) and related nematode species.

We predicted 618 canonical ES proteins in adult T. suis (Supplementary Tables 14 and 15), including 165 proteases, many of which might have a role in disrupting intestinal epithelial cells in the host19,20 and in forming the syncytial tunnel around the Trichuris stichosome21. Notable among these molecules are 33 chymotrypsin-like serine proteases, which in helminths have key roles in host invasion22, immunosuppression23 and tissue destruction24. In addition to proteases, non–membrane-bound transporters comprise a major component of the secretome. These transporters include 41 pore-forming toxins (porins), 25 of which have homology to the Trichuris trichiura porin TT47, which induces ion-conducting pores in planar lipid bilayers and assists in the formation of the syncytial tunnel in the intestinal epithelium25. Helminth-mediated immunomodulation by ES products is well documented12, and among the predicted T. suis ES proteins we found a variety of immunomodulators (Supplementary Table 16). On the basis of these findings and the available literature for helminths12,26,27, we propose a Trichuris-driven immunomodulation model (Supplementary Fig. 4) in which the parasite suppresses inflammation by secreting (i) serpins to inhibit neutrophil cathepsins and elastases; (ii) apyrases to prevent conversion of regulatory T cells to pro-inflammatory T cells; (iii) cystatins to promote anti-inflammatory T cells (producing interleukin-4 (IL-4) and IL-10) by disrupting antigen presentation by dendritic and B cells; (iv) calreticulins that bind to dendritic cells, stimulate IL-4 production and limit inflammation by binding free calcium ions; and (v) molecular mimics12 of host galectins, macrophage migration inhibitory factor and transforming growth factor-β that stimulate apoptosis in activated T cells, promote alternative activation of macrophages and block the stimulation of (pro-inflammatory) Toll-like receptor pathways.
This model is consistent with the pathophysiology described for Trichuris infection26, and probably operates in tandem with immunosuppressive processes linked to glycans27 and lipids (for example, sphingolipids; see Supplementary Note).

Transcriptome and differential transcription

We explored stage-, sex- and tissue-specific transcription (mRNAs), with a focus on parasite-host interactions. We predicted 36,763 transcripts, of which 15,174 perfectly match the intron/exon chain annotated in the genome (excluding UTRs) and 21,589 represent novel splice isoforms (Supplementary Fig. 5 and Supplementary Table 17). In total, 6,293 (43.7%) T. suis genes were predicted to encode at least two isoforms. The number of splice isoforms per gene was only moderately correlated with exon number (R² = 0.44; Supplementary Figs. 6 and 7). Alternative splicing correlated with gene function: protein catabolism (i.e., proteases), membrane-bound ion transport and kinase activity were enriched (P ≤ 0.05; Pearson's chi-squared analysis) among alternatively spliced genes, and apoptosis among single-isoform genes. Genes with multiple and single splice isoforms also differed in their conservation and predicted essentiality. Of the 6,307 T. suis genes with C. elegans homologs, alternatively spliced genes outnumbered single-isoform genes by more than two to one (4,510 versus 1,875) and accounted for 204 (75.8%) of the 269 predicted essential genes. This observation should be considered when mining helminth genomes for novel drug targets: if an inhibitor targeting the product of such a gene interacts with one of the alternatively spliced domains, isoform switching may be sufficient to overcome its effect.
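The enrichment tests above are Pearson's chi-squared tests on contingency tables. A minimal sketch of the 2 × 2 case (alternatively spliced versus single-isoform genes, with versus without a given functional annotation) follows, using only the standard library; the counts in the example are invented for illustration and are not from this study.

```python
import math


def pearson_chi2_2x2(a, b, c, d):
    """Pearson's chi-squared test (no continuity correction), 1 degree of freedom.

    Hypothetical table layout:
                        annotated   not annotated
        spliced             a             b
        single-isoform      c             d
    Returns (statistic, p_value, direction); direction is '+' if the
    spliced/annotated cell exceeds its expected count, otherwise '-'.
    """
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = (a, b, c, d)
    expected = (rows[0] * cols[0] / n, rows[0] * cols[1] / n,
                rows[1] * cols[0] / n, rows[1] * cols[1] / n)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of chi-squared with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    direction = "+" if a > expected[0] else "-"
    return stat, p_value, direction


stat, p, sign = pearson_chi2_2x2(120, 880, 60, 940)
```

Where SciPy is available, `scipy.stats.chi2_contingency(table, correction=False)` should give the same statistic; its default `correction=True` applies Yates' correction to 2 × 2 tables.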

Some protein domains were significantly associated (P ≤ 0.05; Pearson's chi-squared analysis) with specific alternative splice events, with exon skipping and the use of alternative first or last exons appearing to differ in their functional implications (Fig. 2a). Most notable was an over-representation of substrate-binding motifs (for example, immunoglobulin, EGF-like or DnaJ) in genes biased toward transcripts with skipped exons. Remodeling of binding-motif structure through alternative splicing affects binding specificity in other organisms28, and we propose that exon skipping is important in regulating the binding specificity of proteins in T. suis. Given the varied functions associated with alternative first- or last-exon splicing events (Fig. 2a), we hypothesize that these modifications might play a part in regulating protein localization, another known role for alternative splicing28.
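The splice-event categories in Fig. 2a (SE, FE and LE) can be assigned by comparing exon chains between isoforms. The sketch below is a simplified illustration, assuming plus-strand transcripts given as sorted lists of (start, end) exon coordinates; its rules are not the criteria used in this study, and real tools also handle the minus strand, alternative splice sites and compound events.

```python
from typing import List, Set, Tuple

Exon = Tuple[int, int]  # (start, end) on the plus strand


def classify_events(ref: List[Exon], alt: List[Exon]) -> Set[str]:
    """Label differences between a reference and an alternative isoform.

    SE: an internal reference exon is absent from alt, while its two
        neighbouring exons are consecutive in alt (exon skipping).
    FE / LE: the first / last exons differ (alternative first / last exon).
    Simplified rules for illustration only.
    """
    events = set()
    if ref[0] != alt[0]:
        events.add("FE")
    if ref[-1] != alt[-1]:
        events.add("LE")
    alt_index = {exon: i for i, exon in enumerate(alt)}
    for i in range(1, len(ref) - 1):
        prev_exon, exon, next_exon = ref[i - 1], ref[i], ref[i + 1]
        if (exon not in alt_index and prev_exon in alt_index
                and next_exon in alt_index
                and alt_index[next_exon] == alt_index[prev_exon] + 1):
            events.add("SE")
    return events


ref = [(1, 100), (200, 300), (400, 500)]
```

For example, an alternative isoform `[(1, 100), (400, 500)]` is labelled `{"SE"}` relative to `ref`.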

Figure 2: Stage- and tissue-specific small-RNA and mRNA transcriptome of T. suis.

(a) Association between gene function and alternative splice variation. Charts show the inferred function of protein domains encoded by genes showing a statistically significant (P ≤ 0.05; Pearson's chi-squared analysis) positive (+) or negative (−) bias toward skipped exon (SE), alternative first (FE) or last exon (LE) splice events. Only genes encoding ten or more transcripts are included in this analysis. (b) Proportional representation of major protein classes or groups encoded by the genome (Gen), and their proportional abundance in all transcriptomic data (All) and in larval (L1/2, L3 and L4), adult male (Am), female (Af) and tissue-specific libraries, including in the male (Mp) and female posterior body (Fp) and the stichosome (St). (c) Self-organizing heatmap (transcripts per million (TPM) values normalized by gene) clustering miRNAs by their transcription abundance (represented as log2-transformed reads per kilobase per million reads (RPKM) values) in each larval, adult and tissue-specific library. (d) Self-organizing heatmap (TPM values normalized by gene) clustering 22A-RNAs by their transcription abundance (represented as log2-transformed RPKM values) in each larval, adult and tissue-specific library.

T. suis undergoes substantial developmental changes throughout its direct life cycle29. To understand developmental processes in this parasite, we used RNA-seq to characterize transcription in different stages, sexes and body portions (stichosomal versus all non-stichosomal tissues from male and female adult worms): the first and second (L1/L2), third (L3) and fourth (L4) larval stages; adult males and females; the stichosome; and the adult male and female posterior bodies excluding the stichosome (Supplementary Fig. 8 and Supplementary Table 18). Overall, several major functional classes of proteins showed higher representation in the transcriptome of T. suis than in the genome (Fig. 2b). Secretory proteins were notable in this regard, making up 4% of the T. suis gene set but representing 10% of the transcriptional abundance across all libraries. Peptidases, particularly secreted peptidases, were also over-represented in the transcriptome and were upregulated during larval development and in the stichosome.
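The genome-versus-transcriptome comparison in Fig. 2b amounts to comparing each protein class's share of genes with its share of summed expression. A minimal sketch follows, assuming per-gene TPM values and a single class label per gene; both inputs, and the function name, are hypothetical.

```python
from collections import Counter


def class_proportions(gene_class, tpm):
    """Return {class: (share of genes, share of summed TPM)}.

    gene_class: dict mapping gene -> protein class (one label per gene).
    tpm: dict mapping gene -> expression in TPM; missing genes count as 0.
    """
    n_genes = len(gene_class)
    total_tpm = sum(tpm.get(g, 0.0) for g in gene_class)
    gene_counts = Counter(gene_class.values())
    tpm_by_class = Counter()
    for gene, cls in gene_class.items():
        tpm_by_class[cls] += tpm.get(gene, 0.0)
    return {cls: (gene_counts[cls] / n_genes, tpm_by_class[cls] / total_tpm)
            for cls in gene_counts}


toy_classes = {"g1": "secreted", "g2": "other", "g3": "other", "g4": "other"}
toy_tpm = {"g1": 300.0, "g2": 100.0, "g3": 100.0, "g4": 500.0}
shares = class_proportions(toy_classes, toy_tpm)
```

A class that is over-represented in the transcriptome, as secretory proteins are here (4% of genes but 10% of abundance), shows a second value well above the first.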

The stichosome is the thin, elongate anterior end of Trichuris, embedded tightly within a syncytial tunnel21 in the superficial layer of the large intestinal mucosa. Within this tunnel, the parasite secretes proteins and other molecules and absorbs nutrients from the cell cytoplasm and surrounding tissue fluids, probably through thousands of bacillary cells30. Given its central importance in feeding and interaction with the host, we focused on transcription in the stichosome relative to the rest of the worm body (Supplementary Table 18; see Supplementary Note for detailed comparisons among other stages and tissues). Transcription of 2,210 genes (encoding 3,721 transcripts) was enriched in the stichosome relative to both the male and female posterior bodies (Supplementary Table 18). Among these genes are 160 peptidases (encoding 256 transcripts) and 41 porins (85 transcripts), supporting their roles in host-tissue degradation and syncytial tunnel formation19,25 (Supplementary Fig. 9). Also notable is the enrichment of a large number of secreted and membrane-bound transporters (222 genes encoding 371 transcripts) of various ions (for example, sodium, phosphate and calcium) and small molecules (for example, glucose and nucleosides). Sugar metabolism is enriched in the stichosome, suggesting that absorbed glucose is rapidly metabolized in the stichocytes. Also upregulated in the stichosome are transcripts associated with endocytosis and vesicle formation, lysosome and peroxisome pathways, and fatty acid and amino acid (cysteine and methionine; lysine) degradation. At least one isoform of each putative immunomodulatory gene encoded by T. suis is transcribed in the stichosome, with 22 transcripts encoding galectins, serpins, venom allergen–like proteins, apyrase or calreticulin specifically enriched in the stichosome relative to both the male and female posterior bodies.
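Enrichment "relative to both the male and female posterior bodies" means a gene must pass the test in two separate comparisons. The sketch below shows that logic with a simple log2 fold-change threshold; the threshold, pseudocount, function name and inputs are all assumptions, and a real analysis would use a replicate-aware statistical test rather than a fixed cut-off.

```python
import math


def enriched_vs_both(st, male_post, female_post, min_log2fc=1.0, pseudo=1.0):
    """Genes whose log2(stichosome / posterior body) is >= min_log2fc
    against BOTH the male and the female posterior body.

    Each argument maps gene -> expression level (e.g. TPM); the pseudocount
    avoids division by zero for genes absent from a posterior-body library.
    """
    def up(gene, other):
        ratio = (st[gene] + pseudo) / (other.get(gene, 0.0) + pseudo)
        return math.log2(ratio) >= min_log2fc

    return {g for g in st if up(g, male_post) and up(g, female_post)}


stichosome = {"a": 100, "b": 100, "c": 10}
male_posterior = {"a": 10, "b": 10, "c": 10}
female_posterior = {"a": 20, "b": 90, "c": 1}
hits = enriched_vs_both(stichosome, male_posterior, female_posterior)
```

Here gene "b" is enriched only relative to the male posterior body, so it is excluded.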

Chymotrypsin-like (S1) serine proteases (28 of 31 genes and 51 of 135 transcripts) are also upregulated in the stichosome. Many are homologs of vertebrate plasmin, which is thought to regulate blood clotting in the host31. A poorly understood consequence of trichuriasis is bloody diarrhea32, and some evidence suggests that Trichuris ingests blood33. Some T. suis chymotrypsin-like serine proteases may therefore act as anticoagulants or assist in digesting blood, serum and tissue components (for example, fibrinogen). Notably, T. muris infection alters the mucus barrier of the host's gut epithelium, leading to increased susceptibility to nematode infections34, through the degradation of mucin 2 (Muc2) polymers35. Muc2 depolymerization by T. muris is blocked by chymostatin and antipain35, suggesting a probable role for chymotrypsin-like and other serine proteases. Several of the secreted serine proteases enriched in the stichosome are homologs of Schistosoma mansoni serine protease 1 (SP1) and human kallikrein. Kallikrein cleaves kininogen to release bradykinin, which stimulates vasodilation, the cytosolic release of Ca²⁺, neutrophil recruitment and increased inflammation36. SP1 is a potent vasodilator in mice37, suggesting that it can convert vertebrate kininogen to bradykinin. Given the anti-inflammatory capacity of T. suis, we propose that some of these chymotrypsin-like serine proteases might degrade host kininogen without releasing bradykinin, thereby preventing bradykinin receptor stimulation and, thus, inhibiting inflammation. Notably, bradykinin receptors have key roles in various autoimmune disorders, including IBD38 and multiple sclerosis36.

Genetic regulatory networks

The T. suis gene set encodes a complete RNA-interference machinery, suggesting potential for functional genomic studies and indicating a role for small noncoding RNAs in gene regulation. To explore these small RNAs in T. suis (Supplementary Figs. 10 and 11 and Supplementary Tables 19, 20, 21), we produced 435 million sequence reads. Approximately 92% of these reads mapped to the T. suis genome, with 16%, 23% and 9% classified as microRNAs (miRNAs), small interfering RNAs (siRNAs) and tiny noncoding RNAs (tncRNAs), respectively. Approximately 4% of the small-RNA reads mapped to transposable elements with an antisense bias (>80% of reads), consistent with Piwi-interacting RNAs (piRNAs)39. However, as for small RNAs in Ascaris suum40, <0.01% had characteristics consistent with 21U-RNAs, which function as piRNAs in C. elegans41.
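The antisense-bias criterion used here for putative piRNAs (and below for siRNAs) is a per-locus read fraction. A minimal sketch follows, assuming reads have already been mapped and counted by strand relative to each annotated element; the locus names and counts are hypothetical.

```python
def antisense_biased(loci, min_fraction=0.8):
    """Return locus IDs where more than min_fraction of reads are antisense.

    loci: dict mapping locus ID -> (sense_reads, antisense_reads).
    Loci without reads are skipped.
    """
    biased = []
    for locus, (sense, antisense) in loci.items():
        total = sense + antisense
        if total and antisense / total > min_fraction:
            biased.append(locus)
    return biased


toy_loci = {"TE1": (5, 95), "TE2": (30, 70), "TE3": (20, 80), "TE4": (0, 0)}
biased_loci = antisense_biased(toy_loci)
```

TE3 sits exactly at 80% and is therefore excluded by the strict ">80%" rule; the siRNA criterion later in the text is stated as "≥80%", which would include it.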

We identified 319 miRNAs, 132 of which have close homologs in other nematodes (Supplementary Table 22). These miRNAs accounted for 16% of all small-RNA reads sequenced, with tsu-let-7 (50% of all miRNA reads), tsu-miR-1 (17%), tsu-novel-51 (8%; a homolog of tsp-novel-51 miRNA from T. spiralis) and tsu-miR-228 (4%) being the most highly transcribed. Approximately two-thirds of the miRNAs were most abundant in larval stages, suggesting a central role in development, with a diminishing number of miRNAs enriched in adults (Fig. 2c). This trend was reversed in the transition from L4 to adult female. To explore the functional implications of differential transcription of these miRNAs, we predicted miRNA-binding sites in the 3′ UTRs of 22,954 of the 23,824 mRNA isoforms (representing 7,180 genes) for which at least part of the 3′ UTR could be identified from RNA-seq data. We focused on miRNA-mRNA interactions recognized or proposed for C. elegans. Of the 785,143 predicted binding sites with homology to C. elegans (both miRNA and mRNA), 300,042 were supported by information in public databases and 3,238 by experimental findings42. Owing to differences in gene copy number, these 300,042 binding sites represented 45 and 62 miRNAs as well as 3,205 and 3,877 coding genes in C. elegans and T. suis, respectively.
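Binding-site prediction starts from complementarity between the miRNA "seed" (nucleotides 2–8) and the 3′ UTR. The sketch below finds only canonical 7mer-m8 and 8mer seed sites; it is a simplified stand-in, not the prediction software used in this study. The let-7 family mature sequence is used purely for illustration and is not claimed to be the tsu-let-7 sequence.

```python
def revcomp_rna_to_dna(rna: str) -> str:
    """Reverse complement of an RNA sequence, returned in the DNA alphabet."""
    pair = {"A": "T", "U": "A", "G": "C", "C": "G"}
    return "".join(pair[base] for base in reversed(rna))


def seed_sites(mirna: str, utr: str):
    """Return (position, site_type) for canonical seed matches in a 3' UTR.

    mirna: mature miRNA, 5'->3', RNA alphabet.
    utr:   3' UTR, 5'->3', DNA alphabet.
    7mer-m8: the UTR matches the reverse complement of miRNA nt 2-8.
    8mer:    a 7mer-m8 site followed by an A opposite miRNA nt 1.
    """
    site = revcomp_rna_to_dna(mirna[1:8])  # nucleotides 2-8 (0-based 1..7)
    hits = []
    pos = utr.find(site)
    while pos != -1:
        after = pos + len(site)
        kind = "8mer" if after < len(utr) and utr[after] == "A" else "7mer-m8"
        hits.append((pos, kind))
        pos = utr.find(site, pos + 1)
    return hits


LET7 = "UGAGGUAGUAGGUUGUAUAGUU"  # let-7 family mature sequence (illustration)
```

For this sequence the seed-complementary site is CTACCTC, so a UTR containing CTACCTCA carries an 8mer site.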

For T. suis, the shift from L3 to L4 coincides with a concerted downregulation of 24 of these conserved miRNAs, including tsu-miR-1, tsu-miR-252 and tsu-miR-236, which are respectively the second, seventh and eighth most abundant miRNAs in T. suis overall. We identified 69 transcripts enriched in L4 (relating to 62 T. suis genes and 61 C. elegans homologs) with binding sites for each of these miRNAs. Many of the C. elegans homologs of these transcripts are involved in larval or embryonic development (for example, rol-3, slt-1 and sox-3), growth (for example, egl-4, unc-44 and lin-39) or early sexual determination (for example, sex-1; WormBase), suggesting similar functional roles in T. suis. The maturation of T. suis to adulthood coincides with a variety of sex-specific changes in miRNA levels (relative to L4s). In both sexes, tsu-miR-228 (the fourth most abundantly transcribed miRNA in T. suis) and several isoforms of tsu-miR-61 were downregulated, whereas tsu-miR-34 and two 'minor' miRNAs (tsu-miR-256 and tsu-miR-50) were upregulated. Many of the coding genes (n = 447) upregulated in male and female adults are predicted to be co-regulated by tsu-miR-61 and tsu-miR-228. Homologs of these coding genes in C. elegans are enriched in GO terms (biological process) for embryonic and genital development, reproduction, morphogenesis, growth and metabolism, and include tbx-2, vps-16, xnp-1 and dyci-1 (WormBase). Notable among the predicted tsu-miR-34–regulated genes were homologs of srp-2 (encoding serpin-2, an anti-inflammatory protein in helminths)12, which is downregulated in the adult worm (with the exception of the stichosome) compared with larval stages.
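Requiring a binding site for each of the downregulated miRNAs, together with L4 enrichment, is a set intersection. A sketch follows with hypothetical target sets; the miRNA names are taken from the text, but the transcript IDs and their sites are invented.

```python
def cotargeted(targets_by_mirna, mirnas, enriched):
    """Transcripts carrying predicted sites for every miRNA in `mirnas`
    that are also in `enriched` (e.g. L4-enriched transcripts)."""
    shared = set(enriched)  # copy, so the caller's set is not modified
    for mir in mirnas:
        shared &= targets_by_mirna.get(mir, set())
    return shared


toy_targets = {
    "tsu-miR-1": {"t1", "t2", "t3"},
    "tsu-miR-252": {"t1", "t3"},
    "tsu-miR-236": {"t1", "t3", "t4"},
}
l4_enriched = {"t1", "t4"}
shared_targets = cotargeted(toy_targets, list(toy_targets), l4_enriched)
```

A miRNA with no predicted targets empties the result, which is the intended behaviour for an "every miRNA" criterion.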

When we compared male and female T. suis adults, we found that major differences in miRNA transcription also related to tsu-miR-61, tsu-miR-228, tsu-miR-236 and tsu-miR-252, highlighting their importance in this nematode. Enriched in males are tsu-miR-228 and four copies of tsu-miR-61; enriched in females are tsu-miR-236, tsu-miR-252 and one copy of tsu-miR-61 (with closest homology to cel-miR-61-5p). Considering the ambiguity associated with the enrichment of different tsu-miR-61 isoforms in both males and females, we focused on tsu-miR-228, tsu-miR-236 and tsu-miR-252. In males, the enrichment of tsu-miR-228 coincides with a downregulation of 412 transcripts (representing 320 T. suis genes) with a predicted tsu-miR-228 binding site. On the basis of the functions of their C. elegans homologs, we infer many of these transcripts to be linked to vulva development (for example, exc-4, mys-1, nekl-2 and sem-4), egg production (for example, cbd-1, nsy-1, ppt-1, unc-29 and unc-58) and embryogenesis and germline development (for example, bcat-1, lars-1, rnp-4, rpt-5, slt-1 and tbp-1). In females, enriched transcription of tsu-miR-236 and tsu-miR-252 coincides with a downregulation of 262 transcripts (representing 205 genes) predicted to be co-regulated by these miRNAs. Homologs of these 'female-suppressed' coding genes in C. elegans include genes involved in spermatogenesis (for example, cogc-5 and cpb-1), male mating or fertility (for example, goa-1 and odc-1), the regulation of germline specification or apoptosis (for example, glp-1, him-1, rpt-5, let-60, vps-16 and vps-41) and chemosensation (for example, crh-1, grk-2 and lys-2). Collectively, these data suggest that sexual dimorphism in T. suis might relate, at least partially, to post-transcriptional suppression of sex-specific genes by miRNAs, rather than exclusively to sex-specific transcriptional activation.
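The male-versus-female comparisons pair a miRNA enriched in one sex with predicted targets that are depleted in that sex. A minimal sketch of that filter follows, assuming log2 fold changes (for example, male versus female) and a hypothetical |log2FC| ≥ 1 threshold; the fold changes and site sets are invented.

```python
def inversely_regulated(mirna_lfc, transcript_lfc, sites, mirna, min_abs_lfc=1.0):
    """Transcripts with a predicted site for `mirna` that go down while it goes up.

    mirna_lfc / transcript_lfc: dicts of log2 fold changes for one comparison.
    sites: dict mapping each miRNA to the set of transcripts with a predicted site.
    """
    if mirna_lfc.get(mirna, 0.0) < min_abs_lfc:
        return set()  # the miRNA is not enriched in this comparison
    return {t for t in sites.get(mirna, set())
            if transcript_lfc.get(t, 0.0) <= -min_abs_lfc}


toy_mirna_lfc = {"tsu-miR-228": 2.0, "tsu-miR-236": -1.5}  # male vs female
toy_transcript_lfc = {"t1": -2.0, "t2": 0.5, "t3": -1.0}
toy_sites = {"tsu-miR-228": {"t1", "t2", "t3"}, "tsu-miR-236": {"t2"}}
male_suppressed = inversely_regulated(
    toy_mirna_lfc, toy_transcript_lfc, toy_sites, "tsu-miR-228")
```

For the female-enriched miRNAs, the same function can be applied after flipping the sign of the fold changes (female versus male).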

In addition to miRNAs, we identified 1,028,808 putative small RNAs mapping to coding regions of the genome. Of these, 673,355 mapped antisense (≥80% of reads at each location) to exons, suggesting a potential role as siRNAs41. The most abundant siRNAs predicted for T. suis (except those derived from males) were 24–25 nt long with a 5′ guanine (i.e., 24G and 25G), whereas 22G and 26G sequences predominate among the siRNAs predicted for other nematodes to date40,41. Putative siRNAs were predicted for 3,497 protein-encoding genes. Many siRNAs have key roles in germline tissues40,41. In T. suis, we identified transcripts for 508 coding genes for which putative siRNAs were uniquely transcribed in the adult female and the female posterior body relative to the stichosome, the adult male and the male posterior body (Supplementary Tables 7 and 18; see URLs). These coding genes were enriched for transposable elements/transposases, histones or histone methyltransferases, DNA- or RNA-binding, chromatin folding and homeodomain-related proteins. The same functions were also enriched, although in lower abundance, among the 69 coding genes associated with siRNAs uniquely transcribed in the adult male and the male posterior body. We hypothesize that these highly transcribed siRNAs protect chromatin in the T. suis germline; this hypothesis is supported by the observation that 162 of the female-enriched siRNAs are absent from the larval stages studied here (Supplementary Tables 7 and 18).
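Labels such as 22G, 24G, 25G and 22A combine read length with the 5′-terminal nucleotide. A minimal sketch of that binning follows, applied to made-up reads; it does not reproduce the study's filtering or mapping steps.

```python
from collections import Counter


def srna_class(read: str) -> str:
    """Label a small-RNA read by length and 5'-terminal base, e.g. '24G' or '22A'."""
    return f"{len(read)}{read[0].upper()}"


toy_reads = [
    "G" + "A" * 23,  # 24 nt, 5' G
    "G" + "C" * 24,  # 25 nt, 5' G
    "A" * 22,        # 22 nt, 5' A
    "G" + "U" * 21,  # 22 nt, 5' G
    "G" + "A" * 23,  # another 24 nt, 5' G
]
class_counts = Counter(srna_class(r) for r in toy_reads)
```

`class_counts.most_common(1)` then reports the dominant class, which is 24G for these toy reads.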

Novel class of tncRNAs

Conspicuous among the T. suis small RNAs is an abundance of 22-nt sequences with a 5′-terminal adenine. Although representing just 2.9% (n = 58,307) of consensus small RNAs, these sequences account for 9.2% of all small-RNA transcription. By location, these 22-nt sequences with a 5′ adenine are evenly distributed between coding and noncoding regions and, within noncoding regions, between annotated (such as transposable elements, tRNAs or other noncoding RNAs) and un-annotated space. However, 89% of the transcription attributed to these sequences relates to un-annotated, noncoding space in the genome. On the basis of their size and abundance, these sequences are consistent with tncRNAs43; however, they have characteristics not previously attributed to this class. For instance, in T. suis, they have a clear strand bias, with 83% of their transcription occurring on the Watson (i.e., antisense) strand. These 'antisense'-biased tncRNAs (henceforth called 22A-RNAs) form 1,208 clusters (ranging from 22 to 11,831 nt) on 238 assembly scaffolds, with a median of three 22A-RNA sequences per cluster at a median spacing of 97 bp. Although 40% of multicopy 22A-RNA sequences are found in the same cluster, clusters comprising a single repeated 22A-RNA sequence are rare, and sequences are often shared among clusters and genomic scaffolds, indicating that tandem duplication is not the only mechanism of cluster formation. Few transposable elements are found within 25 kb of these clusters, suggesting that the clusters were not recently inserted or translocated within the genome.
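Cluster statistics like those above depend on how neighbouring loci are grouped. The sketch below shows simple gap-based clustering of 22A-RNA start coordinates on one scaffold; the 1,000-nt maximum gap in the example is an arbitrary choice, not the parameter used in this study, and spacing is measured start to start.

```python
from statistics import median


def cluster_positions(positions, max_gap):
    """Group start coordinates on one scaffold into clusters: a new cluster
    starts whenever the gap to the previous position exceeds max_gap."""
    clusters = []
    for pos in sorted(positions):
        if clusters and pos - clusters[-1][-1] <= max_gap:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])
    return clusters


def cluster_summary(clusters):
    """Median members per cluster and median start-to-start spacing."""
    sizes = [len(c) for c in clusters]
    gaps = [b - a for c in clusters for a, b in zip(c, c[1:])]
    return median(sizes), (median(gaps) if gaps else None)


toy_starts = [100, 200, 290, 5000, 5100, 20000]
toy_clusters = cluster_positions(toy_starts, max_gap=1000)
```

Singleton clusters contribute to the size median but not to the spacing median.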

At this stage, we can only speculate about the function(s) and mechanism(s) of action of these sequences. Their 100-nt genomic neighborhoods vary: some regions resemble (but do not overlap with) known protein-encoding sequences, others resemble known noncoding RNAs such as tRNAs, and still others resemble neither. Eleven of these neighborhoods show partial similarity to the 100-nt cryptic tRNAs discovered in C. elegans by the modENCODE Consortium44. The sequences of the 22A-RNAs themselves are also heterogeneous, with only one over-represented 8-nt sequence motif (5′-A[CA]GATAT[GT]-3′), occurring in 4.5% (245 of 5,457) of 22A-RNA sequences (Supplementary Fig. 12). Given these findings, we propose that 22A-RNAs may be processed from larger noncoding RNAs of diverse types, some of which are highly conserved and familiar and others of which are hypothetical and unfamiliar. Despite having no obvious promoter motif (such as that proposed for 21U-RNAs)41, these sequences seem to be transcriptionally regulated: their abundance varies substantially among stages, sexes and tissues in T. suis, notably including an enrichment in the adult male body and male posterior body relative to all other stages and tissues (Fig. 2d), which may suggest a role in the male germline. As a proportion of overall small-RNA transcription, 22A-RNAs are most abundant in the stichosome, where they comprise 22% of all small-RNA reads sequenced. Indeed, the stichosome is notably restricted in its classes of small RNAs, with miRNAs (39% of all small-RNA reads from the stichosome) and 22A-RNAs dominating the small-RNA population in this organ. Whether this finding points to an involvement of these novel T. suis noncoding RNAs in host interactions deserves detailed investigation.
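The bracketed motif notation above maps directly onto a regular-expression character class (equivalently, IUPAC M = A/C and K = G/T). A minimal sketch that counts how many sequences contain the motif follows; the toy sequences are invented, and RNA input is converted to the DNA alphabet first.

```python
import re

# 5'-A[CA]GATAT[GT]-3': brackets are alternatives at a single position.
MOTIF = re.compile(r"A[CA]GATAT[GT]")


def motif_fraction(seqs):
    """Return (number of sequences containing the motif, fraction of all)."""
    hits = sum(1 for s in seqs if MOTIF.search(s.upper().replace("U", "T")))
    return hits, hits / len(seqs)


toy_seqs = ["AACGATATGC", "TAAGATATTC", "AGGGGGGGGG", "CACGAUAUGA"]
n_hits, frac = motif_fraction(toy_seqs)
```

Applied to the real 22A-RNA set, the same count would correspond to the 245 of 5,457 sequences reported in the text.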

Discussion

Globally, helminthiases are seriously neglected causes of morbidity and mortality. Genomic and transcriptomic explorations of T. suis should enable the design of urgently needed therapeutics against human trichuriasis, one of the world's most important and neglected helminthiases. An intriguing feature of T. suis is its possible use as a therapy for human autoimmune disorders9,10,11. A detailed characterization of how this parasite modulates the host immune response is thus a key priority. Secreted proteins (including cystatins and serpins, thioredoxin peroxidase and various putative mimics of host proteins) seem to have a central role in this process, primarily through inhibiting inflammation. Our findings also indicate a role for parasite-derived lipids, including the inferred synthesis of β-glucosylceramide, a known anti-inflammatory agent and putative therapy for IBD45, during the early developmental phase of T. suis (see Supplementary Note). It is likely that both proteins and lipids work in concert with N-linked glycans, which are known immunomodulators produced by T. suis27, particularly by L4 and adult stages, in which pathways associated with their synthesis are transcriptionally enriched (see Supplementary Note). The detailed characterization of these molecules in vitro and in vivo, using existing models of IBD and other autoimmune disorders, might pave the way for parasite-derived therapies5. Indeed, a better understanding of T. suis–host interactions might shed new light on why helminth exposure seems crucial for the development of a healthy immune system in humans. This is the first study to characterize the genomes of male and female individuals of a dioecious nematode. We found little evidence for sex-specific genes or assembly contigs, despite the reported XY karyotype of this species.
However, intriguingly, miRNAs seem to have a major role in regulating sexual development in this species, with tsu-miR-228 in male worms and tsu-miR-236 and tsu-miR-252 in female worms predicted to suppress key feminizing and masculinizing developmental genes, respectively. This is the first time that such miRNA-mediated regulation of sexual development has been observed in a metazoan.

Methods

Sample preparation and storage.

Trichuris suis were isolated from experimentally infected pigs inoculated orally with a single dose of 5,000–50,000 embryonated eggs (Animal Ethics Permission No. 2010/561-1914; University of Copenhagen). T. suis individuals were collected at 10 (L1/L2 larvae), 18 (L3s), 28 (L4s) and 49 (adults) days post-inoculation (p.i.)29,46 and washed in physiological saline (37 °C) and in RPMI 1640 (GIBCO) with antibiotic-antimycotic (GIBCO). Adult male and female T. suis were separated. Stichosomes were excised from whole adult worms (n = 10, irrespective of sex), pooled and frozen, as were the posterior portions of the worms (n = 10 of each sex). All stages and tissues were snap frozen in liquid nitrogen and stored at −80 °C.

Genomic sequencing and assembly.

Total genomic DNA was isolated separately from a single adult male and a single adult female T. suis47,48. Paired-end (insert sizes 170 bp and 500 bp) and mate-pair (800 bp, 2 kb, 5 kb and 10 kb) libraries were constructed from total and whole-genome-amplified (WGA) genomic DNA, respectively14,49, and sequenced on a HiSeq 2000 (Illumina). Low-quality sequences, base-calling duplicates and adapters were removed using standard approaches. Sequence quality and heterozygosity were assessed from the 17-mer frequency distribution50, and genome sizes were estimated51. Corrected and filtered data were assembled into contigs using SOAPdenovo v2.0 (ref. 50) and assessed for accuracy using SOAP2aligner52. Assembly completeness and redundancy were assessed using CEGMA53 and by mapping RNA-seq data with Bowtie2 (ref. 54).

RNA isolation and RNA-seq.

Total RNAs from L1/L2 (n = 50,000 from five pigs), L3 (n = 15,000 from four pigs) and L4 (n = 3,000 from two pigs), adult male (n = 10), adult female (n = 10), and stichosomal (mixed sex; n = 10) and nonstichosomal portions of adult females (n = 10) and males (n = 10) were individually purified using TriPure reagent (Roche). Polyadenylated (polyA+) RNA was purified from 10 μg of total RNA for each library using Sera-mag oligo(dT) beads and fragmented, purified and sequenced using HiSeq 2000 (refs. 14,49). Small noncoding RNAs (18–30 nt) were isolated from 10 μg of total RNA for each library by size fractionation on polyacrylamide gels, purified, adaptor-ligated, reverse transcribed, amplified by PCR and sequenced using HiSeq 2000. All RNA-seq data were adaptor trimmed and length and quality filtered using standard approaches.
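Expression in Fig. 2 is reported as TPM and RPKM. For reference, a sketch of the standard definitions computed from read counts and transcript lengths (in bp) follows; the toy counts are invented and the function names are illustrative.

```python
def rpkm(counts, lengths):
    """Reads per kilobase of transcript per million mapped reads."""
    total = sum(counts.values())
    return {t: c * 1e9 / (lengths[t] * total) for t, c in counts.items()}


def tpm(counts, lengths):
    """Transcripts per million: length-normalize first, then scale to 1e6."""
    rates = {t: c / lengths[t] for t, c in counts.items()}
    scale = sum(rates.values())
    return {t: r * 1e6 / scale for t, r in rates.items()}


toy_counts = {"tA": 100, "tB": 300}
toy_lengths = {"tA": 1000, "tB": 3000}
toy_rpkm = rpkm(toy_counts, toy_lengths)
toy_tpm = tpm(toy_counts, toy_lengths)
```

Unlike RPKM, TPM values always sum to one million within a library, which makes per-library proportions directly comparable.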

Synteny and polymorphism analysis, and annotation of repeat content.

For comparative analysis, the adult male and female T. suis assemblies were aligned to each other using MUMmer3 (ref. 55). Repetitive sequences in each assembly were identified14 using Tandem Repeats Finder (TRF)56, RepeatMasker57, LTR_FINDER58, PILER59 and RepeatScout60, with a consensus library of predicted repetitive elements constructed in RepeatScout using fit-preferred alignment scores. Transfer RNAs were predicted using tRNA-SCAN61. The male assembly was searched for scaffolds likely to represent the male-specific Y chromosome13,62. Reads from all genomic libraries for each sex were aligned using Bowtie2 (ref. 54) to both the same-sex and the opposite-sex assembly (each both repeat-unmasked and hard-masked), i.e., male-to-male, male-to-female, female-to-male and female-to-female. Contigs with >80% coverage in same-sex but <20% coverage in opposite-sex read alignments were deemed 'sex-specific'.
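The sex-specificity rule can be expressed as a small classifier. The text does not state whether 'coverage' refers to breadth (fraction of contig bases covered) or depth; this sketch assumes breadth, computed from per-base read depths, and the helper and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


def breadth_of_coverage(depths: Sequence[int], min_depth: int = 1) -> float:
    """Fraction of contig positions covered by at least `min_depth` reads."""
    if not depths:
        return 0.0
    return sum(1 for d in depths if d >= min_depth) / len(depths)


@dataclass
class ContigCoverage:
    name: str
    same_sex: float      # breadth of coverage by same-sex reads (0-1)
    opposite_sex: float  # breadth of coverage by opposite-sex reads (0-1)


def is_sex_specific(c: ContigCoverage, same_min: float = 0.80, opposite_max: float = 0.20) -> bool:
    """Apply the >80% same-sex / <20% opposite-sex coverage rule (strict inequalities)."""
    return c.same_sex > same_min and c.opposite_sex < opposite_max


def sex_specific_contigs(contigs: List[ContigCoverage]) -> List[str]:
    """Names of contigs passing the sex-specificity rule."""
    return [c.name for c in contigs if is_sex_specific(c)]
```

Running the rule on both unmasked and hard-masked assemblies, as described, guards against repeats inflating opposite-sex coverage.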

Prediction and functional annotation of the protein-encoding gene set.

The male and female protein-encoding gene sets of T. suis were inferred using MAKER2 (ref. 63). Briefly, (i) the nonredundant T. suis transcriptome was aligned to each assembly using BLAT64 and filtered for full-length ORFs; (ii) these ORFs were used to train hidden Markov models (HMMs) for de novo gene prediction with SNAP65 and AUGUSTUS66; (iii) these models were supplemented with homologous genes from T. spiralis17 and C. elegans16; (iv) RNA-seq data from all T. suis libraries were used to infer transcripts with TopHat2 (ref. 67) and Cufflinks2 (refs. 68,69); (v) all HMM-predicted, homology-based and evidence-based information was then combined into a single consensus gene set; and (vi) genes that overlapped predicted repetitive regions of the genome and/or had significant (E < 1 × 10−5) BLASTn homology to known repetitive sequences (i.e., transposable elements) in RepBase57, and that had no close homology to C. elegans or T. spiralis protein-encoding genes, were removed. The male and female T. suis gene sets were unified by orthology prediction using InParanoid70, with T. spiralis17 as an out-group.
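Step (vi) amounts to a Boolean rule: a gene model is discarded when it is repeat-associated (repeat overlap and/or a RepBase BLASTn hit with E < 1 × 10−5) and lacks close homology to C. elegans or T. spiralis proteins. A sketch of that rule, assuming per-gene evidence has already been tabulated; the `GeneModel` fields are hypothetical, and what counts as 'close homology' is left to the caller because no cutoff is specified.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GeneModel:
    gene_id: str
    overlaps_repeat: bool                 # overlaps a predicted repetitive region
    best_repbase_evalue: Optional[float]  # best BLASTn E-value vs RepBase; None if no hit
    has_nematode_homolog: bool            # close homology to a C. elegans or T. spiralis protein


def is_repeat_derived(g: GeneModel, evalue_cutoff: float = 1e-5) -> bool:
    """True if the gene is repeat-associated and has no protein homolog to rescue it."""
    repbase_hit = g.best_repbase_evalue is not None and g.best_repbase_evalue < evalue_cutoff
    return (g.overlaps_repeat or repbase_hit) and not g.has_nematode_homolog


def filter_gene_set(genes: List[GeneModel]) -> List[GeneModel]:
    """Keep only gene models not flagged as repeat-derived."""
    return [g for g in genes if not is_repeat_derived(g)]
```

The homology test acts as a rescue: a genuine gene that happens to overlap a repeat is retained if it has a conserved nematode counterpart.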

Conserved protein domains encoded by each gene were identified using InterProScan71, and these data were used to infer Gene Ontology72 terms. Using reciprocal BLASTp and OrthoMCL73, the inferred T. suis proteome was clustered with predicted homologs or orthologs from other nematodes, including Ascaris suum14, Brugia malayi15, C. elegans16 and T. spiralis17. Each contig was assessed for a known functional ortholog in the Kyoto Encyclopedia of Genes and Genomes (KEGG) using the KEGG Orthology-Based Annotation System (KOBAS)74. In addition, inferred T. suis proteins were compared by BLASTx/BLASTp with protein sequences available for A. suum, B. malayi, C. elegans and T. spiralis; with the UniProt75, Swiss-Prot and TrEMBL76 databases; and with specialist databases for key protein groups represented in MEROPS77, WormBase78, KS-SARfari and GPCR-SARfari, and the Transporter Classification Database (TCDB)79. ES proteins were predicted using Phobius80 and by BLASTp comparison with the validated Signal Peptide Database (SPD)81 and with proteomic data for the nematodes B. malayi82,83 and Meloidogyne incognita84 and the trematode Schistosoma mansoni85.
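The reciprocal BLASTp step can be illustrated with a reciprocal-best-hit (RBH) sketch: two proteins are paired when each is the other's highest-scoring hit. This shows only the pairwise RBH idea; OrthoMCL's subsequent graph clustering is not reproduced, and the hit tuples stand in for parsed BLAST tabular output (an assumed input format).

```python
from typing import Dict, Iterable, Set, Tuple

Hit = Tuple[str, str, float]  # (query id, subject id, bit score)


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Highest-scoring subject for each query (first hit wins on ties)."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Iterable[Hit], b_vs_a: Iterable[Hit]) -> Set[Tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return {(a, b) for a, b in forward.items() if reverse.get(b) == a}
```

For example, if T. suis protein t3 hits C. elegans protein c3 best, but c3 hits t1 better than t3, the t3–c3 pair is not reported.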

Differential transcription analysis of mRNA.

Reconstruction and quantification (in fragments per kilobase of transcript per million mapped reads, FPKM) of the T. suis transcriptome were conducted using TopHat2 (ref. 67) and Cufflinks2 (refs. 68,69). Predicted alternative splicing events were classified86. Splicing events and gene function (based on encoded Pfam domains) were compared by pairwise Pearson chi-squared tests (P ≤ 0.05). We also assessed the relationship between predicted gene essentiality14 and whether a gene has a single isoform or multiple isoforms. Differential transcription was assessed using NOISeq87, with technical replicates simulated in five iterations, each using 20% of the evaluated reads from each library.
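NOISeq can simulate technical replicates when none are available. The idea, resampling each library to a fraction of its depth several times, can be sketched as a multinomial draw with gene probabilities proportional to observed counts. This is an illustrative re-implementation under that assumption, not NOISeq's code; NOISeq's own simulation may differ in details such as allowing the simulated library size to vary.

```python
import random
from typing import Dict, List, Optional


def simulate_technical_replicates(
    counts: Dict[str, int],
    fraction: float = 0.2,
    n_reps: int = 5,
    seed: Optional[int] = None,
) -> List[Dict[str, int]]:
    """Draw n_reps pseudo-replicates, each of size fraction * library depth.

    Each replicate is a multinomial sample: reads are assigned to genes with
    probability proportional to the gene's observed count.
    """
    rng = random.Random(seed)
    genes = list(counts)
    weights = [counts[g] for g in genes]
    depth = int(sum(weights) * fraction)
    replicates = []
    for _ in range(n_reps):
        tally = dict.fromkeys(genes, 0)
        if depth > 0:
            for gene in rng.choices(genes, weights=weights, k=depth):
                tally[gene] += 1
        replicates.append(tally)
    return replicates
```

Genes with zero observed counts can never be drawn, so the simulated replicates capture sampling noise but not biological variability.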

Annotation and differential transcription analysis of small noncoding RNAs.

Canonical miRNAs were identified and quantified in miRDeep2 (refs. 88–90) and supported using miRNAs published for A. suum40, C. elegans91, Haemonchus contortus92, Brugia pahangi92 and T. spiralis93. The 3′ UTR of each Cufflinks-predicted T. suis transcript was identified by comparison with the T. suis genome annotation, and each 3′ UTR was screened for miRNA binding sites using PITA94. Predicted binding sites were filtered on the basis of homologous miRNA–transcript binding interactions predicted for C. elegans in curated databases (microRNA.org, RNA22 and TargetScan) or demonstrated empirically42. All non-miRNA reads from each T. suis small-RNA library were then aligned to the male T. suis genome using Bowtie2 (ref. 54) and clustered using ShortStack95, with a minimum cluster depth cutoff of 10. Small-RNA reads with perfectly overlapping alignments (i.e., identical start and stop positions) were defined as homologous and condensed into a consensus sequence by majority rule. Each consensus small RNA was classified41, and nucleotide diversity within each set of homologous small-RNA reads was assessed using custom Perl scripts. Specific small-RNA sequences (for example, 21U-RNAs)41 and their 5′ and 3′ flanking regions were searched for sequence motifs using MEME96. Differential transcription of miRNAs, siRNAs and 22A-RNAs among stage- and tissue-specific libraries was assessed (in reads per million mapped reads, RPM) using NOISeq87, and small RNAs were clustered by stage- or tissue-specific transcription pattern using R.
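Condensing reads with identical start and stop positions into a majority-rule consensus, and summarizing within-group variation as nucleotide diversity (π, mean pairwise differences per site), might look like the sketch below. The original analysis used custom Perl scripts; this Python version is not that code, assumes ungapped alignments (so reads in a group have equal length) and breaks consensus ties in favor of the first-seen base.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Tuple

AlignedRead = Tuple[str, int, int, str]  # (scaffold, start, stop, read sequence)
Locus = Tuple[str, int, int]


def group_by_locus(reads: Iterable[AlignedRead]) -> Dict[Locus, List[str]]:
    """Group reads whose alignments share scaffold, start and stop."""
    groups: Dict[Locus, List[str]] = defaultdict(list)
    for scaffold, start, stop, seq in reads:
        groups[(scaffold, start, stop)].append(seq)
    return dict(groups)


def majority_consensus(seqs: List[str]) -> str:
    """Most frequent base at each position (ties go to the first-seen base)."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*seqs))


def nucleotide_diversity(seqs: List[str]) -> float:
    """Mean pairwise differences per site (pi) among equal-length sequences."""
    if len(seqs) < 2 or not seqs[0]:
        return 0.0
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs)
    return diffs / len(pairs) / len(seqs[0])
```

A group of identical reads yields π = 0, so nonzero values flag positions where homologous small RNAs vary.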

Analysis of the genomic neighborhoods and primary sequences of 22A-RNAs.

We extracted 22A-RNA loci with 100-nt flanks, excluding any whose flanked region contained scaffolding (N) residues; merged loci that overlapped along the genome; and further condensed them at 80% sequence identity with CD-HIT-EST97. We probed the resulting nonredundant 22A-RNA regions for protein-encoding exons with BlastX79 and for known noncoding RNAs (ncRNAs) with INFERNAL 1.1/cmscan98. BlastX was run against predicted proteomes from all published nematode genomes in WormBase WS240 (ref. 87), as well as the T. suis male proteome from this study; cmscan was run against the ncRNA database RFAM 11.0 (ref. 99). Regions passing these filters were tested for similarity to (i) genomic DNA from other nematode species, (ii) other 22A-RNA neighborhoods and (iii) novel ncRNAs from C. elegans. The first was assayed by BlastN against published nematode genomic sequences in WormBase WS240 (ref. 87); the second, by BlastN against 22A-RNA regions spatially merged for genomic overlaps but not condensed with CD-HIT-EST; and the third, by BlastN against a set of 8,126 C. elegans ncRNAs taken from the WS240 release of WormBase87. All searches used an E-value threshold of ≤10−3.
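Extracting 100-nt-flanked 22A-RNA loci, discarding those containing gap (N) residues, and merging loci that overlap along the genome is essentially interval arithmetic. A sketch using 0-based, half-open coordinates; dropping loci whose flanks would run off a scaffold end is our assumption, as the text does not say how such loci were handled, and the function names are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Interval = Tuple[str, int, int]  # (scaffold, start, end), 0-based, half-open


def flank(locus: Interval, genome: Dict[str, str], pad: int = 100) -> Optional[Interval]:
    """Extend a locus by `pad` nt on each side; None if it runs off the scaffold or hits N."""
    scaffold, start, end = locus
    lo, hi = start - pad, end + pad
    if lo < 0 or hi > len(genome[scaffold]):
        return None
    if "N" in genome[scaffold][lo:hi].upper():
        return None
    return (scaffold, lo, hi)


def merge_overlapping(intervals: Iterable[Interval]) -> List[Interval]:
    """Merge intervals on the same scaffold that overlap (book-ended ones are kept apart)."""
    merged: List[Interval] = []
    for scaffold, start, end in sorted(intervals):
        if merged and merged[-1][0] == scaffold and start < merged[-1][2]:
            prev = merged[-1]
            merged[-1] = (scaffold, prev[1], max(prev[2], end))
        else:
            merged.append((scaffold, start, end))
    return merged


def flanked_regions(loci: Iterable[Interval], genome: Dict[str, str], pad: int = 100) -> List[Interval]:
    """Flank each locus, drop failures, then merge spatial overlaps."""
    kept = [r for r in (flank(locus, genome, pad) for locus in loci) if r is not None]
    return merge_overlapping(kept)
```

The merged regions would then be written to FASTA and passed to CD-HIT-EST for the 80%-identity condensation step.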

A set of 25,259 C. elegans ncRNAs was obtained from WormBase WS240. We removed those labeled 'asRNA', 'rRNA', 'scRNA', 'snoRNA', 'snRNA' or 'tRNA', leaving 8,126 ncRNAs with no annotated similarity to well-characterized ncRNA classes. Most of these ncRNAs had been discovered by modENCODE45; 176 others were long noncoding RNAs100. We then checked these ncRNAs for previously undescribed motifs using cmscan against RFAM 11.0.

To determine whether 22A-RNA sequences contained novel motifs, we extracted 22A-RNA sequences lacking flanking protein-encoding or ncRNA similarities, merged them by genomic overlap and at 80% identity, and scanned them with MEME96, using a first-order Markov background model derived from the adult male T. suis genome (via MEME's fasta-get-markov). MEME was run with the arguments '-dna -revcomp -nsites 100 -bfile TS_M_200bpormore.1markov.txt -nmotifs 10 -evt 0.05 -minw 6 -maxw 22 -mod anr'. For one resulting 8-nt motif, we used FIMO101 to locate its occurrences in the original, unmerged 22A-RNA sequences, with the arguments '--bgfile TS_M_200bpormore.1markov.txt --output-pthresh 0.001'. The motif was displayed as a logarithmic WebLogo102.
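The first-order Markov background passed to MEME and FIMO summarizes the mono- and dinucleotide composition of the genome. The sketch below computes such frequencies with a pseudocount, in the spirit of fasta-get-markov; it counts one strand only and does not write MEME's background-file format, so it is not a drop-in replacement for that utility.

```python
from collections import Counter
from itertools import product
from typing import Dict, Iterable

ALPHABET = "ACGT"


def markov_background(seqs: Iterable[str], pseudocount: float = 1.0) -> Dict[str, float]:
    """Order-0 (single-base) and order-1 (dinucleotide) frequencies with pseudocounts."""
    mono: Counter = Counter()
    di: Counter = Counter()
    for seq in seqs:
        s = seq.upper()
        mono.update(b for b in s if b in ALPHABET)
        # Count adjacent pairs, skipping any that involve N or other symbols.
        di.update(a + b for a, b in zip(s, s[1:]) if a in ALPHABET and b in ALPHABET)
    mono_total = sum(mono.values()) + 4 * pseudocount
    di_total = sum(di.values()) + 16 * pseudocount
    freqs = {b: (mono[b] + pseudocount) / mono_total for b in ALPHABET}
    for a, b in product(ALPHABET, repeat=2):
        freqs[a + b] = (di[a + b] + pseudocount) / di_total
    return freqs


def transition_prob(freqs: Dict[str, float], prev: str, base: str) -> float:
    """P(base | prev) derived from the dinucleotide frequencies."""
    row_total = sum(freqs[prev + b] for b in ALPHABET)
    return freqs[prev + base] / row_total
```

Scoring motif sites against such a background, rather than uniform base frequencies, keeps composition-driven patterns (for example, AT-rich stretches) from being reported as motifs.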

URLs.

WormBase, http://www.wormbase.org/; microRNA.org, http://www.microrna.org/microrna/home.do; RNA22, https://cm.jefferson.edu/rna22v1.0/; TargetScanWorm, http://www.targetscan.org/worm_52/; salient data files are accessible via http://gasser-research.vet.unimelb.edu.au/Trichuris_suis/ and ftp://ftp.wormbase.org/pub/wormbase/species/t_suis/; browsable male and female genomes are accessible via http://gasser-research.vet.unimelb.edu.au/jbrowse/JBrowse-1.11.2/index.html?data=TsuisMale/ and http://gasser-research.vet.unimelb.edu.au/jbrowse/JBrowse-1.11.2/index.html?data=TsuisFemale/, respectively, or through WormBase via ftp://ftp.wormbase.org/pub/wormbase/species/t_suis/.

Accession numbers.

All short-read data are available via the Sequence Read Archive: SRR1041639, SRR1041640, SRR1041641, SRR1041642, SRR1041643, SRR1041644 (genomic DNA male); SRR1041645, SRR1041646, SRR1041647, SRR1041648, SRR1041649, SRR1041650 (genomic DNA female); SRR1041651, SRR1041652, SRR1041653, SRR1041654, SRR1041655, SRR1041656, SRR1041657, SRR1041658 (mRNA); SRR1041659, SRR1041660, SRR1041661, SRR1041662, SRR1041663, SRR1041664, SRR1041669, SRR1041670 (small RNA). Annotated assemblies of each genome are accessible via BioProject PRJNA208415 (male) and PRJNA208416 (female).