Data validation and quality control
Paired-end sequencing on the Illumina HiSeqX10 platform generated 17× genome coverage with 79,792,942 reads (150 bp) for a 10-Gy/46 gamma-irradiated mutant line. On the basis of data from the NCBI Sequence Read Archive (SRA; accession number SRR16008784), this hybrid taxonomically matched 8.30% with its closest species, Dendrobium catenatum, followed by Phalaenopsis equestris at 0.55%. The resulting genome assembly was 678,650,699 bp long and comprised 635,396 contigs, the longest being 30,571 bp and the shortest 300 bp, with a mean length of 1068 bp. The N50 value was 1423 bp, GC content was 32.48%, and the summary "Number of contigs" value was 447,500 (Table 2).
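As a rough consistency check on the reported coverage, the sequencing depth can be estimated from the read count, read length and assembly length given above. This is a minimal sketch that assumes depth is expressed relative to the assembled length (the text does not state which genome size was used as the denominator); the function name is illustrative.

```python
def sequencing_depth(n_reads: int, read_length_bp: int, genome_size_bp: int) -> float:
    """Return average depth: total sequenced bases divided by genome size."""
    return n_reads * read_length_bp / genome_size_bp


# Values taken from the text; using the assembly length as the genome size
# is an assumption made for illustration only.
depth = sequencing_depth(
    n_reads=79_792_942,
    read_length_bp=150,
    genome_size_bp=678_650_699,
)
print(f"Estimated depth: {depth:.1f}x")
```

Under this assumption the estimate is about 17.6×, in line with the reported 17× coverage.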
Table 2
Genome assembly statistics of the gamma-irradiated Dendrobium hybrid mutant and RagTag scaffolding.
Assembly | γ mutant (Dendrobium hybrid) [35] | Dendrobium catenatum | Dendrobium huoshanense [27] | RagTag scaffold of γ mutant (RagTag.Scaffold) |
---|---|---|---|---|
#contigs (≥0 bp) | 635,396 | 286,396 | 2256 | 549,354 |
#contigs (≥1000 bp) | 213,573 | 29,592 | 2256 | 163,119 |
#contigs (≥5000 bp) | 8,302 | 3,709 | 1279 | 5190 |
#contigs (≥10,000 bp) | 625 | 2,523 | 907 | 370 |
#contigs (≥25,000 bp) | 7 | 1,684 | 448 | 29 |
#contigs (≥50,000 bp) | 0 | 1,401 | 145 | 20 |
Total length (≥0 bp) | 678,650,699 | 1,104,259,548 | 1,284,285,095 | 687,254,899 |
Total length (≥1000 bp) | 439,221,924 | 1,016,149,702 | 1,284,285,095 | 471,383,498 |
Total length (≥5000 bp) | 57,139,593 | 973,286,060 | 1,282,134,848 | 184,746,937 |
Total length (≥10,000 bp) | 7,915,569 | 964,906,549 | 1,279,530,669 | 154,097,582 |
Total length (≥25,000 bp) | 194,559 | 952,168,543 | 1,272,192,375 | 149,748,116 |
Total length (≥50,000 bp) | 0 | 942,392,114 | 1,262,665,926 | 149,476,984 |
Number of contigs | 447,500 | 64,087 | 2256 | 369,938 |
Largest contig (bp) | 30,571 | 33,291,853 | 100,197,051 | 18,000,059 |
Total length (bp) | 604,787,319 | 1,040,039,458 | 1,284,285,095 | 616,868,961 |
GC (%) | 33.48 | 34.61 | 35.73 | 33.49 |
N50 | 1423 | 1,149,703 | 71,787,458 | 2096 |
N75 | 949 | 434,049 | 52,753,504 | 1039 |
L50 | 105,200 | 184 | 8 | 46,924 |
L75 | 228,327 | 553 | 13 | 154,552 |
#Ns per 100 Kbp | 0.00 | 4167.30 | 123.03 | 1394.82 |
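The N50/L50 and N75/L75 rows in Table 2 follow the usual definitions: sort sequences from longest to shortest and find the point at which the running total reaches 50% (or 75%) of the total length. The sketch below illustrates that calculation on made-up contig lengths; it is not the tool used to produce the table, and the function name is hypothetical.

```python
def nx_lx(lengths, fraction=0.5):
    """Return (Nx, Lx) for a list of sequence lengths.

    Nx is the length of the sequence at which the cumulative sum of the
    length-sorted sequences first reaches `fraction` of the total length;
    Lx is the number of sequences needed to reach that point.
    """
    ordered = sorted(lengths, reverse=True)
    threshold = sum(ordered) * fraction
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count
    raise ValueError("no contig lengths supplied")


# Toy contig lengths (illustrative only, not taken from the assembly).
toy_lengths = [80, 70, 50, 40, 30, 20]
n50, l50 = nx_lx(toy_lengths, 0.50)
n75, l75 = nx_lx(toy_lengths, 0.75)
print(n50, l50, n75, l75)
```

A low N50 paired with a high L50, as in the γ mutant column, indicates that half of the assembly is spread across many short contigs.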
RagTag scaffolding of the mutant assembly against the Dendrobium huoshanense reference genome produced a total length of 687,254,899 bp, an increase of 8,604,200 bp, with an N50 value of 2096 bp (Table 2). The largest sequence in the RagTag-scaffolded assembly increased from 30,571 bp to 18,000,059 bp, and the number of Ns per 100 kilobase pairs (Kbp) rose from 0 to 1394.82. When compared against the UniProt database using BLASTX with an e-value cut-off of 10⁻³, the 96,529 genes predicted by MaSuRCA yielded 60,741 potential genes governing different molecular functions, cellular components and biological processes. BLAST results were filtered using cut-offs of query coverage (qcov) >60% and percent identity >70% to ensure confidence in the annotations.
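The filtering step described above can be sketched as follows. This is an illustration only: it assumes the BLASTX results were written as tab-separated text with the columns `qseqid sseqid pident qcovs evalue` (a custom tabular layout that the text does not confirm), and the gene and protein identifiers in the example are invented.

```python
import csv
import io

# Assumed column order of the tabular BLAST output (not stated in the text).
COLUMNS = ["qseqid", "sseqid", "pident", "qcovs", "evalue"]


def filter_hits(handle, max_evalue=1e-3, min_qcov=60.0, min_pident=70.0):
    """Keep hits with e-value <= max_evalue, qcov > min_qcov and identity > min_pident."""
    reader = csv.DictReader(handle, fieldnames=COLUMNS, delimiter="\t")
    kept = []
    for row in reader:
        if (
            float(row["evalue"]) <= max_evalue
            and float(row["qcovs"]) > min_qcov
            and float(row["pident"]) > min_pident
        ):
            kept.append(row)
    return kept


# Invented example rows: only gene1 passes all three thresholds.
example = io.StringIO(
    "gene1\tprotA\t85.0\t90\t1e-50\n"  # passes
    "gene2\tprotB\t65.0\t95\t1e-20\n"  # identity too low
    "gene3\tprotC\t92.0\t40\t1e-10\n"  # query coverage too low
    "gene4\tprotD\t75.0\t61\t5e-2\n"   # e-value too high
)
kept = filter_hits(example)
print([hit["qseqid"] for hit in kept])
```

Applying the coverage and identity thresholds after the e-value cut-off removes short or weak alignments that would otherwise pass on e-value alone.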
We also identified 216,232 SSRs and designed 138,856 microsatellite primers; these will be useful for detecting polymorphic differences among progenies and putative gamma-irradiated mutant lines.
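To illustrate what SSR identification involves, the sketch below scans a sequence for perfect tandem repeats of 1 to 6 bp motifs using regular expressions. The minimum repeat counts are assumptions chosen for illustration; the text does not state which tool or thresholds were used, and the example sequence is synthetic.

```python
import re

# Assumed minimum repeat counts per motif length (illustrative, not from the text).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # A motif of unit_len bases followed by at least (min_rep - 1) copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are themselves repeats of a shorter unit
            # (e.g. "AA" reported as a dinucleotide instead of mononucleotide "A").
            if any(
                motif == motif[:k] * (unit_len // k)
                for k in range(1, unit_len)
                if unit_len % k == 0
            ):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return sorted(hits)


# Synthetic sequence containing (AT)7, (A)12 and (AGC)5.
toy_seq = "GGC" + "AT" * 7 + "CCG" + "A" * 12 + "TTG" + "AGC" * 5 + "T"
ssrs = find_ssrs(toy_seq)
print(ssrs)
```

Primer design would then target the unique sequence flanking each repeat, which is why fewer primers than SSRs are typically obtained.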
Most BLASTX hits showed affinity with D. catenatum based on functional annotation of genes (Figure 2). BUSCO (v.5.2.2) analysis revealed 913 (56.57%) single-copy orthologs that did not match any entry in the database; this may reflect both the genomic background of the developed hybrid cultivar and the effect of the gamma radiation.
Figure 2.
BLASTX hits based on functional predicted gene models for different organisms.
The complex genome of the ‘Emma White’ hybrid Dendrobium cultivar is derived from five unique and unrelated species; it has been hybridized 11 times over a period of 68 years, with selection for targeted economic trait improvement (Table 1). Low BUSCO values may be attributed to its fragmented assembly. However, the presence of genomic material from several other species of the same genus (which would otherwise be regarded as contaminant species) in the hybrid cultivar may also have resulted in drastic changes in the missing BUSCO values [37–39]. NCBI taxonomic data for the mutant Dendrobium, based on raw sequence data, also support this view, since only a small fraction of the data (less than 9%) matched its closest relative, Dendrobium catenatum.
In addition, multigenome hybrid cultivars are genetically heterogeneous and have an outcrossing nature, indicating higher compatibility. For example, the outcrossing species Arabidopsis lyrata had 32,670 predicted genes, even at 8.3× DNA coverage, compared with 27,025 genes in the selfing species Arabidopsis thaliana (125 megabase pairs [Mbp]), which diverged 10 million years ago [40], a difference attributed to genomic loss and rearrangement. In a similar way, these novel Dendrobium hybrid cultivars have a distinct genome because of introgression from other wild species chosen by plant breeders to create new genetic variation in a short space of time compared with evolutionary changes. It can also be attributed to deletions, mostly of noncoding DNA and transposons, and to a highly mutagenized background with severe developmental abnormalities, in addition to the presence of unclustered genes [41].
Reuse potential
The mutant Dendrobium hybrid sequencing and genome assembly presented here can be adopted as a primary reference genome, as well as complementing the genomes of conventional Dendrobium species already in the public domain. Studies of induced mutants allow rapid discovery of new alleles at low cost using high-throughput TILLING [42]. As is evident from other crops [43, 44], this is especially true in vegetatively propagated Dendrobium hybrids, in which gamma radiation mutation breeding can be used to obtain high-density mutations. These results provide a baseline for further research on the molecular understanding of desired traits in mutant germplasm, and for developing genomic resources for orchid improvement.