Multi-platform discovery of haplotype-resolved structural variation in human genomes
ABSTRACT
The incomplete identification of structural variants (SVs) from whole-genome sequencing data limits studies of human genetic diversity and disease association. Here, we apply a suite of long-read, short-read, and strand-specific sequencing technologies, optical mapping, and variant discovery algorithms to comprehensively analyze three human parent–child trios to define the full spectrum of human genetic variation in a haplotype-resolved manner. We identify 818,054 indel variants (<50 bp) and 27,622 SVs (≥50 bp) per human genome. We also discover 156 inversions per genome—most of which previously escaped detection. Fifty-eight of the inversions we discovered intersect with the critical regions of recurrent microdeletion and microduplication syndromes. Taken together, our SV callsets represent a sevenfold increase in SV detection compared to most standard high-throughput sequencing studies, including those from the 1000 Genomes Project. The method and the dataset serve as a gold standard for the scientific community and we make specific recommendations for maximizing structural variation sensitivity for future large-scale genome sequencing studies.