Physical map-assisted whole-genome shotgun sequence assemblies

  1. René L. Warren (1,6)
  2. Dmitry Varabei (1,6)
  3. Darren Platt (4)
  4. Xiaoqiu Huang (3)
  5. David Messina (2)
  6. Shiaw-Pyng Yang (2)
  7. James W. Kronstad (5)
  8. Martin Krzywinski (1)
  9. Wesley C. Warren (2)
  10. John W. Wallis (2)
  11. LaDeana W. Hillier (2)
  12. Asif T. Chinwalla (2)
  13. Jacqueline E. Schein (1)
  14. Asim S. Siddiqui (1)
  15. Marco A. Marra (1)
  16. Richard K. Wilson (2)
  17. Steven J.M. Jones (1,7)

  1. British Columbia Cancer Agency, Genome Sciences Centre, Vancouver, British Columbia V5Z 4S6, Canada
  2. Washington University School of Medicine, Genome Sequencing Center, St. Louis, Missouri 63108, USA
  3. Department of Computer Science, Iowa State University, Ames, Iowa 50011-1040, USA
  4. U.S. Department of Energy, Joint Genome Institute, Walnut Creek, California 94598, USA
  5. The Michael Smith Laboratories, Department of Microbiology and Immunology, The University of British Columbia, Vancouver, British Columbia V6T 2Z4, Canada
  6. These authors contributed equally to this work.

Abstract

We describe a targeted approach to improve the contiguity of whole-genome shotgun sequence (WGS) assemblies at run-time, using information from bacterial artificial chromosome (BAC)-based physical maps. Clone sizes and overlaps derived from clone fingerprints are used to calculate length constraints between any two neighboring BACs that share 40% of their size. These constraints promote the linkage and guide the arrangement of sequence contigs within a sequence scaffold at the layout phase of WGS assembly. The process is facilitated by FASSI, a stand-alone application that calculates BAC-end and BAC-overlap length constraints from clone fingerprint map contigs created by the FPC package. FASSI is designed to work with the assembly tool PCAP, but its output can be formatted for other WGS assembly algorithms that accept length constraints for individual clones. The FASSI method is simple to implement and potentially cost-effective, and it increased scaffold contiguity for both the Drosophila melanogaster and Cryptococcus gattii genomes when compared to a control assembly without map-derived constraints. A 6.5-fold coverage draft DNA sequence of the Pan troglodytes (chimpanzee) genome was assembled using map-derived constraints, resulting in a 26.1% increase in scaffold contiguity.
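
The idea of turning clone sizes and overlaps into a length constraint can be illustrated with a minimal sketch. This is not FASSI's code or its output format. The names `Clone` and `pair_constraint`, the 10% tolerance, and the choice to measure the 40% share against the smaller clone are all assumptions made for illustration. The only geometry used is that two overlapping clones together span the sum of their sizes minus their overlap.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Clone:
    """A fingerprinted BAC clone with an estimated insert size (hypothetical type)."""
    name: str
    size_bp: int


def pair_constraint(
    a: Clone,
    b: Clone,
    overlap_bp: int,
    min_shared: float = 0.40,   # threshold from the abstract; its exact definition is assumed here
    tolerance: float = 0.10,    # assumed slack around the estimate, not a value from the paper
) -> Optional[Tuple[str, str, int, int]]:
    """Return (name_a, name_b, min_span, max_span) for two neighboring clones,
    or None when the pair shares too little of its size to be used."""
    smaller = min(a.size_bp, b.size_bp)
    if overlap_bp < 0 or overlap_bp > smaller:
        raise ValueError("overlap must lie between 0 and the smaller clone size")

    # Assumption: the shared fraction is measured against the smaller clone.
    if overlap_bp / smaller < min_shared:
        return None

    # Distance between the outermost ends of the two overlapping clones.
    span = a.size_bp + b.size_bp - overlap_bp
    slack = round(span * tolerance)
    return (a.name, b.name, span - slack, span + slack)


bac_a = Clone("bacA", 160_000)
bac_b = Clone("bacB", 150_000)
accepted = pair_constraint(bac_a, bac_b, overlap_bp=75_000)
rejected = pair_constraint(bac_a, bac_b, overlap_bp=30_000)
print(accepted)
print(rejected)
```

In this sketch the first pair shares half of the smaller clone and yields a span range that an assembler could treat like a very long paired-end constraint, while the second pair falls below the threshold and produces no constraint.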

