PolyScan: An automatic indel and SNP detection approach to the analysis of human resequencing data

Ken Chen, Michael D. McLellan, Li Ding, Michael C. Wendl, Yumi Kasai, Richard K. Wilson, and Elaine R. Mardis

Genome Sequencing Center, Washington University School of Medicine, St. Louis, Missouri 63108, USA

Abstract

Small insertions and deletions (indels) and single nucleotide polymorphisms (SNPs) are common genetic variants that are thought to be associated with a wide variety of human diseases. Given the genome's size and complexity, manually characterizing each of these variants in an individual is impractical. While significant progress has been made in automated single-base mutation discovery from the sequences of diploid PCR products, reliable automated detection of indels remains difficult. In this paper, we present PolyScan, an algorithm and software implementation designed to provide de novo heterozygous indel detection and improved SNP identification in the context of high-throughput medical resequencing. Tests on a human diploid PCR-based sequence data set of 90,270 traces from 13 genes show that PolyScan identified ∼90% of the 151 consensus indel sites and ∼84% of the 1546 heterozygous indels previously identified by manual inspection. Tests on tumor-derived data show that PolyScan identifies high-quality, low-level mutations better than other mutation detection software does. Moreover, SNP identification improves when PolyScan is used to reprocess the results of other programs. These results suggest that PolyScan may play a useful role in the post-Human Genome Project research era.
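The abstract does not describe PolyScan's internals. As a hedged illustration of why heterozygous indels are harder to call than SNPs, consider an idealized diploid trace: downstream of a heterozygous deletion the two alleles are read out of register, so most positions show a mixed base call, and the deletion can be recovered by testing which start position and length reproduce that mixture. The toy Python sketch below is not PolyScan's algorithm. The functions `mixed_calls` and `find_het_deletion`, the 16-base reference, and the noise-free model (sets of bases instead of peak heights and quality values, deletions only) are all hypothetical assumptions.

```python
from typing import List, Optional, Set, Tuple


def mixed_calls(ref: str, alt: str, n: int) -> List[Set[str]]:
    """Idealized diploid base calls: at each of the first n positions the
    trace shows the union of the bases from the two alleles (no noise)."""
    return [{ref[i], alt[i]} for i in range(n)]


def find_het_deletion(
    ref: str, calls: List[Set[str]], max_len: int = 10
) -> Optional[Tuple[int, int]]:
    """Return (position, length) of the shortest, leftmost deletion whose
    mixture with the reference allele reproduces every observed call."""
    n = len(calls)
    for k in range(1, max_len + 1):      # candidate deletion lengths, shortest first
        if len(ref) - k < n:             # deleted allele too short to cover the window
            break
        for p in range(n):               # candidate start positions in the window
            alt = ref[:p] + ref[p + k:]  # allele carrying the k-base deletion
            if mixed_calls(ref, alt, n) == calls:
                return p, k
    return None                          # no deletion explains the calls


# Hypothetical example: delete "CG" (positions 7-8) from one allele.
ref = "GATTACACGGTCAGCT"
alt = ref[:7] + ref[9:]
calls = mixed_calls(ref, alt, 12)
print(find_het_deletion(ref, calls))  # (7, 2)
```

Because shorter deletions and leftmost positions are tried first, a deletion inside a repeat is reported at its leftmost equivalent placement. Real trace data would additionally require handling sequencing noise, insertions, unequal peak heights, and base-quality information.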
