Assessing the performance of different high-density tiling microarray strategies for mapping transcribed regions of the human genome

Olof Emanuelsson; Ugrappa Nagalakshmi; Deyou Zheng; Joel S. Rozowsky; Alexander E. Urban; Jiang Du; Zheng Lian; Viktor Stolc; Sherman Weissman; Michael Snyder; Mark B. Gerstein

doi:10.1101/gr.5014606

¹ Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520-8114, USA;
² Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, Connecticut 06520-8103, USA;
³ Department of Genetics, Yale University School of Medicine, New Haven, Connecticut 06520-8005, USA;
⁴ Department of Computer Science, Yale University, New Haven, Connecticut 06520-8285, USA;
⁵ Center for Nanotechnology, NASA Ames Research Center, Moffett Field, California 94035, USA

Abstract

Genomic tiling microarrays have become a popular tool for interrogating the transcriptional activity of large regions of the genome in an unbiased fashion. There are several key parameters associated with each tiling experiment (e.g., experimental protocols and genomic tiling density). Here, we assess the role of these parameters as they are manifest in different tiling-array platforms used for transcription mapping. First, we analyze how a number of published tiling-array experiments agree with established gene annotation on human chromosome 22. We observe that the transcription detected from high-density arrays correlates substantially better with annotation than that from other array types. Next, we analyze the transcription-mapping performance of the two main high-density oligonucleotide array platforms in the ENCODE regions of the human genome. We hybridize identical biological samples and develop several ways of scoring the arrays and segmenting the genome into transcribed and nontranscribed regions, with the aim of making the platforms as comparable to each other as possible. Finally, we develop a platform comparison approach based on agreement with known annotation. Overall, we find that performance improves with more data points per locus when coupled with statistical scoring approaches that properly take advantage of them; the larger number of data points arises from higher genomic tiling density and from the use of replicate arrays and mismatches. While we do find significant differences in the performance of the two high-density platforms, we also find that they complement each other to some extent. In addition, our experiments reveal a significant amount of novel transcription outside of known genes, and an appreciable sample of this was validated by independent experiments.

Footnotes

⁶ Present address: Stockholm Bioinformatics Center, AlbaNova University Center, Stockholm University, SE-10691 Stockholm, Sweden.
⁷ Corresponding authors. E-mail michael.snyder@yale.edu; fax (360) 838-7861. E-mail mark.gerstein@yale.edu; fax (360) 838-7861.
[Supplemental material is available online at www.genome.org.]
Article is online at http://www.genome.org/cgi/doi/10.1101/gr.5014606
- Received December 7, 2005.
- Accepted June 8, 2006.
Freely available online through the Genome Research Open Access option.