Abstract
Sequencing costs are falling, but the cost of data analysis remains high, often because unforeseen problems arise, such as insufficient sequencing depth or batch effects. Experimenting with data analysis methods during the planning phase of an experiment can reveal unanticipated problems and build valuable bioinformatics expertise in the organism or process being studied. This protocol describes how to use R Markdown and RStudio, user-friendly tools for statistical analysis and reproducible research in bioinformatics, to analyze an example RNA-Seq data set from tomato pollen undergoing chronic heat stress and to document that analysis. We also show how to use Integrated Genome Browser to visualize read coverage graphs for differentially expressed genes. Applying the protocol described here to the provided data sets is a useful first step toward building RNA-Seq data analysis expertise in a research group.
Acknowledgements
The example data set was from the Workshop in Next-Generation Sequencing (WiNGS), which was co-sponsored by the NSF Research Coordination Network on Integrative Pollen Biology (award 0955431), the NSF Plant Genome Research Program (award 1238051), and the Department of Bioinformatics and Genomics at UNC Charlotte. NIH R01 grant number 21737838 supports development of the IGB software.
Copyright information
© 2015 Springer Science+Business Media New York
About this protocol
Cite this protocol
Loraine, A.E., Blakley, I.C., Jagadeesan, S., Harper, J., Miller, G., Firon, N. (2015). Analysis and Visualization of RNA-Seq Expression Data Using RStudio, Bioconductor, and Integrated Genome Browser. In: Alonso, J., Stepanova, A. (eds) Plant Functional Genomics. Methods in Molecular Biology, vol 1284. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-2444-8_24
DOI: https://doi.org/10.1007/978-1-4939-2444-8_24
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-2443-1
Online ISBN: 978-1-4939-2444-8
eBook Packages: Springer Protocols