Elsevier

Biochimie

Volume 94, Issue 11, November 2012, Pages 2353-2359
Review
A pipeline for the identification and characterization of chromatin modifications derived from ChIP-Seq datasets

https://doi.org/10.1016/j.biochi.2012.06.002

Abstract

The advent of massively parallel sequencing of immunopurified chromatin and its determinants has provided new avenues for researchers to map epigenome-wide changes, and there is tremendous interest in uncovering regulatory signatures that address fundamental questions about chromatin structure and function. Indeed, the rapid development of large genome annotation projects has seen a resurgence in chromatin immunoprecipitation (ChIP) based protocols, which are used to distinguish protein interactions and are coupled with large scale sequencing (Seq) to precisely map epigenome-wide interactions. Despite the great advances in our understanding of chromatin modifying complexes and their determinants, the development of ChIP-Seq technologies also poses specific demands on the integration of data for visualization, manipulation and analysis. In this article we discuss some of the considerations for experimental design, quality control, and bioinformatic analysis. The key aspects of post-sequencing analysis are the identification of regions of interest, differentiation between biological conditions and the characterization of sequence differences for chromatin modifications. We provide an overview of best-practice approaches, with background information and considerations for integrative analysis of ChIP-Seq experiments.

Graphical abstract

Heat map of a histone mark at ± 5 kb from the Transcription Start Site of a subset of the REFSEQ Gene Set.


Highlights

► High throughput sequencing is increasingly used for epigenetic research.
► A pipeline is described for the identification and analysis of epigenetic changes.
► A standardized protocol is described to explain the computational problems.

Introduction

The dramatic advances in next generation sequencing in recent years have popularised the idea that sequencing offers many opportunities for mapping chromatin modifications and epigenetic changes in complex eukaryotic genomes. The milestone of drafting the human genome, comprising at least 3 billion bases of genetic code, was breakthrough science [1], [2]. Since then, the field of genome research has grown tremendously, reaching new frontiers in sequencing technology and now offering the opportunity to map chromatin modifications and their determinants, which are critical in regulating DNA replication and repair, genome stability and transcription. The genome is indeed distinguished by covalent modifications such as DNA methylation, which provides a direct mechanism of transcriptional control. The capacity to regulate critical nuclear processes such as transcription has been explored by gain or loss of function experiments [3], and experimental evidence supports the general view that genomic methylation is often inversely associated with transcriptional expression [4]. The mechanism by which DNA methylation represses transcription is intensely investigated because of its critical role in mammalian development [5]. Together with chromatin assembly, the specific chemical modification of histones is thought to contribute to transcriptional control by altering genome structure and function and by signalling to other protein determinants [6], [7], [8]. Histone residues undergo post-translational modifications that include acetylation, methylation and phosphorylation, among many others [9]. These chemical variations to the histone tail have been intensively studied because of their critical roles in gene expression, and high-throughput technologies can now define genome-wide maps of histone modifications and of changes associated with gene structure and function [10]. Indeed, more recently, these complex regulatory events have been investigated by integrating diverse experimental datasets [11], [12].

The rapid developments in next generation sequencing technologies are forging unprecedented advances in genome-wide single-base resolution analysis. Central to the goal of mapping genome-wide chromatin modifications is the complexity of interpreting not only the genomic distribution and specific patterns of modification, but also how these relationships add new insights into the regulation and dynamics of gene expression. In the past this was technically demanding and prohibitively costly for all but the smallest reference genomes [13]. Large-scale genome sequencing projects are now more feasible and provide a highly specific approach to distinguishing changes in gene-regulating histone modifications. Recent technological advances have reduced the cost per base, which suits large-scale epigenetic profiling; however, this has complicated data analysis and structural and functional gene annotation because of the large and fragmented assemblies of short sequence reads [14]. A single run of a high throughput sequencer generates a large volume of data, requiring specialised methods and large-scale computational infrastructure for reliable analysis. The establishment of institutional and commercial sequencing centers and of standardized analysis workflows for the primary analysis of sequence data in projects such as GALAXY [15], [16] underscores the increasing trend to bring bioinformaticians and biomedical scientists together to solve complex problems by improving data management and analytical integration.

Chromatin immunoprecipitation coupled with massively parallel sequencing (ChIP-Seq) is a method of assessing chromatin-bound interactions [17]. This method is now widely used to identify the genomic coordinates of epigenetic features such as histone modifications and DNA methylation. Here, we briefly describe the approaches used in our laboratory, with an emphasis on experimental design and on automated analysis methods and tools, aiming to improve the biological interpretation and the reproducibility of inference from ChIP-Seq based experiments. Fig. 1 gives a schematic overview of an example experimental workflow. The details of this workflow are described below.

Section snippets

Experimental design

The most fundamental decisions made in a ChIP-Seq project include the choice of material to sequence and of the fragment length and coverage depth for sequencing. While scripted or otherwise automated analyses can be repeated or re-run at relatively little cost, sequencing remains expensive and biological material is often limited, particularly when derived from human subjects.
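To make the trade-off between read number, fragment length and coverage depth concrete, the following is a minimal sketch of the standard mean-coverage estimate (total sequenced bases divided by genome size). The function names are hypothetical, and the 20 million reads and 200 bp fragment length are illustrative assumptions rather than values from this article. The genome size of 3 billion bases follows the figure quoted in the Introduction. The estimate ignores mappability, duplicate reads and the uneven read distribution that enrichment is designed to produce, so it is only a rough planning figure.

```python
import math


def expected_coverage(n_reads: int, fragment_length: int, genome_size: int) -> float:
    """Mean fold coverage: total fragment bases divided by genome size."""
    if genome_size <= 0 or fragment_length <= 0:
        raise ValueError("genome_size and fragment_length must be positive")
    return n_reads * fragment_length / genome_size


def reads_for_coverage(target_coverage: float, fragment_length: int, genome_size: int) -> int:
    """Smallest number of reads whose fragments reach the target mean coverage."""
    if genome_size <= 0 or fragment_length <= 0:
        raise ValueError("genome_size and fragment_length must be positive")
    return math.ceil(target_coverage * genome_size / fragment_length)


# Illustrative (assumed) values: 20 million reads, 200 bp fragments, 3 Gb genome.
coverage = expected_coverage(20_000_000, 200, 3_000_000_000)
needed = reads_for_coverage(1.0, 200, 3_000_000_000)
print(f"mean coverage: {coverage:.2f}x, reads for 1x: {needed}")
```

Because ChIP enriches specific regions, local depth at true sites will generally be much higher than this genome-wide mean, which is one reason pilot data are useful before committing to a sequencing depth.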

ChIP-Seq data must be aligned to a previously defined reference genome. Detailed annotation of sequence features in the

Annotation and queries for differential ChIP-Seq regions

Annotation can proceed once genomic regions of interest have been identified, and it is necessary for biological interpretation. Annotation can be provided by intersecting the genomic coordinates of the epigenetic regions of interest with annotation files such as those provided by the UCSC table browser [33]. Example annotations include transcription features such as gene bodies, exons, introns, transcription start sites, transcription termination sites, single nucleotide variants (SNV), CpG islands and
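The intersection of region coordinates with annotation features can be sketched as follows. This is a minimal illustration, not the pipeline used by the authors: the `Interval` type, the `annotate` function and all coordinates and feature names are hypothetical, and the coordinates are assumed to be 0-based and half-open, as in BED files. It uses a simple linear scan per chromosome, which is adequate for small examples; genome-scale work would normally rely on a dedicated interval tool or an indexed data structure.

```python
from collections import defaultdict
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED-style assumption)
    end: int    # exclusive
    name: str


def annotate(regions, features):
    """Map each region name to the names of features that overlap it."""
    by_chrom = defaultdict(list)
    for feature in features:
        by_chrom[feature.chrom].append(feature)

    annotations = {}
    for region in regions:
        candidates = by_chrom.get(region.chrom, [])
        # Two half-open intervals overlap when each starts before the other ends.
        annotations[region.name] = [
            f.name for f in candidates
            if f.start < region.end and region.start < f.end
        ]
    return annotations


# Hypothetical regions of interest and annotation features.
regions = [
    Interval("chr1", 100, 200, "peak1"),
    Interval("chr1", 500, 600, "peak2"),
    Interval("chr2", 100, 200, "peak3"),
]
features = [
    Interval("chr1", 150, 400, "geneA_body"),
    Interval("chr1", 190, 210, "CpG_island_1"),
    Interval("chr2", 200, 300, "geneB_body"),
]
result = annotate(regions, features)
print(result)
```

In this example `peak3` ends at position 200 and `geneB_body` starts at 200, so under the half-open convention they are adjacent and are not reported as overlapping.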

Conclusion

High throughput sequencing and high throughput analyses are becoming increasingly available for epigenetic research. The field is developing and changing rapidly, and our experience suggests that thoughtful experimental design and the careful construction and testing of automated analysis pipelines to minimize error are key elements. Errors resulting from low-quality reads or technical bias can propagate into downstream analyses. Failure to implement quality control in earlier stages of analysis

Sources of funding

The authors acknowledge grant and fellowship support from the Juvenile Diabetes Research Foundation International (JDRF), the Diabetes Australia Research Trust (DART), the National Health and Medical Research Council (NHMRC) and the National Heart Foundation of Australia (NHF). AE-O is a Senior Research Fellow supported by the NHMRC. This work was supported in part by the Victorian Government's Operational Infrastructure Support Program.

Declarations

None.

References (46)

  • R.D. Hawkins et al. Next-generation genomics: an integrative approach. Nature Reviews Genetics (2011)
  • Z. Wang et al. Combinatorial patterns of histone acetylations and methylations in the human genome. Nature Genetics (2008)
  • S.J. Cokus et al. Shotgun bisulphite sequencing of the Arabidopsis genome reveals DNA methylation patterning. Nature (2008)
  • M. Yandell et al. A beginner's guide to eukaryotic genome annotation. Nature Reviews Genetics (2012)
  • J. Goecks et al. Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biol (2010)
  • K.H. Lars Feuerbach et al. Analyzing epigenome data in Context of genome Evolution and human diseases
  • D.S. Johnson et al. Genome-wide mapping of in vivo protein-DNA interactions. Science (2007)
  • M. Haring et al. Chromatin immunoprecipitation: optimization, quantitative analysis and data normalization. Plant Methods (2007)
  • R. Koehler et al. The uniqueome: a mappability resource for short-tag sequencing. Bioinformatics (2010)
  • L. Pirola et al. Genome-wide analysis distinguishes hyperglycemia regulated epigenetic signatures of primary vascular cells. Genome Research (2011)
  • C.S. Ross-Innes et al. Differential oestrogen receptor binding is associated with clinical outcome in breast cancer. Nature (2012)
  • S. Anders et al. Differential expression analysis for sequence count data. Genome Biol (2010)
  • M.A. Quail et al. A large genome center's improvements to the Illumina sequencing system. Nature Methods (2008)