The ENCODE project offers a fresh perspective on big data by providing an organized framework for genomics (www.nature.com/encode). Other big-data efforts tend to focus on rapidly locating needles in petabyte-sized haystacks (finding the Higgs boson, for instance), whereas ENCODE aims to supply a structured overview.
ENCODE organizes its information hierarchically, with raw data at the bottom and layers of annotation above. The processed summaries become progressively broader. For example, they start with signals representing the degree to which transcription factors bind DNA, move on to the locations of the sites where these factors bind, and then to overviews of regulatory networks. At the summit are the linked publications documenting the annotation.
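To make the layering concrete, here is a minimal Python sketch of the bottom three levels. It rests on assumptions that are not ENCODE's actual pipeline: a per-position binding signal stands in for the signal layer, a simple threshold stands in for calling binding sites (real peak callers are statistical and far more involved), and a factor is linked to a gene whenever one of its sites lies within a fixed window of that gene's transcription start site. The function names, factor and gene names, threshold and window are all hypothetical, and the publication layer is not modeled.

```python
from collections import defaultdict


def call_binding_sites(signal, threshold):
    """Signal layer -> site layer (hypothetical threshold caller).

    Returns half-open (start, end) intervals of consecutive positions
    whose signal is at or above `threshold`.
    """
    sites = []
    start = None
    for pos, value in enumerate(signal):
        if value >= threshold and start is None:
            start = pos  # a run of bound positions begins
        elif value < threshold and start is not None:
            sites.append((start, pos))  # the run ends just before pos
            start = None
    if start is not None:  # close a run that reaches the end of the signal
        sites.append((start, len(signal)))
    return sites


def build_network(sites_by_factor, tss_by_gene, window):
    """Site layer -> network layer (hypothetical proximity rule).

    Links a factor to a gene when any of its sites, padded by `window`
    positions on each side, covers the gene's transcription start site.
    """
    network = defaultdict(set)
    for factor, sites in sites_by_factor.items():
        for gene, tss in tss_by_gene.items():
            if any(start - window <= tss < end + window for start, end in sites):
                network[factor].add(gene)
    return {factor: sorted(genes) for factor, genes in network.items()}


# Toy raw signals for two hypothetical factors along a 10-position region.
signals = {
    "FACTOR_A": [0, 1, 5, 6, 7, 1, 0, 0, 0, 0],
    "FACTOR_B": [0, 0, 0, 0, 0, 0, 4, 5, 0, 0],
}
tss_by_gene = {"GENE_X": 1, "GENE_Y": 9}

sites = {f: call_binding_sites(s, threshold=3) for f, s in signals.items()}
network = build_network(sites, tss_by_gene, window=2)
print(sites)    # {'FACTOR_A': [(2, 5)], 'FACTOR_B': [(6, 8)]}
print(network)  # {'FACTOR_A': ['GENE_X'], 'FACTOR_B': ['GENE_Y']}
```

Each layer is computed only from the one directly beneath it, so every edge in the network summary can be traced back through a binding site to the raw signal that supports it. That traceability is the property the hierarchy is meant to preserve.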
The ENCODE data model could be useful in other fields: for example, astronomy and Earth science are in the process of organizing their reams of data (M. J. Raddick and A. S. Szalay Science 329, 1028–1029; 2010), but do not yet match ENCODE's level of integration.
Cite this article
Gerstein, M. ENCODE leads the way on big data. Nature 489, 208 (2012). https://doi.org/10.1038/489208b