The rise of genome-wide association studies put a spotlight on how small variations in DNA can affect a person's risk of disease. Exactly how these small changes influence the production of proteins is unclear; notably, up to 90% of the variants discovered through genome-wide association studies occur outside the protein coding regions of genes. So, as early as 2005, Francis Collins, then director of the US National Human Genome Research Institute (NHGRI) in Bethesda, Maryland, championed the concept of the Genotype-Tissue Expression project (GTEx) to create a reference database of genetic expression in healthy human tissue. A 2008 workshop confirmed the feasibility of Collins's idea, and in October 2010 the US National Institutes of Health (NIH) Common Fund finally launched the project, as Collins ascended to lead the NIH.

On 29 May, after a two-year pilot period, researchers behind GTEx released a progress update about how the database has evolved and how it promises to serve as a useful resource going forward (Nat. Genet. 45, 580–585, 2013). The project has already collected data on genetic expression from more than 1,800 post-mortem tissue samples from a total of 190 healthy donors.

According to GTEx program director Jeff Struewing of the NHGRI, the database is the first to analyze levels of RNA in different tissues within an individual donor: 24 different tissues per person, on average. In the two months following the official April launch of the GTEx database, about 55 researchers requested access.

One draw of the database is that scientists who study specific gene mutations or single nucleotide polymorphisms (SNPs) “can go put in that SNP number and see how that SNP acts in local tissues,” Struewing says. Researchers must understand normal gene expression to grasp which signals go awry in illness, explains Kristin Ardlie, one of several principal investigators of the GTEx project and director of the Biological Samples Platform at the Broad Institute in Cambridge, Massachusetts. The goal is for genetic understanding to yield more effective disease therapeutics, Ardlie says, and the research GTEx allows is “the next step along that pathway to understand the function and regulation” of genetic expression.

Stephen Hewitt studies malignant tissue samples in his pathology lab, so the genetic expression data he sees primarily come from diseased cells. The GTEx database will bring him “a catalog of gene expression in a normal tissue,” which is “incredibly valuable,” says Hewitt, a cancer researcher at the US National Cancer Institute in Bethesda, Maryland, who helped develop the method GTEx uses to preserve its samples but has not been involved in data collection.

GTEx might also provide a baseline of RNA levels for researchers to understand how therapies influence tissues differently. Researchers already make “use of RNA sequence data in the prediction of drug responses,” Ardlie says. She points to a recent study that related gene expression in cancer cell lines to their drug response; it found, for example, that expression of the Schlafen family member 11 gene (SLFN11) predicted sensitivity to topoisomerase inhibitors (Nature 483, 603–607, 2012).

Last year, at the American Society of Human Genetics annual meeting in San Francisco, GTEx researchers reported that preliminary analysis of gene expression information in the database from blood, lung, thyroid, heart, muscle and skin tissue confirmed the presence of 'expression quantitative trait loci' (eQTLs), SNP epicenters that regulate RNA levels. Moreover, it showed the potential to pinpoint new eQTLs. Past research, for example, has pointed to an eQTL for orosomucoid like 3 (ORMDL3), a gene thought to influence asthma risk, though researchers haven't yet detected a difference in expression between diseased and healthy individuals (Nat. Rev. Genet. 12, 277–282, 2012).

Now that the pilot phase has ended, GTEx has begun to scale up, with the goal of offering gene expression data from 900 donors by 2015. Down the road, the GTEx team hopes to run whole-genome methylation analysis on its samples, which would give more information on the underlying biological mechanisms of gene expression, and proteomic analysis, which would provide insight into the proteins at work in each cell, Struewing says.