Introduction

While cardiovascular disease (CVD) has traditionally been considered a disease of Western society, its global incidence is on the rise and it is currently more prevalent in low- and middle-income countries in Asia and Africa [1]. To prevent CVD, accurate personal risk-assessment is paramount. The 2012 European Society of Cardiology (ESC) guidelines recommend risk-assessment using the updated SCORE charts based on age, gender, smoking, blood pressure, and total cholesterol [2]. The recent joint guidelines by the American College of Cardiology and the American Heart Association (ACC/AHA) recommend a model based on the Framingham Risk Score using generally similar parameters [3]. However, these current risk prediction models only provide a rough estimate of individual risk. Therefore, there is great value in identifying and developing new biomarkers for CVD risk prediction.

Decades of research have shown that improvement of risk prediction requires comprehensive understanding of the disease mechanism. The tremendous progress achieved in the ‘omics’ field has successfully improved the understanding of CVD pathophysiology by comprehensively interrogating disease states at the molecular level. This molecular phenotyping has become feasible through robust and fast high-throughput analytic platforms that provide novel opportunities for molecular biomarker identification [4]. Transcriptomics, the study of ribonucleic acid (RNA) transcripts and their expression patterns at a genome-wide level, is particularly promising for biomarker identification.

This article reviews current knowledge of transcriptomics biomarkers in the cardiovascular field and provides an overview of the promises and challenges of the transcriptomics approach for biomarker identification.

RNA

RNA has long been considered the messenger molecule between genes and proteins: DNA is transcribed into messenger RNA (mRNA), which is subsequently translated into protein [5, 6]. In recent years, non-coding RNA species have been characterized, including microRNAs (miRNAs) and long non-coding RNAs (lncRNAs) [7, 8].

miRNAs are endogenous, non-coding small RNAs of about 22 nucleotides regulating gene expression at a post-transcriptional level [9, 10]. They are involved in a broad range of biological processes and their dysregulation impacts disease development [11]. Of great interest is that miRNAs are stable in biological fluids such as blood and urine [12, 13], are actively secreted in microparticles and show tissue-specificity, attractive features of potential biomarkers [14].

lncRNAs comprise RNA molecules over 200 nucleotides in length and are observed in a wide range of tissues. They exert a broad repertoire of functions and have been linked to differentiation, developmental processes, and disease [8, 15]. Compared to miRNAs, the widespread attention on lncRNAs is a rather recent phenomenon; nonetheless, some promising evidence for the use of lncRNAs as biomarkers exists [16].

Technology Platforms

Historically, investigation of RNA expression was performed using northern blotting or RT-PCR approaches, at best investigating several RNA targets at once. For several years, the use of expression microarrays has allowed rapid unbiased screening of nearly the entire transcriptome for discovery of the most promising targets. In microarray-based methods, tens of thousands of transcripts are simultaneously analyzed by chemically labeling RNA molecules and subsequently hybridizing them to probes on the microarray. The strength of microarrays lies in their extensive coverage, high-throughput applicability, and relatively low cost. However, microarray technology is limited by the amount of RNA required, by its limited dynamic range for quantification, and by the fact that it can only detect predefined transcripts. Furthermore, questions have been raised about the reproducibility and reliability of microarray experiments.

Currently, we are on the brink of a new revolution, brought about by the advent of next-generation RNA-sequencing (RNA-seq). Although still prohibitively expensive, advances in RNA-seq will allow for superior scrutiny of the transcriptome, providing absolute quantification of transcripts while including splice variants, non-coding RNA and yet unknown transcripts [17]. RNA-seq uses deep-sequencing technologies whereby a population of RNA (e.g., mRNA or miRNA) is converted to a cDNA library, which is subsequently sequenced in a high-throughput base-by-base manner to obtain short sequences. The reads, typically 30–400 bp depending on the DNA-sequencing technology used, are used to reconstruct the original RNA sequence in silico [18]. The use of this so-called next-generation sequencing technology for the analysis of RNA has pioneered work with small regulatory RNAs, possibly because this field has benefited less from microarrays, as small RNAs are too short to be captured adequately with the limited resolution of microarrays [19]. Detailed descriptions of microarray and RNA-seq approaches are beyond the scope of this work, but many excellent reviews provide a comprehensive overview, e.g., [19–21].

As the technological capabilities for measuring transcript expression have vastly improved, the importance of expression data for the development of new biomarkers has soared. The opportunity for transcriptome-wide screening of biomarkers allows for unbiased investigation of their potential as an individual biomarker for disease.

Transcriptomics-based Biomarkers in Cardiovascular Disease

Recent advances in the cardiovascular biomarker field have identified novel and emerging transcriptomics-based biomarkers (Table 1). Here, we highlight examples that have started to emerge in clinical practice.

Table 1 Studies evaluating gene expression data for the use of biomarker identification for coronary artery disease

ST2 (IL-1RL-1, Interleukin 1 receptor-like 1)

ST2 represents a promising biomarker identified by a transcriptomics approach. Using microarray analysis, Weinberg and colleagues [22] identified the ST2 gene as upregulated in cardiac myocytes subjected to mechanical strain. Soluble ST2 is a secreted receptor belonging to the IL-1 receptor family that regulates inflammation and immunity [23]. The soluble form of the protein can be measured in peripheral blood, and a test kit for measurements of soluble ST2 is already commercially available (Critical Diagnostics Presage ST2 Assay). It has been shown that ST2 levels rise above normal in the context of various cardiac diseases [24] such as heart failure [25] and ischemic heart disease [26]. In the Framingham Heart Study, measurements of soluble ST2 showed clear gender differences, an increase with age and increased levels in association with diabetes and hypertension [27], and soluble ST2 added prognostic value to standard risk factors [28]. Novel findings, however, indicate that genetic factors account for up to 40 % of the inter-individual variability of soluble ST2 levels, which must be taken into account in future studies of ST2 as a biomarker [29]. ST2 is a clear example of how initial microarray analyses identified a target as a cardiac biomarker and led to the development of a suitable assay.

Growth Differentiation Factor-15 (GDF-15)

GDF-15, a distant member of the TGF-β cytokine superfamily, has been identified by gene expression microarray analyses as being massively upregulated in nitric oxide (NO)-treated cardiomyocytes [30], under oxidative stress, in pressure-overloaded left ventricles of mice with aortic stenosis, and in a mouse model of dilated cardiomyopathy [31]. Levels of GDF-15 can be measured in serum and plasma, and evidence is accumulating that GDF-15 is a strong and independent predictor of mortality and disease progression in patients with established disease, such as acute coronary syndromes, angina pectoris, and heart failure [32]. Moreover, circulating GDF-15 levels are independently related to intermediate cardiovascular phenotypes, including endothelial dysfunction, intima media thickness, plaque burden, and left ventricular hypertrophy and dilatation [33, 34]. Thus, measurement of GDF-15 may contribute to a refined risk assessment on top of traditional risk factors and biomarkers.

The same group that reported on GDF-15 as a cardiac biomarker identified follistatin-like 1 (FSTL1) as an inducer of GDF-15 production and an independent biomarker in acute coronary syndrome by using an expression screen for cDNAs encoding activators of the GDF-15 promoter [35]. FSTL1 had previously been indicated as a putative biomarker in chronic systolic heart failure [36] and has been discussed as a novel therapeutic target for post-myocardial infarction and acute coronary syndrome [37].

Expression Signatures

A precise gene expression signature, i.e., an RNA expression pattern, has the promise to diagnose and classify diseases and potentially guide personalized treatment decisions for patients [4]. Gene expression signatures have already been shown to accurately predict cardiomyopathy etiology in heart failure [38, 39] and to be useful in monitoring clinically significant allograft rejection [40, 41]. These data support ongoing efforts to incorporate biomarkers based on expression profiling to determine prognosis and response to therapy [38, 42].

In the Personalized Risk Evaluation and Diagnosis in the Coronary Tree (PREDICT) study, a whole blood gene expression score was developed and validated for the assessment of obstructive CAD in non-diabetic patients [43, 44]. This score is a function of the expression levels of 23 genes grouped into highly correlated terms reflecting biological processes or cell types [44] and is associated with the probability of obstructive CAD [45]. Subsequently, a multiplex assay for expression levels of the 23 gene transcripts became commercially available (Corus CAD, CardioDx, Palo Alto, CA) [45]. Multiplex tests are often complex, containing multiple sample processing steps, operators, machines and types of reagents which can affect assay variability. Assessment of the laboratory process variability showed that the Corus CAD intra-batch PCR variability contributed most to the overall variability while the reagent lot contributed most to inter-batch variability [45]. Thomas et al. [46] evaluated the diagnostic accuracy of the gene expression score to determine obstructive CAD in symptomatic patients referred for myocardial perfusion in the multicenter COMPASS study. The investigators found that the gene expression score was a significant predictor of obstructive CAD and resulted, at a predefined threshold, in a high sensitivity and high negative predictive value. Although the added value of a transcriptomics profile such as Corus CAD must be rigorously tested against current standard-of-care risk prediction and explored in different populations to define its clinical utility, the Corus CAD assay is extremely promising and one of the best examples of the value of transcriptomics-based biomarkers in the cardiovascular field today.
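The performance measures reported for a score at a predefined threshold can be illustrated with a minimal sketch. The function name, the toy scores, and the threshold of 15 are hypothetical and are not taken from the PREDICT or COMPASS data; the sketch only assumes that a score at or above the threshold counts as a positive test.

```python
from typing import Sequence, Tuple


def sensitivity_and_npv(
    scores: Sequence[float],
    has_disease: Sequence[bool],
    threshold: float,
) -> Tuple[float, float]:
    """Return (sensitivity, negative predictive value) for a thresholded score.

    Assumption: a score >= threshold is called test-positive.
    """
    tp = fn = tn = fp = 0
    for score, diseased in zip(scores, has_disease):
        positive = score >= threshold
        if diseased and positive:
            tp += 1
        elif diseased and not positive:
            fn += 1
        elif not diseased and positive:
            fp += 1
        else:
            tn += 1
    # Sensitivity: fraction of diseased patients who test positive.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    # NPV: fraction of negative tests that come from non-diseased patients.
    npv = tn / (tn + fn) if (tn + fn) else float("nan")
    return sensitivity, npv


# Hypothetical toy data, for illustration only.
toy_scores = [5, 12, 13, 22, 30, 9, 14, 27, 35, 16]
toy_disease = [False, False, True, True, True, False, False, True, True, False]
sens, npv = sensitivity_and_npv(toy_scores, toy_disease, threshold=15)
print(f"sensitivity={sens:.2f}, NPV={npv:.2f}")
```

Note that, unlike sensitivity, the negative predictive value depends on disease prevalence in the tested population, which is one reason such a score has to be explored in different populations.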

Circulating microRNAs

Changes in circulating miRNA levels have been associated with cardiovascular disease [47, 48]. As PCR-based techniques for quantifying circulating miRNAs improved, studies began to explore whether miRNAs could serve as clinical biomarkers, e.g., as biomarkers of acute coronary syndrome [49, 50], acute myocardial infarction [51], and heart failure [52].

In the Bruneck study, one of the largest studies measuring miRNAs, Zampetaki et al. [53] screened levels of 19 circulating miRNAs by quantitative RT-PCR. Three miRNAs formed a signature for myocardial infarction: miR-126, miR-223 and miR-197. Those miRNAs added information to the Framingham Risk Score for the endpoint coronary heart disease and led to better patient stratification to risk categories, indicating the potential value of these miRNAs as biomarkers for cardiovascular risk prediction.

However, most published miRNA studies were small case-control studies and should be interpreted with caution; further work in larger populations is required. Detailed overviews of the current miRNA biomarker literature are given in, e.g., [9, 54, 55].

MicroRNA Signatures

Similar to specific gene expression signatures, signatures of miRNAs may reflect a given disease state and have potential as a biomarker. Meder et al. [56•] assessed whole-genome miRNA expression in whole blood samples of patients with acute myocardial infarction (AMI); 121 miRNAs were identified to be significantly dysregulated in AMI. The predictive power of these miRNAs was evaluated by receiver operating characteristic curves, and area under the curve (AUC) values of up to 0.94 were observed for the most predictive single miRNAs, miR-1291 and miR-663b. Using an algorithm for self-learning pattern recognition, a unique 20-miRNA signature was identified that predicts AMI with higher power and better AUC compared to individual miRNAs, even at stages when troponin T was still negative. These results imply that miRNA signatures derived from peripheral blood can serve as a valuable biomarker and may improve biomarker-based diagnosis of AMI. However, the sample size was rather small, and larger patient cohorts are needed for validation.
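For readers less familiar with the AUC, the following minimal sketch computes it for a single marker as the probability that a randomly chosen case has a higher value than a randomly chosen control (ties count one half). The function name and the toy values are hypothetical; this is not the analysis pipeline used by Meder et al.

```python
from typing import Sequence


def roc_auc(case_values: Sequence[float], control_values: Sequence[float]) -> float:
    """Area under the ROC curve for one marker.

    Computed as the fraction of (case, control) pairs in which the case
    value is higher; tied pairs contribute 0.5.
    """
    if not case_values or not control_values:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case in case_values:
        for control in control_values:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_values) * len(control_values))


# Hypothetical normalized expression values, for illustration only.
toy_cases = [2.1, 3.4, 1.8, 2.9]
toy_controls = [1.0, 2.0, 1.5, 2.5]
toy_auc = roc_auc(toy_cases, toy_controls)
print(f"AUC = {toy_auc:.4f}")
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation. For a marker that is downregulated in cases, this function returns a value below 0.5, and the discriminative ability is then 1 minus that value.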

In a subsequent miRNA study, the same group investigated the kinetics of miRNA dysregulation in serial measurements in AMI patients and confirmed a 6-miRNA signature, including five out of the 20 miRNAs identified in the previous study [57]. These serial measurements identified distinct miRNA patterns in the very early phase of AMI that resolved within the first days of successful therapy. Significant differences were seen mainly at the two earliest time points, indicating that those miRNAs are early markers of AMI. The authors hypothesize that, although the release of molecules from injured myocardium may be similar for miRNAs and proteins, a whole-blood approach may provide further information because it would reflect the disease processes involved in the pathogenesis of AMI rather than solely detecting myocardial necrosis.

Clearly, future studies are needed to examine the value of miRNA signatures as potential robust biomarkers; nevertheless, miRNAs and miRNA signatures are emerging promising new players in cardiovascular biomarker research.

Long Non-coding RNAs

Recently, another class of non-coding RNAs, lncRNAs, has aroused interest in cardiovascular function and disease. Growing evidence suggests that lncRNAs are key regulatory molecules at every level of cellular physiology and that their alterations are associated with multiple human diseases [58, 59]; they may therefore provide promising new targets for biomarker identification. Despite the progress made in oncology studies that tested lncRNAs as biomarkers for, e.g., breast cancer [60], endometrial carcinoma [61] and lung cancer [62••], data on lncRNA biomarkers in the cardiovascular field are still scarce, and further work is essential to improve the overall understanding and value of lncRNAs as biomarkers.

Challenges in Biomarker Development

Multiple stages are required for the “pipeline” of transcriptomics biomarker discovery and development. These stages include, among others, i) discovery of putative biomarkers for the target disease phenotype, ii) (technical) validation of those biomarkers in various disease and population cohorts to characterize biomarker performance, and iii) subsequent testing in large prospective clinical trials before translation into clinical routine. In addition, the impact of a new biomarker on clinical outcomes in terms of efficacy and cost effectiveness should be assessed as a further step. Novel technologies have contributed to a massive increase in biomarker discovery projects and reports; however, only a few biomarkers have been validated for routine clinical practice [63].

Numerous excellent reports have been published that provide a comprehensive overview of pitfalls and challenges for biomarker discovery and translation, e.g., [55, 63–67]. Here, we briefly review the key challenging points (summarized in Fig. 1).

Fig. 1 Challenges in transcriptomic biomarker development. The figure depicts the main steps in biomarker discovery and development and the associated challenges to overcome

Study Design

An appropriate study design is a foremost requirement for reliable transcriptomics-based biomarker identification, ensuring adequate sample size for analysis and accounting for possible confounders. We recently showed that age, gender, body mass index, inflammatory status, and smoking influence gene expression [68]. Likewise, consideration should be given to the influence of cardiovascular risk factors, ethnicity, and medication on gene expression [4]. In addition, common gene variants (i.e., single-nucleotide polymorphisms) and epigenetic patterns can influence gene expression [4]. To achieve adequate statistical power, large sample sizes, accurate clinical phenotyping and well-characterized populations are mandatory [64, 69]. Another primary consideration in study design is the choice of tissue or cell type to investigate. Due to the ease of access, circulating blood is often used as a surrogate for diseased tissue. However, it is unclear whether the blood transcriptome is a suitable surrogate for tissues such as heart tissue. One needs to consider that whole blood contains a mixture of cell types whose proportions vary between individuals and may change depending on disease state [70].

Animal models and in vitro experiments are still important methods employed for biomarker research. However, the translation of these studies toward clinical application is difficult and could lead to false targets. Comparison of transcriptomics data from ex vivo monocytes and the in vitro monocytic THP-1 cell line showed important differences [71]. Likewise, Seok et al. recently showed that human inflammatory expression profiles were highly similar between various causes of inflammation, yet very different from mouse inflammatory expression profiles [72••]. This indicates that great care must be taken when translating such results into the clinical setting.

Analytical Considerations and Standardization

In contrast to genomic data, a subject’s gene expression data will vary spatially and temporally. To reduce the effect of confounding factors on gene expression data, such as different sample preparations and differences between PCR runs, gene expression data have to be normalized. This is a critical issue and a major concern in transcriptomics studies. Especially for circulating miRNA measurements, normalization is a “hot topic” in the current discussion, and several normalization approaches are used, such as quantile-quantile normalization or spike-in of artificial RNA material [20]. However, normalization is currently applied in a non-standardized fashion, and the application of universal reference material is required. Furthermore, variation caused by preanalytical and analytical factors can substantially influence gene expression data [4]. Schurmann et al. [73] showed that factors such as RNA quality, storage time of blood, and batches of RNA processing and amplification have a strong influence on gene expression data. Other studies provide evidence for the variability inherent to the PCR process and for batch effects in high-throughput technologies [45, 74]. In addition, numerous preanalytical variables, such as heparin [75••], have been shown to influence the detection of miRNAs and can lead to erroneous results [76]. This can be particularly challenging in the clinical setting, as differences in sample collection, sample processing, and assay performance between clinical centers are to be expected. Therefore, to minimize technical and analytical variability and avoid artifactual data generation, consensus on standard methods for all steps is imperative.
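As an illustration of one of the normalization approaches mentioned above, the following is a minimal sketch of quantile normalization, assuming that this is what is meant by quantile-quantile normalization. Each sample is forced onto a common reference distribution (the mean of the sorted values across samples) while the rank order of features within each sample is preserved. The function name and toy data are hypothetical, and ties are broken by order of appearance, which is a simplification.

```python
from typing import List


def quantile_normalize(samples: List[List[float]]) -> List[List[float]]:
    """Quantile-normalize expression values across samples.

    samples[j] holds the values of sample j; all samples must list the
    same features in the same order.
    """
    n_features = len(samples[0])
    if any(len(sample) != n_features for sample in samples):
        raise ValueError("all samples must cover the same features")

    # For each sample: feature indices ordered from lowest to highest value.
    orders = [
        sorted(range(n_features), key=lambda i, s=sample: s[i])
        for sample in samples
    ]
    # Reference distribution: mean of the k-th smallest value across samples.
    reference = [
        sum(sample[order[k]] for sample, order in zip(samples, orders)) / len(samples)
        for k in range(n_features)
    ]
    # Replace the k-th smallest value of each sample by reference[k].
    normalized = []
    for order in orders:
        out = [0.0] * n_features
        for k, feature_index in enumerate(order):
            out[feature_index] = reference[k]
        normalized.append(out)
    return normalized


# Hypothetical toy data: three samples, four features each.
toy = [
    [5.0, 2.0, 3.0, 4.0],  # sample A
    [4.5, 1.0, 4.0, 2.0],  # sample B
    [3.0, 4.0, 6.0, 8.0],  # sample C
]
toy_normalized = quantile_normalize(toy)
for row in toy_normalized:
    print([round(value, 3) for value in row])
```

After normalization, all samples share exactly the same set of values and differ only in which feature carries which value. This removes global technical shifts between samples, at the cost of assuming that the overall distribution of expression is comparable across samples.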

Validation

Validation of initial discovery results in independent, large-scale studies is required in the field of biomarker research. Ideally, results of transcriptomics analyses will be validated in multi-center real-world studies, even comprising decentralized processing of RNA and PCR analysis and optimization of (decentralized) clinical laboratory testing procedures [4]. After validation of the initial expression results, the putative biomarker must be rigorously tested against the existing standard of care and explored in a wider population to define its clinical utility.

Another aspect that will become increasingly important is the validation of biomarkers for specific subgroups. It has been common practice for clinical laboratories to use specific reference values for several important subgroups, such as men and women or children and adults, when evaluating diagnostic markers. However, it is uncommon to determine the predictive value of a biomarker for specific subgroups. This is about to change, as is clear from the recent ACC/AHA recommendations on cardiovascular risk-assessment, which state that race- and sex-specific risk-assessment is highly recommended [3].

Multidisciplinary Approaches

Getting candidate biomarkers into large-scale validation studies requires the integration of diverse skills. Most biomarker discovery is conducted in labs lacking the resources and multidisciplinary expertise needed [63]. Therefore, biomarker discovery should be a component of large research networks, involving industry and experts in distinct fields such as molecular biology, analytical chemistry, bioinformatics, clinical-trial design, epidemiology, statistics, and health-care economics [63]. Several collaborative initiatives have emerged in recent years to orchestrate biomarker research efforts (including transcriptomics-based biomarkers). These include, among others, the Innovative Medicines Initiative (IMI) (www.imi.europa.eu/) and the BiomarCaRE Consortium (www.biomarcare.eu), both funded by the European Union.

Transcriptomics, Genomics, and Epigenomics

Biomarker research is increasingly focused on the discovery of causal biomarkers, which indicate changes in the pathophysiologic processes underlying complex disease and are potential targets for drug development. Genome-wide association studies (GWAS) provide an important tool to reveal causality through the principle of “Mendelian randomization”. The study by Zacho et al. is a case in point, showing that genetically raised CRP levels did not influence the risk of myocardial ischemia [77]. Another clear example is the recent landmark paper by Voight et al., which showed that genetic predispositions that raised HDL-cholesterol levels had no influence on disease outcome, as opposed to genetic alterations in LDL-cholesterol levels [78]. The method of “Mendelian randomization” is also well suited to indicate causality of transcriptomics-derived biomarkers.
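The logic of Mendelian randomization can be sketched with the commonly used single-variant Wald ratio: the effect of a genetic variant on the outcome is divided by its effect on the biomarker. This is a generic textbook estimator and not necessarily the method used in the studies cited above. The function name and all numbers are hypothetical, and the sketch assumes that the variant is a valid instrument (it affects the outcome only through the biomarker) and uses a first-order approximation for the standard error.

```python
from typing import Tuple


def wald_ratio(
    beta_variant_biomarker: float,
    beta_variant_outcome: float,
    se_variant_outcome: float,
) -> Tuple[float, float]:
    """Return (causal effect estimate, approximate standard error).

    estimate = effect of variant on outcome / effect of variant on biomarker
    The standard error ignores uncertainty in the variant-biomarker effect.
    """
    if beta_variant_biomarker == 0:
        raise ValueError("variant must be associated with the biomarker")
    estimate = beta_variant_outcome / beta_variant_biomarker
    standard_error = se_variant_outcome / abs(beta_variant_biomarker)
    return estimate, standard_error


# Hypothetical summary statistics, for illustration only:
# per allele, the variant raises the biomarker by 0.5 units and changes
# the log odds of disease by 0.01 (standard error 0.02).
toy_estimate, toy_se = wald_ratio(0.5, 0.01, 0.02)
toy_z = toy_estimate / toy_se
print(f"estimate={toy_estimate:.3f}, SE={toy_se:.3f}, z={toy_z:.2f}")
```

In this toy example the estimate is small relative to its standard error, which is the pattern one would expect when a genetically raised biomarker level does not translate into altered disease risk.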

GWAS have found many single nucleotide polymorphisms (SNPs) affecting disease, yet the complex mechanisms through which they exert their effects are still largely unknown, as many appear in non-coding regions of the genome. SNPs that influence mRNA expression are known as expression quantitative trait loci (eQTLs).

SNPs associated with complex diseases are more likely to be eQTLs compared to other SNPs and 45 % of genes associated with CVD contain eQTLs [79, 80]. SNPs also influence known risk factors of cardiovascular disease, for example lipoproteins, for which 96 eQTLs have been found in 157 known loci [81, 82]. This shows that eQTLs may be an important mechanism for cardiovascular risk SNPs, and emphasizes the importance of transcriptomics for the interpretation of GWAS results.
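A minimal sketch of how a single SNP-transcript pair might be tested as an eQTL is an additive model, in which expression is regressed on the number of minor alleles (0, 1, or 2). The function name and toy data are hypothetical; real eQTL analyses typically also adjust for covariates and correct for multiple testing, which are omitted here.

```python
from typing import Sequence, Tuple


def eqtl_additive_fit(
    dosages: Sequence[float],
    expression: Sequence[float],
) -> Tuple[float, float]:
    """Least-squares fit of expression = intercept + slope * allele dosage.

    Returns (slope, intercept); the slope is the estimated change in
    expression per additional copy of the minor allele.
    """
    n = len(dosages)
    if n != len(expression) or n < 2:
        raise ValueError("need matching genotype and expression values")
    mean_dosage = sum(dosages) / n
    mean_expression = sum(expression) / n
    sxx = sum((d - mean_dosage) ** 2 for d in dosages)
    if sxx == 0:
        raise ValueError("SNP is monomorphic in this sample")
    sxy = sum(
        (d - mean_dosage) * (e - mean_expression)
        for d, e in zip(dosages, expression)
    )
    slope = sxy / sxx
    intercept = mean_expression - slope * mean_dosage
    return slope, intercept


# Hypothetical toy data: allele dosage and normalized expression of one gene.
toy_dosages = [0, 0, 1, 1, 2, 2]
toy_expression = [5.0, 5.2, 5.9, 6.1, 7.0, 6.8]
toy_slope, toy_intercept = eqtl_additive_fit(toy_dosages, toy_expression)
print(f"slope={toy_slope:.2f} per allele, intercept={toy_intercept:.2f}")
```

A slope that differs significantly from zero would mark the SNP as a candidate eQTL for that transcript.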

In addition to genetic biomarkers, epigenetic modifications such as DNA methylation and histone modifications could serve as biomarkers of disease. Most interest has recently been directed at DNA-methylation biomarkers, enabled by the development of ‘epigenome-wide’ DNA-methylation arrays. To elucidate the tissue-specific down-regulation of gene expression by DNA methylation in a high-throughput fashion, transcriptomics is indispensable. In a recent study, Grundberg et al. compared DNA-methylation to GWAS and transcriptomics data and found that 28 % of methylation quantitative trait loci (meQTLs) are associated with nearby SNPs, and 6 % of SNPs played a role in both DNA methylation and adipose tissue gene expression [83], showing the complex interplay between genetic variants, methylation, and expression.

In addition, SNPs may influence the expression of mRNA through interference with non-coding RNA (ncRNA) regulatory activity. For example, Gamazon et al. analyzed the effects of SNPs on mRNA expression (mRNA-eQTL) and microRNA expression (miRNA-eQTL) and showed significant enrichment of miRNA-eQTLs in known mRNA-eQTLs, thereby providing important evidence for specific miRNA-mRNA interactions. Furthermore, many of the identified SNPs were associated with traits of complex diseases [84•]. In a similar fashion, Kumar et al. identified SNPs that influence lincRNA expression and showed associations of these SNPs with complex diseases [85]. This indicates that dysregulation of transcriptome interactions could be an important disease mechanism and may thus yield interesting biomarker targets.

Future Perspectives

Despite a tremendous increase of interest in the transcriptome, we are only just scratching the surface of its complexity. Fully elucidating the transcriptome requires robust sample processing as well as advances in technology and analysis methods.

Whole transcriptome RNA sequencing is still in its infancy yet new developments seem very promising. Meanwhile, several companies acknowledge the trend for multimarker diagnostics, and have developed custom expression arrays and multiplex PCR solutions suited for clinical application. Improvements in microfluidics lead to reduced sample volume requirements, smaller machines and laboratory set-ups and will soon culminate in lab-on-chip solutions.

Advances in analysis methods require standardization of data normalization and optimal modeling [86]. An increasingly important strategy of in silico modeling is the systems biology approach [87]. It combines data at various biological levels (e.g., genomic, epigenomic, transcriptomic, and proteomic) to identify targets of interest (Fig. 2). In addition, it sheds light on the relation of the target biomarker to other markers, paving the way for in silico pathway analysis and enabling the identification of pathological pathways [88].

Fig. 2 Transcriptomics for biomarker discovery. Simplified schematic of relevant transcriptome interactions for current biomarker development. Large studies are required to elucidate the complex interactions of the genome and epigenome with the transcriptome and subsequently the proteome. Bullets denote contemporary techniques. eQTL, expression quantitative trait loci; meQTL, methylation quantitative trait loci; mRNA, messenger RNA; ncRNA, non-coding RNA

As new biomarkers emerge on the horizon, improved risk prediction will have to be translated into increased health benefits from therapeutic intervention. This is especially interesting for causal biomarkers, which can themselves act as targets for novel drug development. Furthermore, companion diagnostics indicating individual drug efficacy will likely take a more prominent role as we progress toward personalized medicine.

Conclusion

Over recent years, gene expression analyses have strongly influenced the area of biomarker identification and development in the cardiovascular field. Several potential biomarkers have been identified, including gene expression signatures and non-coding RNAs, and a few have been translated into clinical utility. However, several aspects in the “transcriptomics pipeline” of biomarker development deserve consideration, ranging from appropriate study design and material to analytical methods, standardization, and, most importantly, validation. Finally, to reach clinical application of the biomarker, fundamental questions about the clinical potential need to be evaluated as outlined by Morrow and deLemos [89]: i) can the clinician measure the biomarker?, ii) does the biomarker add new information?, and iii) does the biomarker help the clinician to manage patients?