Introduction

Papillary thyroid carcinoma (PTC) is the most common endocrine malignancy and encompasses a variety of morphological/architectural variants, all of which are characterized by a distinctive nuclear appearance. In recent years, PTC has become an important paradigm of solid tumour molecular pathogenesis principally arising from intensive investigation prompted by the Chernobyl accident.

The discovery of ret rearrangements [12, 13, 35] and their association with radiation [42] was followed by the demonstration of the BRAF V600E mutation [21, 30] found more commonly associated with sporadic PTC in non-radiation exposed populations [22, 32]. In the past, our group and others have noted an association between classic morphology and the BRAF V600E mutation and between variant morphology and ret rearrangements particularly ret/PTC-3 [11, 39]. Similarly ret/PTC-3 appears to strongly correlate with the solid/follicular variant seen commonly in children exposed to the Chernobyl fallout [42]. Nevertheless, the utility of these genetic lesions to diagnostic pathology and clinical practice has remained negligible.

Recently, gene expression microarray technology has been used to attempt to identify clinically relevant biomarkers of malignancy related to the thyroid [2, 6, 9, 10, 20, 28]. The discovery of such a biomarker or panel of biomarkers allied to the gold standard triage method of fine needle aspiration cytology (FNAC) would represent a significant advancement in the treatment of the solitary thyroid nodule.

An intriguing but commonly occurring variant of PTC is known as follicular variant (FVPTC). This lesion, which by definition retains the classic nuclear features of PTC, shows no evidence of the architectural papillae. FVPTC may be a controversial lesion due to interobserver variation in its pathological diagnosis [25]. Further, the occurrence of follicular patterned lesions with poorly or incompletely developed nuclear features may occur, which are easily dismissed as benign thyroid nodules. This has led to the controversial designation “well-differentiated tumour of uncertain malignant potential” for tumours of this type [45].

It is clear that this is a complex and contentious area, and that further work needs to be done to ascertain the underlying molecular biology of this particular variant. Recently, inroads into elucidation of molecular pathways underpinning PTC have been carried out using microarray studies. The overriding objective of these investigations was to identify clinically useful biomarkers. However, the majority of these studies have analysed PTC as though it were a homogenous singular entity without deference in a detailed manner to sub-variants and, in particular, the most common variant (FVPTC). The identification of specific biomarkers of FVPTC and a deeper understanding of its origins are clearly warranted.

The aim of this expression microarray study using a novel microarray platform was twofold: to identify markers that distinguish PTC from benign thyroid tissue and lesions and, secondly, to identify potential markers and further explore the molecular pathology of FVPTC.

Materials and methods

Patients and tissue samples

Tissue from 25 thyroid resections was collected prospectively from patients undergoing partial or total thyroidectomy for a variety of reasons at St. James’s Hospital, Dublin. The study had approval of the local ethics committee and informed consent was obtained from each patient by the clinical team before surgery. Small samples (<1 cm) were divided and immediately snap-frozen in liquid nitrogen for storage at −80°C until use. Histopathological examination of formalin-fixed paraffin-embedded sections was performed by two pathologists (SF and MT) for diagnostic categorisation. Classification of neoplastic tissue was made according to a recognised system [24]. The cohort comprised 11 benign lesions or conditions including follicular adenoma, nodular goitre, normal thyroid tissue, and Graves’s thyroiditis. The remaining 14 samples were diagnosed as PTC and comprised six classical morphology PTC and eight FVPTC (see Table 1). Immediately before RNA extraction, frozen sections were cut and stained to confirm the presence of representative lesional tissue with morphology corresponding to that noted in the diagnostic formalin-fixed paraffin-embedded sections.

Table 1 Sample cohort

RNA isolation and characterization

Samples were ground in liquid nitrogen and homogenised in RLT buffer (Qiagen, UK). RNA was then extracted using the RNeasy Mini Kit with optional on-column RNase-free DNase digestion (Qiagen) according to the manufacturer’s instructions. RNA quantity was determined using UV spectrophotometry. RNA quality was assessed using the RNA 6000 Nano LabChip® Kit in conjunction with the Agilent 2100 Bioanalyzer (Agilent Technologies, Waldbronn, Germany).

Microarray analysis

Applied Biosystems Human Genome Survey Arrays were used to analyse the transcriptional profiles of thyroid RNA samples in this study. Digoxigenin-UTP-labelled cRNA was generated and linearly amplified from 5 μg of total RNA using Applied Biosystems Chemiluminescent RT-IVT Labelling Kit v 2.0 using manufacturer’s protocol. 10 μg of labelled cRNA were hybridized to each pre-hybed microarray in a 1.5-ml volume at 55°C for 16 h. Array hybridization and chemiluminescence detection were performed using Applied Biosystems Chemiluminescence Detection Kit following manufacturer’s protocol. Images were collected for each microarray using the 1700 analyser. Images were auto-gridded and the chemiluminescent signals were quantified, corrected for background and spot, and spatially normalized.

TaqMan® PCR validation

Sufficient RNA remained from 20 of the initial 25 samples for TaqMan® polymerase chain reaction (PCR) validation in a series of 50 targets. RNA was reverse transcribed using a High Capacity cDNA Archive Kit (Applied Biosystems, CA, USA). Primers and probes for TaqMan® PCR were obtained by using Applied Biosystems’ pre-designed TaqMan® Gene Expression Assays. PCR was carried out using an ABI PRISM 7900 Sequence Detection System (Applied Biosystems). Analysis of relative gene expression data was performed using the ΔΔCT method [23] with glyceraldehyde-3-phosphate dehydrogenase (GAPDH) as an endogenous control/reference assay.

Statistical analysis

Microarrays were analysed using Spotfire DecisionSite™ for Functional Genomics (Spotfire AB, Goteborg, Sweden) and R version 1.9.1, a free language and environment for statistical computing and graphics (R Development Core Team, 2004). Arrays were initially normalized, and genes were deemed undetectable and, therefore, excluded from final gene lists if they had a signal-to-noise ratio of less than three (S/N < 3) in greater than 18 of the 25 arrays.

An ANOVA test was used to generate p values for statistical differences between groups. Their p values were then adjusted for multiple comparisons using the technique described by Benjamini and Yekutieli [3]. Genes were deemed statistically different between groups if they had an adjusted p < 0.05 and an average fold-change difference of greater than 2. Hierarchical clustering was performed based on the statistically different genes to determine whether samples grouped appropriately. Gene ontology analysis was performed using an online database known as the Panther classification system (http://www.pantherdb.org). Correlations between microarray and TaqMan® expression data were measured using the Pearson coefficient.

Results

Unsupervised clustering of all 25 samples demonstrated clustering into two major groups (data not shown) comprising the N group and the T group (see Table 1). There was no tendency for FVPTC to cluster independently with classic morphology PTC, confirming the close relationship of these variants of PTC. To identify potential markers of malignancy, ANOVA with false discovery rate correction was used to compare the benign and malignant thyroid cohorts. A p value cut-off of <0.05 and fold-change difference of ≥2 yielded 236 statistically significant probes. Of these, 172 corresponded to fully annotated probes and are listed in Table 2.

Table 2 Genes differentially expressed in malignant vs benign thyroid tissue

Supervised hierarchical clustering was performed on the 25 arrays based on the 236 statistically significant probes to determine whether the samples would segregate appropriately. The resulting heat map can be observed in Fig. 1. Benign lesions and tumours clustered together with the exception of one tumour sample that clustered with the benign group (T5).

Fig. 1
figure 1

Hierarchical clustering of samples. This heat map shows the clustering of the 25 samples based on the 236 probes found to be differentially regulated in benign vs malignant thyroid tissue. Clustering was performed using the unweighted pair group method with arithmetic mean, with Euclidian distance as the similarity measure. Average value was used as the ordering function

A binomial statistics tool was used to compare classifications of multiple clusters of lists to a reference list (i.e. the complete human genome) to statistically determine over- or under-representation of Panther classification categories. Biological processes over-represented in the up-regulated PTC cohort included tumour suppressor, oncogenesis, DNA replication, cell cycle, and cell adhesion (p < 0.0001). Genes involved in homeostasis and other homeostasis activities were highly over-represented in the down-regulated cohort (p < 0.000001).

ANOVA tests were used to determine which genes were differentially regulated in the FVPTC cohort only. Fifteen genes were identified, including cluster of differentiation 14 (CD14), CD74, CTSC, CTSH, CTSS, DPP6, ETHE1, human leucocyte antigen A (HLA-A), HLA-DMA, HLA-DPB1, HLA-DQB1, HLA-DRA, osteoclast stimulating factor 1 (OSTF1), TDO2, and a previously uncharacterized gene (noname).

Microarray results were validated using a reverse transcription reaction followed by TaqMan® PCR for 50 gene targets. The ΔΔCT method [23] was used to analyse relative gene expression data. GAPDH was used as an endogenous control, and T12 was chosen as an arbitrary calibrator sample. Gene expression profiles for TaqMan® PCR were plotted in conjunction with those for microarray results in Fig. 2. Pearson co-efficient was used to directly compare data from microarray analysis and TaqMan® RT-PCR. Table 3 depicts genes differentially expressed in both benign vs malignant and FVPTC vs classic morphology PTC.

Fig. 2
figure 2

TaqMan® PCR validation of microarray experiments. Profile charts of gene expression levels comparing results obtained by microarray analysis (n = 25) to TaqMan® PCR analysis (n = 20) for six genes. Plots for those genes with multiple probes are also displayed where appropriate

Table 3 Correlation between TaqMan® and microarray data

Discussion

The primary aim of this study was to generate an overview of molecular markers of malignancy in PTC with a view to identifying discriminators between common sub-types (classic PTC and FVPTC), using genome-wide expression microarray technology validated by TaqMan® RT-PCR. To this end, lesions that were well characterized histologically were selected.

The application of microarray analysis designed to identify transcripts strongly associated with each group of interest yielded a gene list of 173 genes that were differentially expressed between cohorts. Significant down-regulation of Coxsackie virus receptor was recorded in the malignant cohort of thyroid lesions. The Coxsackievirus B and adenovirus receptor (CAR) plays a dual role as a homotypic junctional adhesion protein and as a viral receptor. It is biologically plausible that altered expression may impact on the morphology peculiar to PTC given its association with cellular adhesion. CAR has been shown to be differentially expressed in various human adenocarcinomas, and differential expression may represent a new factor in thyroid tumourigenesisigenesis [27].

Rab 23 expression was also down regulated in the malignant cohort. The Rab small G protein family is composed of approximately 40 members. Many of them are ubiquitous and are expressed and participate in transport processes, such as endocytosis and exocytosis [26]. Other gene targets demonstrating significant down-regulation in the malignant group were syndecans 2 and 3. Syndecans are transmembrane proteoglycans expressed on adherent cells. Changes in syndecan expression have been postulated to influence cell adhesion, migration, and the structure of focal contacts and the cytoskeleton [8].

The abundance of metallothionein genes in the cohort of genes down-regulated in PTC is interesting partly because of the sheer number of isoforms detected (Table 2). There have been many studies showing increased metallothionein expression in a plethora of cancer types but few showing decreased expression [4, 7, 33]. Although previous microarray experiments showed metallothionein genes to be down-regulated in thyroid tumours to a certain extent [2, 9, 16], none have detected so many as the current study.

Apart from microarray experiments, there has been little reported in the literature regarding metallothioneins in thyroid cancer. An early report by Nartey et al. [31] showed metallothioneins to be expressed actively in certain human thyroid neoplastic tissues but not in normal thyroid tissue, which would seem to contradict the current authors’ findings. In contrast to this, a later immunohistochemical study showed an absence of metallothionein expression in 13 of 20 PTCs [36]. Interestingly, in three of the seven positive PTCs, metallothionein positivity was restricted to areas of follicular differentiation. In one of the only recent studies, Huang et al. [17] followed up their initial microarray experiment by showing that MT1G is down regulated in PTC via hypermethylation. The biological significance of low metallothionein expression in thyroid tumours is therefore still poorly understood; however, it is interesting to speculate that metallothioneins may have roles as tumour suppressors in thyroid carcinoma.

Many genes identified, such as LGALS3 [9, 16, 20], LYN [46], TFF3 [2, 9, 16], CRABP1 [9, 16], BAX [2], MAPK4 [28], CD44 [16], TIMP1 [20], FGFR2 [9], and S100A11 [20, 40], have been previously reported in both microarray and conventional experiments in thyroid cancer. The data generated in this study corroborates the importance of several biological processes in the progression of thyroid neoplasia. For example, S100A11 expression was up regulated in the PTC cohort compared with benign lesions, paralleling the increased protein expression of this gene target identified at the protein level using immunohistochemistry [29]. S100A11 has also been suggested as a biomarker of malignancy in the context of colorectal carcinoma as long ago as 1995 [41].

Correlation of highlighted features with the current state of knowledge of the molecular pathology of thyroid neoplasia goes some way towards providing an external validation of the data obtained from the AB1700 system. However, additional validation using TaqMan® RT-PCR was performed.

In general, microarray and TaqMan® data correlated well with approx. 80% of comparisons having p < 0.05. Candidate genes were selected contingent on results identified as over-represented biological processes (oncogenesis, cell cycling, DNA replication, and homeostasis) using the Panther binomial statistics tool as opposed to the more traditional method of selecting the most highly disregulated genes. This may account for the poor correlation observed with certain genes. Some genes, such as TFF3 and FGFR2, had excellent correlation between microarray and TaqMan® results, whereas others, such as BAX and TSHR, showed poor correlation despite previous studies implicating them in thyroid cancer [15, 38]. Where there was discordance in the data, it could be accounted for by differences in the sequences targeted by the TaqMan® target sequences and the microarray probe. In those cases, the array and pre-designed TaqMan assays interrogated different exons or alternative splice variants. This finding highlights the importance of matching the targets to be validated from microarray data sets.

Analysis of differentially expressed transcripts in the FVPTC revealed many transcripts showing similar expression level patterns in both FVPTC and classic morphology PTC. Examples of these transcripts are included in Table 2. In addition, somewhat surprisingly, unsupervised clustering of all samples showed no tendency for FVPTC to cluster independently of classic morphology PTC, emphasizing the very close relationship of these PTC variants (data not shown). However, analysis of differentially expressed genes in FVPTC exclusive of those identified in classic morphology PTC revealed 15 genes exhibiting differential expression in FVPTC compared with benign lesions and conditions outwith of classic morphology PTC. These genes displayed a narrow gamut of function represented by the transcripts involved (Table 3). TaqMan® RT-PCR was performed for all of these targets to confirm the array findings and correlation with the array data was strong (see Table 3).

Aberrant expression of two major groups of transcripts was noted in FVPTC. Relatively increased expression of class 1 major histocompatibility complex (MHC) genes (HLA-A) and aberrant expression of class 2 MHC genes (HLA-DMA, HLA-DPB-1, HLA-DQB-1, HLA DRA) and associated genes (e.g. CD74 represents the invariant membrane bound moiety of class II HLA molecules and regulates the biology and functions of MHC class II molecules and CD14 is a surface marker of monocytes/macrophages) were the most significant findings. Additionally, relatively up-regulated expression of members of the cathepsin family (cathepsin C, cathepsin H, cathepsin S, and TDO2) was striking in the FVPTC group. HLA expression is generally associated with immune functions such as T cell interaction and antigen presentation. The presence of prominent HLA transcript expression, especially among class 2 in FVPTC, is intriguing. One potential cause of this was that tumour-infiltrating leucocytes were responsible for this finding. However, haematoxylin–eosin (H&E)-stained slides of each case were reviewed to specifically identify the degree of tumour infiltration by leucocytes. Although a mild lymphocytic infiltrate was noted in some cases, there was no apparent over representation of lymphocytes in the follicular variant compared to benign lesions and classic morphology PTC. This raises the clear possibility that the findings represent aberrant increased expression of class 2 HLA transcripts by the epithelium of FVPTC. This is an unexpected finding, as over-expression of MHC class 2 molecules would be expected to increase tumour immunogenicity. A similar aberrant expression of HLA transcripts has been recently described in ovarian neoplasms [34]. Rangel et al. [34] concluded over-expression of HLA-DRA might represent a novel biomarker for malignancy, and this also seems biologically plausible in the FVPTC setting. A recent paper has also described HLA-DRA expression in ret/PTC-activated papillary thyroid carcinoma but not in surrounding normal thyroid follicles [19]. Yu et al. [47] showed discordant expression of (CD74Ii) and HLA-DR in Hashimoto thyroiditis, an autoimmune condition associated with increased incidence of PTC and sharing molecular features such as ret/PTC expression [18, 37]. Hwang et al. [19] have drawn attention to aberrant expression of HLA-DRA in ret/PTC-activated PTC. Expression of HLA-DRA may in some way explain the propensity for PTC to metastasize to lymph nodes and often apparently reside there without markedly worsening prognosis.

Cathepsins C (dipeptidyl-peptidase I), H, and S showed up-regulation in FVPTC compared to benign lesions. Cathepsins are a family of proteases that play an important role in protein degradation. They are key players in the proliferative, invasive, and metastatic potential of malignant tumour cells. Their expression in the relatively biologically indolent FVPTC is intriguing, and it remains possible that cathepsins have cellular roles outside of those involved in invasion and dissemination of tumour cells as indeed has been suggested by others [43]. For example, cathepsin L has recently been shown to play a role in nuclear transcriptional activation, and cathepsins are now recognized to play a role in MHC class II antigen presentation [44]. OSTF1 has no defined role in carcinogenesis, although outside of its role in ossification, it is also known to have roles in signal transduction and protein binding, which may be relevant to carcinogenesis and, particularly, FVPTC. A recent paper highlights the role of bone mineralization proteins including osteopontin and osteoclast stimulating factors as potential biomarkers of malignant tumours in general [1]. Indeed, in addition to elevated OSTF1 expression, increased expression of osteopontin was seen in PTC (data not shown); however, in our data, osteopontin does not appear to be specifically up-regulated in FVPTC, and this finding has also been noted by other researchers [14]. In any case, osteopontin is known to be a downstream effector of ret/PTC [5] and mutated BRAF [14], where it acts in association with CD44, another transcript showing increased expression in both classic morphology PTC and FVPTC.

A particular focus of this study was to compare transcriptome profiles for PTCs with classic morphology and FVPTC given the propensity for FVPTC lesions to prove problematic from a diagnostic perspective. Although the study confirms the close relationship between the two most common variants of PTC, a narrow portfolio of genes and, in particular, gene functions was elucidated in the FVPTC cohort. The targets identified are easily amenable to analysis by more established techniques such as TaqMan® RT-PCR, with associated potential as additional markers for application in the FNAC setting. Clearly, the potential biomarkers identified in this study will require prospective evaluation in the context of real clinical diagnostic situations in the future to consolidate their merit as adjunctive tests in the diagnostic setting and to validate their altered expression states in the pathobiology of PTC development.