Introduction

Diabetes is a global health threat, with 463 million adults living with diabetes [1]. Gestational diabetes mellitus (GDM) is one of the fastest-growing types of diabetes and is characterised by high circulating glucose concentrations during pregnancy. Although the exact pathophysiology is unclear, GDM is thought to arise from insufficient insulin production or impaired insulin action [2, 3]. Factors associated with the development of GDM include advanced maternal age, overweight/obesity, ethnicity and a previous history of GDM [4]. GDM usually resolves at the end of pregnancy. However, it is a significant risk factor for several short-term complications during pregnancy and delivery, as well as for long-term complications, including type 2 diabetes and obesity, in mothers and their offspring [4, 5]. In a large epidemiological study in which one million pregnant women were followed up from pregnancy, GDM alone increased the risk of future type 2 diabetes (HR 12.77 [95% CI 12.44, 13.10]) [6]. Likewise, systematic reviews and meta-analyses show that women with GDM alone have a tenfold higher risk of future type 2 diabetes [7, 8]. It is therefore essential to identify biomarkers that can stratify the risk of type 2 diabetes in women with previous GDM. Early identification can help to prevent or delay the development of type 2 diabetes through a healthy lifestyle or preventative medication [9, 10].

Current guidelines and practices for screening involve an OGTT at 6–12 weeks postpartum, followed by measurement of fasting plasma glucose, OGTT or HbA1c every 1–3 years [11, 12]. However, these screening tests have low accuracy in stratifying women with previous GDM into type 2 diabetes progressors and non-progressors, one reason being low compliance with follow-up testing [13, 14]. To improve the predictive potential of current screening tests, a 1 h plasma glucose >8.6 mmol/l (>155 mg/dl) during an OGTT has been proposed [15]. Studies from our group have indicated that postpartum levels of a combination of three lipid species, of IGF1 and IGF binding protein 2 (IGFBP2), or of C-peptide and ghrelin improve stratification ability [16,17,18] in comparison with traditional risk factors such as age and BMI. Several other metabolites have also been shown to have predictive potential for future type 2 diabetes, including betatrophin, 2- and 3-hydroxybutyrate and IL1 receptor antagonist measured during pregnancy, liver enzymes at 1 year postpartum, and global histone H3 dimethylation in blood cells at 10–20 weeks postpartum [19,20,21,22,23]. Machine learning algorithms have also been used to derive signatures of metabolites or lipid species at 6–9 weeks postpartum, which provide a significantly higher AUC in receiver operating characteristic (ROC) curve analysis than standard clinical glucose measurements for prediction of future type 2 diabetes [24, 25]. To the best of our knowledge, circulating microRNA (miRNA) biomarkers have not been reported as predictors of future type 2 diabetes in women with GDM.

miRNAs are a group of small (~22 nucleotide), single-stranded, non-coding RNA molecules that can act both as biomarkers and as mediators of physiological and pathological processes. They act post-transcriptionally to modulate the expression of target genes, inhibiting protein production by interfering with mRNA translation and stability [26, 27]. Circulating miRNAs are found in all biological fluids (e.g. plasma, saliva and urine), usually in association with the RNA-induced silencing complex (RISC), lipid moieties or exosomes. They are good biomarkers of disease progression because they are stable in plasma and in frozen/biobanked samples, and because sensitive and efficient methods are available to profile them in the laboratory [28, 29]. We therefore investigated whether miRNA profiles at 12 weeks postpartum differ between women with GDM who progress to type 2 diabetes and those who do not, and whether these profiles predict future type 2 diabetes. Here, we assessed plasma miRNA levels in 103 women with GDM at 12 weeks postpartum to predict their progression to type 2 diabetes over the next 10 years.

Methods

Participant recruitment and sample preparation

The study was approved by the Mercy Health Research and Ethics Committee, and all participants provided written informed consent. Only English-speaking women aged at least 18 years with a previous index GDM pregnancy were eligible for this study. Women with pre-existing diabetes were excluded.

The 103 women included in this study were recruited at the Mercy Hospital for Women during their first pregnancy with GDM between June 2003 and December 2005 and were followed up every 1–2 years for up to 10 years. Full details of the participants have been published previously [17, 30]. At 12 weeks postpartum, a fasting blood sample was collected in EDTA tubes and plasma was separated by centrifugation at 1000 × g for 10 min. Plasma was then supplemented with 0.1 mmol/l phenylmethylsulfonyl fluoride protease inhibitor (USB, Cleveland, OH, USA) and immediately stored at −80°C until use.

Diagnosis of GDM and type 2 diabetes

GDM was diagnosed by an OGTT at 24–28 weeks of gestation, according to the Australasian Diabetes in Pregnancy Society (ADIPS) guidelines of the period, as a fasting venous plasma glucose concentration of ≥5.5 mmol/l and/or ≥8.0 mmol/l 2 h after a 75 g oral glucose load. All women clinically diagnosed with GDM had regular dietitian consultations and were advised to follow the recommended Standard of Care diet for controlling their blood glucose (40% carbohydrate, 15% protein and 45% fat). An OGTT was performed at 12 weeks postpartum, and only women who had normal glucose tolerance (NGT) at that time were included in this study. During the postnatal OGTT, the women's weight and height were measured and a blood sample was taken for molecular biomarker evaluation. Participants were followed up every 1–2 years for up to 10 years with further OGTTs. Participants were classified as having progressed to type 2 diabetes (hereafter, progressors) or as not having progressed to type 2 diabetes (hereafter, non-progressors) based on the ADA criteria [31], which remained unchanged throughout the study. Upon type 2 diabetes diagnosis, women were referred to their doctor for further specialist care. At the end of follow-up, six of the non-progressors had impaired glucose tolerance (IGT) and the remaining 76 had NGT according to the ADA criteria.

Sample randomisation and distribution

The sample size for the discovery set was selected based on a previous study [32], assuming an effect size of 2 to detect a minimum difference of 1 Ct value between the two groups. A difference of 1 Ct value is the smallest change that can be measured by PCR and reflects a twofold difference (one doubling) in template abundance at 100% amplification efficiency. The remaining samples, used for validation, were confirmed to contain at least 10% progressors and to provide 80% power to detect a twofold difference between the groups, assuming an SD of less than 35% of the mean. The observed SDs were less than 15% of the mean. Study samples were randomised using the RAND function in Excel and deidentified before miRNA measurement.
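As a rough check on the design arithmetic above, the sketch below computes the approximate power of a two-sided, two-sample comparison using the normal approximation, treating the stated effect size of 2 as a standardised mean difference (Cohen's d). Both the normal approximation and the reading of "effect size" as Cohen's d are assumptions; the normal approximation is somewhat optimistic for groups of ~10, where a t-based calculation would be more appropriate. The helper name `approx_power` is hypothetical.

```python
from statistics import NormalDist


def approx_power(d: float, n1: int, n2: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided two-sample test (normal approximation).

    d is the standardised mean difference (Cohen's d); n1 and n2 are group sizes.
    The negligible probability of rejecting in the wrong direction is ignored.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    shift = abs(d) * (n1 * n2 / (n1 + n2)) ** 0.5
    return z.cdf(shift - z_crit)


# Discovery-set sizes from the text: 10 progressors, 11 non-progressors
power_discovery = approx_power(d=2.0, n1=10, n2=11)
print(f"approximate power: {power_discovery:.3f}")
```

Under these assumptions the discovery design has power well above 0.99; halving the effect size to 1 drops it to roughly two-thirds, which illustrates how sensitive small-sample designs are to the assumed effect.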

RNA isolation

Total RNA was isolated from plasma using TRIzol reagent (ThermoFisher Scientific, USA) and purified with the RNeasy-HT Kit (QIAGEN, Hilden, Germany) on a QIAcube-HT automated robotic RNA isolation platform (QIAGEN). To 100 μl of plasma we added 500 μl of TRIzol reagent (ThermoFisher Scientific), 1 μl of 10 ng/μl glycogen (Sigma-Aldrich, Hamburg, Germany) and 2.5 μl of 50 nmol/l ath-miR-172a (a synthetic spike-in miRNA; Sigma-Aldrich). After chloroform addition, the aqueous phase was separated and processed on the QIAcube-HT platform following the manufacturer's protocol for the RNeasy-HT Kit (QIAGEN). RNA concentration and 260/280 absorbance ratios were measured using a NanoDrop spectrophotometer (ThermoFisher Scientific). RNA was either immediately converted to cDNA using a miRNA reverse transcription kit or stored at −80°C until use.
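The spike-in quantity implied by the volumes above can be worked through as follows. The conversion to molecule counts uses the Avogadro constant and assumes the stated 50 nmol/l concentration is exact; this is a back-of-envelope check rather than part of the published protocol, and the function name is hypothetical.

```python
AVOGADRO = 6.02214076e23  # molecules per mol (exact SI value)


def spike_in_amount(volume_ul: float, conc_nmol_per_l: float) -> tuple[float, float]:
    """Return (amount in fmol, number of molecules) for a pipetted spike-in."""
    volume_l = volume_ul * 1e-6
    moles = volume_l * conc_nmol_per_l * 1e-9
    return moles * 1e15, moles * AVOGADRO


# 2.5 ul of 50 nmol/l ath-miR-172a added to 100 ul plasma, as described above
fmol, molecules = spike_in_amount(2.5, 50.0)
per_ul_plasma_fmol = fmol / 100.0
print(f"{fmol:.1f} fmol ({molecules:.2e} molecules); {per_ul_plasma_fmol:.2f} fmol per ul plasma")
```

With these inputs the spike-in amounts to 125 fmol (roughly 7.5 × 10^10 molecules), or 1.25 fmol per μl of plasma.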

miRNA PCR using OpenArrays

We measured 754 miRNAs in plasma from discovery cohort samples (non-progressors, n = 11; progressors, n = 10) using TaqMan-based quantitative PCR (qPCR) on a high-throughput OpenArray platform [33, 34]. cDNA was prepared from 50–100 ng RNA using Megaplex Human RT Primers (pools A and B) and the TaqMan miRNA RT Kit (ThermoFisher Scientific). Pre-amplification was performed for 12 cycles using Megaplex PreAmp Primers and TaqMan PreAmp Master Mix. Pre-amplified cDNA was diluted 1:40 in 0.1 × TE buffer containing 10 mmol/l Tris-HCl (pH 8.0) and 0.1 mmol/l EDTA, mixed with TaqMan OpenArray PCR Master Mix and loaded onto TaqMan OpenArray Human miRNA Panels using the AccuFill system. qPCR was carried out on the QuantStudio 12 K Flex System (ThermoFisher Scientific). All specialised reagents for OpenArray were from ThermoFisher Scientific. The resulting data were uploaded to Thermo Connect software, normalised globally and filtered to remove datasets with an amplification (Amp) score <1.24 and Cq CI <0.6. The normalised Ct values for each miRNA were used in all analyses. miRNAs that showed no amplification during PCR were considered undetected/not expressed and assigned a Ct value of 39, which corresponds to the limit of detection of the PCR system [35]. Results were calculated using the ΔCt method (difference in normalised Ct values between progressors and non-progressors).
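The handling of undetected miRNAs described above can be sketched as a small cleaning step. This is an illustration only: it assumes Ct values have already been globally normalised in Thermo Connect, treats a well as failing QC when either its Amp score is below 1.24 or its Cq CI is below 0.6, and imputes failing wells with the detection-limit Ct of 39. The text does not state whether QC-failed wells were imputed or dropped, and the function and argument names are hypothetical.

```python
from typing import Optional

LOD_CT = 39.0  # Ct assigned to undetected miRNAs (limit of detection)


def clean_ct(ct: Optional[float], amp_score: float, cq_ci: float,
             min_amp: float = 1.24, min_cq_ci: float = 0.6) -> float:
    """Return a usable Ct, or LOD_CT for non-amplifying or QC-failed wells."""
    if ct is None:  # no amplification at all
        return LOD_CT
    if amp_score < min_amp or cq_ci < min_cq_ci:  # assumption: failed QC -> undetected
        return LOD_CT
    return ct


def is_expressed(cts: list[float]) -> bool:
    """An miRNA counts as expressed if it is detected in at least one sample."""
    return any(ct < LOD_CT for ct in cts)


# One hypothetical miRNA measured in three samples: (Ct, Amp score, Cq CI)
wells = [(24.1, 1.50, 0.90), (None, 0.00, 0.00), (30.2, 1.10, 0.80)]
cts = [clean_ct(*w) for w in wells]
print(cts, is_expressed(cts))
```

Here the second well never amplified and the third fails the Amp score cut-off, so both are set to 39, but the miRNA still counts as expressed because one sample detected it.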

Quantitative real-time PCR for miRNA validation

For validation, the 15 miRNAs selected through discovery data analysis and a spike-in control (ath-miR-172a) were measured using individual miRNA PCR assays (electronic supplementary material [ESM] Table 1) on a ViiA7 real-time PCR system (Life Technologies, Foster City, CA, USA). cDNA was synthesised from 10 ng of total RNA using pooled Human miRNA RT Primers and the TaqMan miRNA RT Kit. The cDNA was pre-amplified for 16 cycles using pooled Human miRNA PCR Primers and TaqMan PreAmp Master Mix (all from ThermoFisher Scientific). Human miRNA RT and PCR primer pools were created from TaqMan miRNA assays as recommended by the manufacturer. qPCR was then performed on the ViiA7 in optically clear 96-well plates using 1:8 diluted pre-amplified cDNA, TaqMan miRNA PCR Primer/Probe mix and TaqMan Fast Universal PCR Master Mix (2×). Data were exported to Excel, normalised to the spike-in control (ath-miR-172a) and then compared between the two groups using the ΔCt method (difference in normalised Ct values between progressors and non-progressors).
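The spike-in normalisation and ΔCt comparison described above reduce to simple arithmetic. This sketch assumes the group comparison uses the difference between group mean normalised Ct values (progressors minus non-progressors, as in the text) and converts it to a fold difference as 2^(−ΔCt), which assumes ~100% amplification efficiency. Whether the authors summarised groups by means or otherwise is not stated here, and the input values are hypothetical.

```python
from statistics import mean


def normalise_to_spike(ct_target: list[float], ct_spike: list[float]) -> list[float]:
    """Per-sample normalisation: target Ct minus spike-in (ath-miR-172a) Ct."""
    return [t - s for t, s in zip(ct_target, ct_spike)]


def delta_ct(prog_norm: list[float], nonprog_norm: list[float]) -> float:
    """Difference in mean normalised Ct: progressors minus non-progressors."""
    return mean(prog_norm) - mean(nonprog_norm)


def fold_difference(dct: float) -> float:
    """Lower Ct means more template, so a negative ΔCt means higher levels in progressors."""
    return 2.0 ** (-dct)


# Hypothetical Ct values for one miRNA and the spike-in in each sample
prog = normalise_to_spike([24.0, 24.5], [20.0, 20.5])     # -> [4.0, 4.0]
nonprog = normalise_to_spike([26.0, 25.0], [20.0, 19.0])  # -> [6.0, 6.0]
dct = delta_ct(prog, nonprog)
print(dct, fold_difference(dct))
```

A ΔCt of −2 corresponds to a fourfold higher level in progressors; subtracting the spike-in Ct first corrects for sample-to-sample differences in extraction and reverse transcription yield.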

Statistical analysis

Statistical analyses were performed using Microsoft Excel (ver. 2016; Microsoft, Redmond, WA, USA), the R software (ver. 3.6.2; R Foundation for Statistical Computing, Vienna, Austria) and GraphPad Prism (ver. 8.4.1; GraphPad Software, San Diego, CA, USA). Microsoft Excel was used to perform two-tailed Welch's t tests and to create the volcano plot. GraphPad Prism was used to perform two-tailed Mann–Whitney tests and Bonferroni correction for multiple comparisons, and to independently validate the statistical analyses. The net reclassification index was calculated as described previously for the same study cohort [17]. R was used to perform supervised clustering, L1-penalised logistic regression analysis (least absolute shrinkage and selection operator [LASSO] with bootstrapping), ROC curve analysis and statistical adjustment. LASSO was used to limit overfitting of the collinear, high-dimensional data [36]. The LASSO regression method used here is based on logistic regression for the binary outcome (progressor or non-progressor). Bootstrap analysis (n = 1000 iterations) was performed to validate the penalised regression analysis. In each bootstrap iteration, ~37% of the samples were randomly removed and replaced with the same number of samples randomly selected from within the set. Feature selection using LASSO and bootstrapping was carried out on all 754 measured miRNAs, including undetected miRNAs (assigned a Ct value of 39). All data (clinical and miRNA) were first tested for normality using the Kolmogorov–Smirnov (KS) test. Parametric tests were used for normally distributed data and non-parametric tests for data that did not follow a normal distribution.
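To make the LASSO-with-bootstrapping procedure concrete, the following is a minimal pure-Python sketch, not the R code used in the study (which is not described here). It fits an L1-penalised logistic regression by proximal gradient descent (ISTA) on standardised features and records, for each feature, the percentage of bootstrap resamples in which its coefficient is non-zero. Drawing n samples with replacement leaves out on average about 1/e ≈ 37% of the distinct samples, which matches the ~37% described above. The penalty, learning rate, iteration counts and synthetic data are illustrative assumptions; in practice the penalty would normally be tuned, for example by cross-validation.

```python
import math
import random


def soft_threshold(v, t):
    """Proximal operator of the L1 penalty: shrink v towards zero by t."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def sigmoid(z):
    # numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def standardise(X):
    """Column-wise z-scores; constant columns (e.g. Ct = 39 everywhere) become zeros."""
    n, p = len(X), len(X[0])
    Z = [[0.0] * p for _ in range(n)]
    for j in range(p):
        col = [row[j] for row in X]
        mu = sum(col) / n
        sd = math.sqrt(sum((v - mu) ** 2 for v in col) / n)
        for i in range(n):
            Z[i][j] = (col[i] - mu) / sd if sd > 0 else 0.0
    return Z


def fit_l1_logistic(X, y, lam, lr=0.1, n_iter=200):
    """L1-penalised logistic regression by proximal gradient descent (ISTA)."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        gw, gb = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            gb += err
            for j in range(p):
                gw[j] += err * xi[j]
        b -= lr * gb / n  # the intercept is not penalised
        w = [soft_threshold(w[j] - lr * gw[j] / n, lr * lam) for j in range(p)]
    return w, b


def bootstrap_selection_frequency(X, y, lam, n_boot=40, seed=1):
    """Percentage of bootstrap resamples in which each feature has a non-zero coefficient."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts, used = [0] * p, 0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # n draws with replacement
        yb = [y[i] for i in idx]
        if len(set(yb)) < 2:  # skip resamples that contain only one class
            continue
        w, _ = fit_l1_logistic(standardise([X[i] for i in idx]), yb, lam)
        used += 1
        for j, wj in enumerate(w):
            if wj != 0.0:
                counts[j] += 1
    return [100.0 * c / used for c in counts] if used else [0.0] * p


# Synthetic data: 10 progressors (y = 1) and 11 non-progressors (y = 0);
# feature 0 has lower Ct in progressors, features 1-4 are pure noise.
rng = random.Random(42)
y = [1] * 10 + [0] * 11
X = [[rng.gauss(28.0 if label else 30.0, 0.5)] + [rng.gauss(30.0, 1.0) for _ in range(4)]
     for label in y]
freq = bootstrap_selection_frequency(X, y, lam=0.1)
print([round(f) for f in freq])
```

With this synthetic setup the informative feature should be selected in essentially every resample, while the noise features are selected less often. A column that is constant in a resample, such as an miRNA undetected (Ct 39) in every sample, standardises to zero and is never selected.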

Pathway analysis

The computational web tool miRSystem (ver. 20160513; http://mirsystem.cgm.ntu.edu.tw/) was used to perform Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis for each miRNA of interest [37]. The parameters Hit (>3) and O/E ratio (>2) were applied with the default settings, and the minimum number of total genes per pathway was set at 4. Significant raw p values (≤0.05) for the KEGG pathways are presented as −log10(p value).
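The pathway thresholds described above can be expressed as a simple filter over tabulated results. This sketch assumes the per-pathway hit count, O/E ratio, total gene count and raw p value have already been exported from miRSystem; the record layout, field names and example rows are hypothetical placeholders, not real KEGG results.

```python
import math
from dataclasses import dataclass


@dataclass
class PathwayResult:
    name: str
    hits: int          # assumed meaning: predicted target genes in the pathway
    oe_ratio: float    # observed/expected ratio
    total_genes: int   # genes annotated to the pathway
    p_value: float     # raw p value


def significant_pathways(rows, min_hits=3, min_oe=2.0, min_total=4, alpha=0.05):
    """Keep Hit > 3, O/E > 2, >= 4 pathway genes and p <= 0.05; return (name, -log10 p), largest first."""
    kept = [r for r in rows
            if r.hits > min_hits and r.oe_ratio > min_oe
            and r.total_genes >= min_total and r.p_value <= alpha]
    scored = [(r.name, -math.log10(r.p_value)) for r in kept]
    return sorted(scored, key=lambda item: item[1], reverse=True)


rows = [
    PathwayResult("pathway_1", hits=5, oe_ratio=3.1, total_genes=50, p_value=0.001),
    PathwayResult("pathway_2", hits=2, oe_ratio=4.0, total_genes=30, p_value=0.01),  # too few hits
    PathwayResult("pathway_3", hits=6, oe_ratio=1.5, total_genes=40, p_value=0.02),  # O/E too low
    PathwayResult("pathway_4", hits=4, oe_ratio=2.5, total_genes=20, p_value=0.05),
]
result = significant_pathways(rows)
for name, score in result:
    print(name, round(score, 2))
```

On the −log10 scale, p = 0.05 maps to about 1.30 and p = 0.001 to 3, so taller bars in the pathway plot correspond to smaller p values.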

Results

Study design and clinical characteristics at baseline

We recruited 148 pregnant women with an index (first) GDM pregnancy. At 12 weeks postpartum, all women underwent an OGTT and were found to be normoglycaemic, and a fasting blood sample was collected from each woman. Of the initial 148 women, 45 were excluded from this study because of missing clinical information, an inadequate or haemolysed blood sample, or loss to follow-up after moving interstate or overseas. Of the remaining 103 women, 21 developed type 2 diabetes and 82 did not progress to type 2 diabetes during the follow-up period. These women were followed for up to 10 years from study baseline (median 7.6 years); the median follow-up was 5.7 years (range 1.8–8.7 years) for progressors and 7.8 years (range 5.5–9.3 years) for non-progressors.

The study samples were randomly divided into two sets: a discovery set, in which we profiled 754 miRNAs to identify miRNAs that differed significantly between progressors and non-progressors; and a validation set, in which we validated a smaller set of differentially expressed miRNAs identified through univariate as well as multivariate machine learning analyses. The discovery set included 10 progressors and 11 non-progressors, and the validation set included 11 progressors and 71 non-progressors. The baseline clinical features of study participants in the discovery and validation sets are presented in Tables 1 and 2, respectively. In the discovery set, there were no significant differences in any of the measured variables (Table 1). In the validation set, women who progressed to type 2 diabetes were ~1.9 years older than non-progressors (p = 0.045) but did not differ significantly in other biochemical variables (Table 2). The study workflow is presented in ESM Fig. 1.
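The split into discovery and validation sets was made with Excel's RAND function; an analogous Python sketch is shown below. Shuffling progressors and non-progressors separately so that the group sizes match those reported (10 + 11 for discovery, 11 + 71 for validation) is an assumption about how these numbers were obtained, and the participant IDs are hypothetical.

```python
import random


def split_by_group(progressors, non_progressors, n_prog_disc, n_non_disc, seed=2020):
    """Randomly assign participants to discovery/validation within each outcome group."""
    rng = random.Random(seed)
    prog = list(progressors)
    non = list(non_progressors)
    rng.shuffle(prog)
    rng.shuffle(non)
    discovery = prog[:n_prog_disc] + non[:n_non_disc]
    validation = prog[n_prog_disc:] + non[n_non_disc:]
    return discovery, validation


progressor_ids = [f"P{i:03d}" for i in range(1, 22)]      # 21 progressors
non_progressor_ids = [f"N{i:03d}" for i in range(1, 83)]  # 82 non-progressors
discovery, validation = split_by_group(progressor_ids, non_progressor_ids, 10, 11)
print(len(discovery), len(validation))
```

Fixing the seed makes the allocation reproducible, which a spreadsheet RAND column does not provide unless the random values are frozen.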

Table 1 Clinical characteristics of GDM women in the discovery set
Table 2 Clinical characteristics of GDM women in the validation set

miRNA profiling and identification of differentially expressed miRNAs in the discovery cohort

We used an unbiased approach to discover differentially expressed miRNAs in postpartum plasma samples from progressors and non-progressors. We profiled 754 known/validated human miRNAs using OpenArray panels on the discovery set. As expected for plasma samples [38, 39], 273 miRNAs were expressed in at least one sample (Fig. 1a), and the remaining miRNAs were undetectable (Ct value = 39). There were no missing data. All 754 miRNAs (detected and undetected) were used in the discovery data analyses. Univariate analysis identified two miRNAs that differed significantly (fold difference >2, p < 0.05) between progressors and non-progressors (Fig. 1b). Machine learning algorithms were then used to identify the miRNAs most important for stratifying progressors and non-progressors: we employed a penalised logistic regression model followed by bootstrapping. All miRNAs were included in the LASSO-based feature selection step, as the main aim was to identify miRNAs that could discriminate between progressors and non-progressors. Bootstrapping was used to generate a bootstrap frequency table, in which the bootstrap frequency of an miRNA is the proportion of iterations in which it was selected as discriminatory in the prediction model (Fig. 1c). We then ranked the miRNAs by frequency and identified the top 10% (14 candidate miRNAs, bootstrap frequency >14%; highlighted in Fig. 1d). One miRNA (miR-543) was common to both the univariate analysis and the top 10% of LASSO bootstrap miRNAs, while miR-329-3p was significant in the univariate analysis but ranked in the bottom 50% of the bootstrap miRNAs. Interestingly, the expression of miR-329-3p correlated significantly (r = 0.45, p = 0.000026) with one of the miRNAs selected in our bootstrap analysis (let-7g, top 10%).
These 14 miRNAs with the highest bootstrap frequencies, together with miR-329-3p from the univariate analysis (15 miRNAs in total), were then selected for measurement in the validation cohort.
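The candidate selection above combines two criteria: the univariate filter (fold difference >2, i.e. |ΔCt| >1, with p < 0.05) and the bootstrap frequency cut-off (>14%). A minimal sketch of taking the union of both lists follows; the miRNA labels, ΔCt values, p values and frequencies are hypothetical placeholders.

```python
def univariate_hits(stats, min_abs_dct=1.0, alpha=0.05):
    """stats maps miRNA -> (ΔCt, p value); keep >2-fold (|ΔCt| > 1) with p < alpha."""
    return {m for m, (dct, p) in stats.items() if abs(dct) > min_abs_dct and p < alpha}


def bootstrap_hits(freq_pct, min_freq=14.0):
    """freq_pct maps miRNA -> bootstrap selection frequency (%)."""
    return {m for m, f in freq_pct.items() if f > min_freq}


stats = {"miR-A": (-1.5, 0.01), "miR-C": (-1.2, 0.03), "miR-D": (-0.5, 0.01)}
freq_pct = {"miR-A": 30.0, "miR-B": 15.0, "miR-C": 5.0}

uni = univariate_hits(stats)
boot = bootstrap_hits(freq_pct)
candidates = sorted(uni | boot)  # taken forward to validation
overlap = sorted(uni & boot)     # selected by both criteria
print(candidates, overlap)
```

In this toy example "miR-A" plays the role miR-543 plays in the text (selected by both criteria), while "miR-C" is kept on the univariate result alone despite a low bootstrap frequency.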

Fig. 1

Profiling and analysis of differentially expressed miRNAs in the discovery cohort. (a) Supervised heatmap (binary/asymmetric binary distances and average linkage clustering) of the 273 of 754 miRNAs expressed in plasma samples from type 2 diabetes non-progressors (n = 11) and progressors (n = 10). The heatmap shows normalised qPCR Ct values (colour bar), with low Ct values (high miRNA levels) in dark red and higher Ct values (low to no miRNA expression) in yellow to white. The other 481 of 754 miRNAs on the panel were not detectable in any plasma sample and are therefore not shown. (b) Volcano plot of all detectable miRNAs in progressors (n = 10) vs non-progressors (n = 11). Ct value differences are shown on the x-axis and −log10 p values (unpaired two-tailed Welch's t test) on the y-axis. The horizontal dotted line marks p = 0.05, and the vertical dotted lines mark a twofold (1 Ct value) difference. miRNAs that differed significantly between the two groups are shown in red (each point represents a unique miRNA). (c) Schematic of the bootstrap machine learning workflow used in the analyses. LASSO regression analysis was performed with bootstrapping, in which a small subset of samples (red circles) was randomly deleted and the same number of samples was duplicated from the remainder of the dataset. This was repeated 1000 times so that the resulting datasets differed from each other. (d) Number of times (bootstrap frequency, %) each miRNA (x-axis) was selected as important for discriminating between non-progressors (n = 11) and progressors (n = 10). Following penalised logistic regression and bootstrap (1000 iterations) analysis, the top 14 miRNAs (bootstrap frequency >14%; highlighted in blue) were selected for validation, along with the significant miRNAs from the univariate analysis shown in (b). T2D, type 2 diabetes

Measurement in validation cohort identifies most discriminatory miRNAs

We then measured the abundance of the 15 miRNAs identified in the discovery set, along with the spike-in control ath-miR-172a (ESM Table 1). Four of the 15 miRNAs (miR-491-5p, p = 0.0009; miR-543, p = 0.0003; miR-410-3p, p = 0.005; and miR-369-3p, p = 0.003) differed significantly in the validation set (t test or Mann–Whitney test) and were more than twofold more abundant in progressors than in non-progressors. These miRNAs were predicted to target multiple genes associated with inflammatory, pancreatic secretion, calcium signalling and diabetes-related pathways (ESM Fig. 2). After adjustment for age alone or for age and BMI (measured at baseline sample collection), miR-369-3p remained significant (p = 0.007 and p = 0.03, respectively). This miRNA also remained significant after correction for multiple comparisons. The other three miRNAs (miR-491-5p, miR-543 and miR-410-3p) did not remain significant after age adjustment and/or correction for multiple comparisons.
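For reference, the Bonferroni correction multiplies each p value by the number of tests m, capping the result at 1. The sketch below applies it to the unadjusted validation p values reported above, assuming m = 15 (all miRNAs measured in the validation set); it does not reproduce the age- or BMI-adjusted analyses, which require the underlying data.

```python
def bonferroni(p_values: dict[str, float], m: int) -> dict[str, float]:
    """Bonferroni-adjusted p values: p * m, capped at 1."""
    return {name: min(1.0, p * m) for name, p in p_values.items()}


# Unadjusted validation-set p values as reported in the text
reported = {"miR-491-5p": 0.0009, "miR-543": 0.0003, "miR-410-3p": 0.005, "miR-369-3p": 0.003}
adjusted = bonferroni(reported, m=15)  # assumption: m = 15 validated miRNAs
significant = sorted(name for name, p in adjusted.items() if p < 0.05)
print(adjusted)
print(significant)
```

On these unadjusted values alone, miR-410-3p (0.005 × 15 = 0.075) exceeds 0.05 while the other three fall below it; under this assumption, the loss of significance for miR-491-5p and miR-543 would come from the age adjustment, which this sketch does not model.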

We next performed ROC curve analysis in the validation set to assess how well the most significant miRNA (miR-369-3p) discriminated women who would develop type 2 diabetes. The AUC, specificity and sensitivity of traditional risk factors with and without the addition of miR-369-3p are presented in Table 3. In these analyses, the base model included the following traditional risk factors: age, BMI, fasting blood glucose during pregnancy, postnatal fasting blood glucose, cholesterol and triacylglycerol. Other risk factors, such as family history of diabetes and insulin use during pregnancy, were also tested in ROC curve analyses but yielded AUCs of around 0.5 and were therefore not included in the base model. The base model of six traditional risk factors gave an AUC of 0.83. Adding miR-369-3p to the base model improved the AUC (from 0.83 to 0.92), specificity (from 0.85 to 0.89) and sensitivity (from 0.73 to 0.91) (Table 3), resulting in a net reclassification index of 31%.
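The AUC reported above has a direct interpretation: it is the probability that a randomly chosen progressor receives a higher model score than a randomly chosen non-progressor (the Mann–Whitney formulation), with ties counted as one half. The sketch below computes it that way, together with sensitivity and specificity at a chosen threshold. The scores are hypothetical stand-ins for predicted probabilities from the base or extended model; how the models were fitted is not reproduced here.

```python
def roc_auc(pos_scores, neg_scores):
    """AUC = P(score of a random progressor > score of a random non-progressor); ties count 0.5."""
    wins = 0.0
    for sp in pos_scores:
        for sn in neg_scores:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def sensitivity_specificity(pos_scores, neg_scores, threshold):
    """Classify as 'progressor' when score >= threshold."""
    sens = sum(s >= threshold for s in pos_scores) / len(pos_scores)
    spec = sum(s < threshold for s in neg_scores) / len(neg_scores)
    return sens, spec


progressor_scores = [0.9, 0.8, 0.4]
non_progressor_scores = [0.7, 0.3, 0.2, 0.1]
auc = roc_auc(progressor_scores, non_progressor_scores)
sens, spec = sensitivity_specificity(progressor_scores, non_progressor_scores, 0.5)
print(round(auc, 3), round(sens, 3), spec)
```

Here 11 of the 12 progressor/non-progressor pairs are correctly ordered, giving an AUC of about 0.917. Unlike sensitivity and specificity, the AUC does not depend on a single threshold.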

Table 3 Type 2 diabetes prediction and model development

Discussion

This is the first study to assess whether postpartum circulating miRNA levels in women who had GDM can predict the development of future type 2 diabetes. To identify predictive miRNAs, we used unbiased discovery with a machine learning variable selection algorithm, followed by validation of the selected variables in a separate set of samples. We randomised all available de-identified samples (n = 103) to discovery and validation sets and measured 754 miRNAs in the discovery set. The discovery approach had two key features: first, we used a separate set of samples with similar clinical features for discovery; and second, we profiled most of the available, experimentally validated miRNAs (754) rather than only previously reported miRNAs. Analysis using LASSO and bootstrapping-based workflows identified the 14 most important (top 10% bootstrap) miRNAs, which were taken forward together with miR-329-3p (from univariate analysis). These miRNAs were measured in the validation set to confirm whether they discriminated between progressors and non-progressors. Following adjustment for age (which differed significantly between progressors and non-progressors in the validation cohort) and Bonferroni correction for multiple comparisons, miR-369-3p was significantly higher in progressors than in non-progressors, making it a potential biomarker candidate for predicting future type 2 diabetes. Adding this miRNA to six traditional risk factors increased sensitivity, specificity and AUC in ROC curve analysis, with a net reclassification index of 31%. Although the CIs of the base model overlap with those of the proposed model (base model + miR-369-3p), the addition of postpartum miRNA measurements correctly reclassified seven more study participants (two progressors and five non-progressors).
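The net reclassification index compares the risk categories assigned by the base and extended models. The exact NRI variant used by the authors is described in [17] and is not restated here, so the sketch below assumes the common two-category form: for progressors, the net fraction moved up to the high-risk category, plus, for non-progressors, the net fraction moved down. The risk calls and outcomes are hypothetical.

```python
def two_category_nri(old_call, new_call, outcome):
    """old_call/new_call: 1 = classified high risk, 0 = low risk; outcome: 1 = progressor."""
    n_events = sum(outcome)
    n_nonevents = len(outcome) - n_events
    net_events = 0     # progressors: moved up minus moved down
    net_nonevents = 0  # non-progressors: moved down minus moved up
    for old, new, y in zip(old_call, new_call, outcome):
        change = new - old  # +1 moved up, -1 moved down, 0 unchanged
        if y == 1:
            net_events += change
        else:
            net_nonevents -= change
    return net_events / n_events + net_nonevents / n_nonevents


outcome = [1, 1, 1, 0, 0, 0, 0]
old_call = [1, 0, 0, 1, 1, 0, 0]
new_call = [1, 1, 0, 0, 1, 0, 0]
nri = two_category_nri(old_call, new_call, outcome)
print(round(nri, 4))
```

In this toy example one of three progressors is correctly moved up and one of four non-progressors is correctly moved down, giving 1/3 + 1/4 ≈ 0.58. Because the two components are scaled by different group sizes, a few reclassified progressors weigh far more than the same number of non-progressors when progressors are rare.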

Our data identified four important miRNAs, miR-491-5p, miR-543, miR-369-3p and miR-410-3p, of which only miR-369-3p remained significant after age adjustment and correction for multiple comparisons. Interestingly, these miRNAs were predicted to target pathways involved in inflammation, protein processing, cytokine signalling, immune response and lipid metabolism (ESM Fig. 2). Low-grade subclinical systemic and tissue-specific inflammation is a hallmark of obesity and is causally linked to insulin resistance [40]. Maternal inflammatory markers are known to affect fetal, neonatal and long-term adiposity outcomes in the offspring. In previous work, we observed that levels of total cell-free insulin DNA were significantly higher in women who later progressed to type 2 diabetes, suggesting overall cell and tissue stress or inflammation [30]. The involvement of the identified miRNAs in inflammation and cytokine/immune signalling pathways suggests a predisposition to type 2 diabetes in women who have higher cellular stress at baseline. The pathway analysis was carried out mainly because circulating miRNAs can be both biomarkers and mediators of the disease process. Given the small study size and the few significant miRNAs, the pathway analysis needs to be repeated in larger cohorts, and miRNA–target mRNA interactions need to be validated using luciferase assays. Nonetheless, this analysis provides insight into potential mechanistic links between the miRNAs identified in this study and type 2 diabetes progression.

miR-369 has been shown to be significantly reduced in islets from donors with type 2 diabetes compared with donors without type 2 diabetes [41]; however, it has not previously been reported as a circulating biomarker for any diabetes phenotype, and here we report it as a novel miRNA predictive of future type 2 diabetes. It is the only miRNA that remained significant after adjustment for maternal age and BMI. It is also the only miRNA in our set that targets metabolic pathways related to MODY, as well as taurine and fatty acid synthesis (ESM Fig. 2). We and others have demonstrated that circulating and tissue concentrations of taurine are associated with beta cell survival/function and future type 2 diabetes [42, 43]. This miRNA may therefore reflect the molecular interplay between gene/signalling networks regulating beta cell function. Although only a single miRNA (miR-369-3p) remained significant, miRNAs can be highly correlated with each other, and miR-369-3p may represent a set of closely related miRNAs or even some clinical features. It remains to be tested which other miRNAs or clinical characteristics are correlated with miR-369-3p and could therefore be potential biomarkers.

miRNAs have been profiled in previous studies as potential biomarker candidates for diagnosing the onset or progression of GDM, as well as adverse pregnancy outcomes [44]. Most studies [45,46,47,48] compared serum/plasma samples from women with GDM in early pregnancy (16–19 weeks) with samples from non-diabetic control groups, and identified multiple miRNAs that differed significantly in the circulation of women with vs without GDM. In a study by Sebastiani et al, miRNAs were profiled later in pregnancy, and miR-330-3p was more abundant in women with GDM than in a control group [49]. Interestingly, that study identified two subgroups of women with GDM based on miR-330-3p expression, and the subgroup with relatively higher miR-330-3p levels had lower fasting insulin and a higher incidence of Caesarean section deliveries [49]. In a recent study, higher miR-330-3p levels in the third trimester of pregnancy were associated with less exogenous insulin supplementation, a better response to GDM management therapy and more vaginal deliveries in women with GDM [50]. The differences in miRNA expression profiles between these studies could be attributed to the type of biofluid, the stage of pregnancy at sample collection, and the ethnicity, age, BMI and smoking status of enrolled participants. To the best of our knowledge, our study is the first to examine the potential of circulating miRNAs as biomarkers for future type 2 diabetes in women previously diagnosed with GDM.

We used a unique longitudinal cohort of women who had GDM, for whom plasma samples and all clinical data were collected at baseline and who were then followed for up to 10 years, with type 2 diabetes progression information available for all of them. We divided these samples into two sets: a discovery cohort to profile 754 miRNAs; and a validation cohort to assess the differentially expressed miRNAs. We relied on unbiased discovery of dysregulated miRNAs by profiling all known and validated miRNAs rather than measuring only those reported in previous studies. We believe this is important, as miRNA levels can differ between ethnicities and age groups, so applying knowledge from one cohort to another may not be the best approach. We also used machine learning-based statistical methods to derive the most important subset of differentially expressed miRNAs for validation studies. This was intended to reduce sampling bias and improve the accuracy of identifying important miRNAs. Age differed significantly between progressors and non-progressors in the validation cohort; however, age on its own could not stratify them. One limitation is the smaller-than-desired number of progressors in our study cohort: progressors represent 13% of the validation set. In the general population, around 10% of women with GDM progress to type 2 diabetes, which is comparable to the proportion of progressors in our validation set. Another limitation is that some confounding variables (ethnicity, change in BMI and HOMA-IR) were unavailable for some study participants. Although the discovery workflows used in this study were designed to handle sampling bias (e.g. through bootstrapping), the discovery and validation samples were all collected from women visiting a single centre over the ~10 years of longitudinal follow-up.
Although this provides uniformity in all study samples and clinical data collection/processing workflows, a limitation of the study is that the validation set comprised different samples from the same cohort of women with GDM, rather than a completely independent cohort, which would be preferable. We therefore believe that replication of our findings in larger, geographically distinct, multicentre, ethnically diverse cohorts is merited.

The most important findings of our work are that miRNAs identified by machine learning in the discovery set were validated in a separate set of samples, and that a model combining a single miRNA with clinical risk factors achieved an AUC of 0.92 in ROC curve analysis. Our group has previously reported other metabolites as potential biomarkers in the same cohort of women with GDM [16,17,18, 30]. We intend to perform a combined predictive analysis using data from all of these biomarkers and to validate it in a larger cohort in the future. A major advantage of miRNA biomarkers is their stability in stored/biobanked plasma, which makes them well suited for assessment in existing biobanks to drive future research on risk stratification for type 2 diabetes in women with GDM. Implementing such tools would enable early dietary/lifestyle intervention in women with GDM, slowing the rate of progression to type 2 diabetes.