Introduction

Children born very preterm (VP, <32 weeks gestational age (GA)) have higher rates of neurodevelopmental delays compared with children born full-term (FT, ≥37 weeks GA), including memory impairments and emotional difficulties.1,2 The mechanisms responsible are not clear, but may be better understood by characterising the neuroanatomical underpinnings using magnetic resonance imaging (MRI).

Episodic memory and emotion are thought to be supported by a complex cortical and subcortical circuitry involving the mesial temporal structures, medial forebrain, hypothalamus, thalamus, orbitofrontal lobe and brainstem, interconnected by white matter tracts, including the fornix, stria terminalis, mammillo-thalamic tract, anterior thalamic radiation, cingulum, uncinate fasciculus, and medial forebrain bundle.3 VP birth has been associated with reduced hippocampal volume,4,5 but we previously found that hippocampal volume was not associated with concurrent memory performance in VP children at age 7 years.5 Disruptions of the hippocampal efferent and afferent tracts, and the broader system involved in episodic memory and emotion, might better explain episodic memory and emotional impairments in VP children. Some studies using tractography have found that the fornix and cingulum,6,7 uncinate fasciculus,8,9 and anterior thalamic radiation9 have altered microstructure in preterm children and young adults compared with FT controls. Only a few studies have reported associations between fornix and cingulum volume and microstructure and memory outcomes in preterm young adults.6,7 Additional studies would be beneficial to characterise a broader range of tracts involved in episodic memory and emotion, and their relationships with episodic memory and emotional outcomes.

Previous research in this area generally examined differences in brain structure between groups of preterm and FT children, and associations between brain structure and neurodevelopmental outcomes in preterm children at a group level. Machine learning methods enable a more individualised approach.10,11 Machine learning has previously been used to classify preterm and FT individuals based on MRI data,11,12,13 and to investigate whether MRI data can predict the longer-term neurodevelopmental functioning of individual preterm children (e.g. refs. 14,15,16). Support vector machines are a type of supervised machine learning algorithm that have been applied successfully and relatively frequently on cohorts of preterm children.11,14,15 We aimed to expand the application of support vector machine learning to investigate the effects of VP birth on tracts traditionally related to episodic memory and emotion in childhood.

This study uses data collected as part of a large longitudinal cohort study with previous publications (e.g. refs. 1,4,5,15,17). The current manuscript is the first to examine a broad range of tracts involved in episodic memory and emotion using tractography, and is the first to apply advanced machine learning methods to analyse tracts. In summary, our first aim was to investigate the volume and microstructure of several of the major tracts traditionally involved in episodic memory and emotion in VP children compared with FT children at age 7 years, including the cingulum, fornix, uncinate fasciculus, medial forebrain bundle and anterior thalamic radiation. Our second aim was to investigate whether the structure of these tracts is related to episodic memory and emotional outcomes in VP and FT children. While the specific roles of individual tracts in episodic memory and emotion are still not completely understood, we hypothesised that cingulum and fornix volume and microstructure would be most strongly related to episodic memory, while uncinate fasciculus and medial forebrain bundle volume and microstructure would be most strongly related to emotional outcome.

Methods

Participants

VP infants (born <30 weeks GA and/or <1250 g; n = 224) and FT infants (born ≥37 weeks GA; n = 46) were recruited during the neonatal period, between July 2001 and December 2003, from the Royal Women’s Hospital, Melbourne. Infants with genetic or congenital abnormalities likely to affect development were excluded. At age 7 years, participants were invited to attend a follow-up study. Of the originally recruited participants, 198 VP participants (88% of the original cohort) and 43 FT participants (93% of the original cohort) attended the follow-up, of whom 159 VP and 36 FT participants had MRI. The study was approved by the Royal Children’s Hospital, Melbourne, Human Research Ethics Committee, and written informed consent was obtained from all parents or caregivers.

Magnetic resonance imaging

Participants underwent MRI in a 3T scanner (Siemens Tim Trio, Erlangen, Germany). All participants were scanned with two diffusion-weighted (single-shot twice-refocused echo planar imaging) sequences. The first sequence (‘b1200’) had multiple b values ranging from 50 to 1200 s/mm² in increments of 50 (the same b values were used for all children), 25 gradient directions in total (i.e. one direction per b value, except that b = 300 and b = 600 had two directions), one b = 0 s/mm² volume, repetition time (TR) = 12,000 ms, echo time (TE) = 96 ms, field of view = 250 × 250 mm², matrix size = 144 × 144 and voxel size = 1.7 mm³. This was an advanced acquisition scheme at the time (2008–2011), and several studies utilising this type of multiple b value sequence have been published (e.g. ref. 18). The second sequence (‘b3000’) had a b value of 3000 s/mm², 45 gradient directions, six b = 0 s/mm² volumes, TR = 7400 ms, TE = 106 ms, field of view = 240 × 240 mm², matrix size = 104 × 104 and voxel size = 2.3 mm³.

Tractography

We reconstructed the white matter tracts from the b3000 images using MRtrix version 0.2. Tractography was performed manually over several years. While MRtrix3 became publicly available during the tractography (November 2013), we did not update the version in order to avoid introducing inconsistencies into the analysis. Tractography was based on the regions of interest detailed in Supplementary Table 1. After placing the regions of interest, we delineated the tracts using a probabilistic algorithm based on fibre orientation distributions provided by constrained spherical deconvolution (lmax 6)19 (Fig. 1).

Fig. 1: Tractography reconstructions in one representative full-term child.

Tracts are shown for the right hemisphere only. We have attempted to reconstruct most of the major tracts that are shown in Catani et al.’s diagram of the limbic system, and the colours of our tracts match those in Fig. 8 in Catani et al.3.

Tractography was repeated on a subset of randomly selected participants to determine intra-rater reliability for the single operator who delineated each tract. Intra-rater reliability for the cingulum was >0.8.17 Intra-rater reliabilities for the remaining tracts are reported in Supplementary Table 2, and were all >0.8.

Tract measures

Diffusion tensor measures [fractional anisotropy (FA) and axial (AD), radial (RD), and mean (MD) diffusivities] were generated from the b1200 images. This involved correcting the images for motion and eddy current-induced distortions, incorporating b-matrix reorientation, and fitting the diffusion tensor model based on the weighted linear least-squares method, using ExploreDTI software. We did not remove any volumes affected by motion; however, we excluded participants if their diffusion images exhibited severe motion artefact based on visual inspection. Motion parameters for this sequence were similar between the groups (Supplementary Table 3).

Prior to Neurite Orientation Dispersion and Density Imaging (NODDI), we processed the images using the Functional MRI of the Brain Software Library (FSL). This involved: (1) linearly registering the b3000 images to the b1200 images using the FSL linear image registration tool (FLIRT)20; (2) merging the two sequences together; (3) motion correcting the combined sequences using the ‘eddy_correct’ tool (incorporating b-vector reorientation); (4) normalising each sequence separately by its b0 image/s to attempt to account for the different TEs and TRs between the sequences. Finally, we fitted the NODDI model to the combined sequences using the NODDI Matlab toolbox version 0.9.21 NODDI is a multi-compartment tissue microstructure model. The intracellular compartment is modelled as sticks, whose volume fraction represents the neurite density index (range 0–1). The orientation distribution of the sticks is modelled by a Watson distribution, which provides a measure of the neurite orientation dispersion index (range 0–1, from coherently aligned axons with low dispersion to bending, fanning or crossing axons with high dispersion).21

Diffusion tensor and NODDI images in b1200 space were aligned to tract (b3000) space using FLIRT.20 Tractography reconstructions from MRtrix were converted to probability (track density) images, thresholded at 0.01 (voxels with <10 out of 1000 streamlines were removed) and binarised. The resulting tract masks were multiplied by the diffusion tensor and NODDI images that had been aligned to b3000 space. Diffusion tensor and NODDI values were averaged across all the voxels in each tract, and tract volumes were calculated (number of voxels in the tract masks multiplied by voxel volume in cm³).
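
The thresholding, binarising, averaging and volume steps described above can be illustrated with a minimal sketch. It is not the pipeline used in the study (which operated on images produced by MRtrix and FSL). The function name `summarise_tract`, the flat per-voxel lists and the example values are hypothetical, and the voxel volume assumes isotropic 2.3 mm voxels for the b3000 sequence.

```python
def summarise_tract(streamline_counts, metric_values, total_streamlines=1000,
                    threshold=0.01, voxel_size_mm=2.3):
    """Threshold a track-density map, binarise it, and summarise a metric map.

    Both inputs are flat, equally long sequences with one entry per voxel
    (a simplification of 3-D image arrays). voxel_size_mm assumes isotropic
    voxels, which is an assumption of this sketch.
    """
    if len(streamline_counts) != len(metric_values):
        raise ValueError("density and metric maps must have the same number of voxels")

    # Keep voxels visited by at least threshold * total_streamlines streamlines
    # (0.01 * 1000 = 10), i.e. remove voxels with fewer than 10 streamlines.
    min_count = threshold * total_streamlines
    mask = [1 if count >= min_count else 0 for count in streamline_counts]

    n_voxels = sum(mask)
    if n_voxels == 0:
        raise ValueError("no voxels survive the threshold")

    # Multiply the metric map by the binary mask, then average over tract voxels.
    mean_metric = sum(m * v for m, v in zip(mask, metric_values)) / n_voxels

    # Volume = number of mask voxels x voxel volume, converted from mm^3 to cm^3.
    voxel_volume_cm3 = (voxel_size_mm ** 3) / 1000.0
    return {"n_voxels": n_voxels, "mean": mean_metric,
            "volume_cm3": n_voxels * voxel_volume_cm3}


# Hypothetical six-voxel example: streamline counts and FA values per voxel.
counts = [0, 4, 9, 10, 250, 1000]
fa = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60]
summary = summarise_tract(counts, fa)
print(summary)
```

In this toy example only the last three voxels reach the 10-streamline cut-off, so the mean is taken over those voxels alone.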

Episodic memory and emotional outcome assessments

Verbal memory and learning were assessed at age 7 years using the California Verbal Learning Test–Children’s Version.22 The children were required to recall a list of 15 words, which was presented over five trials. The variable of interest was the number of correctly recalled words over the five trials (verbal episodic memory and learning). Visual memory and learning were assessed using the Dot Locations Test from the Children’s Memory Scale.23 The children were required to recall the position of six dots presented on a 3 × 3 grid, which was presented over three trials. The variable of interest was the number of correct locations recalled over the three trials (visual episodic memory and learning). Age-standardised scores were used, with higher scores indicating better performance. Emotional problems were assessed using the Emotional Symptoms subscale of the parent-report Strengths and Difficulties Questionnaire.24 This subscale consists of five items assessing somatic complaints (e.g. headaches), anxiety (e.g. worries, fears, nervousness) and mood (e.g. unhappy, downhearted).24 Higher scores indicate more emotional problems.

Statistical analysis

Linear regression

Linear regressions were performed using Stata version 14. All linear regression models were fitted using generalised estimating equations to account for correlations between data from multiple births. In addition, all linear regression analyses were performed separately for all the tract measures (left and right FA, AD, RD, MD, neurite orientation dispersion, neurite density and tract volume); hence, the total number of participants with data for each tract could be included in each analysis.

First, we compared tract measures between the VP and FT groups, adjusted for age at MRI and sex, and analyses with tract volumes were additionally adjusted for intracranial volume, to determine whether findings were independent of overall head size. We repeated these analyses with a sex-by-group interaction term, to investigate whether the group differences in tract measures differed between males and females.

Second, we explored associations between the tract measures and episodic memory and emotional outcomes in all children. We adjusted for age at neurodevelopmental assessment, sex, GA at birth and global brain abnormality score on term MRI,25 given these factors have previously been associated with neurodevelopmental outcomes in VP children.1,26 Analyses with tract volumes were also adjusted for intracranial volume. We repeated these analyses with a group-by-tract measure interaction term, to investigate whether the tract measure–outcome relationships differed between the VP and FT groups. We also repeated these analyses with a sex-by-tract measure interaction term, to investigate whether the tract measure–outcome relationships differed between males and females.

All linear regression analyses were false discovery rate (FDR)-corrected for the number of tracts and tract measures (12 by 7 = 84 comparisons); only results that were significant at p < 0.05, FDR-corrected are reported.
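
As an illustration of the correction step, the sketch below applies the Benjamini-Hochberg procedure to a short list of p values. The text does not state which FDR procedure was used, so Benjamini-Hochberg is an assumption here, and the function name and example p values are hypothetical. In the study the correction would span 84 p values (12 tracts by 7 measures) rather than five.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return Benjamini-Hochberg adjusted p values and significance flags."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])

    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p value so that adjusted values
    # never decrease with rank (the step-up monotonicity constraint).
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min

    significant = [q < alpha for q in adjusted]
    return adjusted, significant


# Hypothetical uncorrected p values from five regression models.
raw_p = [0.001, 0.008, 0.039, 0.041, 0.20]
adjusted_p, is_significant = benjamini_hochberg(raw_p)
print(adjusted_p, is_significant)
```

In this example the third and fourth tests are below 0.05 before correction but not after it, which is the situation the reporting rule above is designed to filter out.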

Machine learning

Machine learning involves building statistical models that can learn from and make predictions about data.10 A set of features (i.e. independent variables) is used to predict outcome measurements (i.e. dependent variables). Supervised learning involves using ‘training’ data, in which both features and outcomes are known for a set of study participants, to build a model which can be applied to new ‘test’ data to predict unknown outcomes for a set of participants. Support vector machines are a type of supervised learning algorithm. Support vector classification can be used to predict categorical outcomes, while support vector regression can be used to predict continuous outcomes.10

Support vector machine analyses were performed using scikit-learn version 0.17.1,27 implemented in Python (version 3.5.2). Linear support vector classification was used to investigate the ability of all the tract measures combined to classify VP and FT children. We used all tract measures as features; hence, these analyses were restricted to children who had data for all tracts (116 VP children and 28 FT children). All feature measurements were normalised by demeaning and scaling to unit variance prior to model training. Age at MRI, sex and intracranial volume were regressed from all the features prior to using them in the classification model. We tested internal model generalisation ability using five-fold stratified cross-validation with an 80%/20% train/test split. Stratified folds ensure that the proportions of VP and FT participants in the training and test subsets in each fold are approximately the same as the proportions of VP and FT participants in the entire cohort, and that no participant is present in multiple folds. The linear support vector classification model adopted in this study used an l2 penalty, a squared-hinge loss function and balanced class weighting, given the unbalanced numbers of VP and FT participants. The best value of the tuning parameter C, which controls the trade-off between training data classification accuracy and generalisability, was chosen using a nested three-fold cross-validation loop within each training fold, with a grid search over the values 10⁻³, 10⁻², 10⁻¹, 10⁰, 10¹, 10² and 10³.
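
To make the stratification concrete, the following sketch deals 116 VP and 28 FT participants into five folds so that each fold keeps roughly the cohort's group proportions, and builds the grid of candidate C values (read here as powers of ten from 10⁻³ to 10³). It uses only the Python standard library and is not the scikit-learn code used in the study. The function name `stratified_folds`, the random seed and the round-robin assignment are assumptions made for illustration.

```python
import random


def stratified_folds(labels, n_folds=5, seed=0):
    """Assign every participant index to exactly one fold, stratified by label."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for label in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == label]
        rng.shuffle(members)
        # Deal each group round-robin so every fold gets a near-equal share.
        for position, index in enumerate(members):
            folds[position % n_folds].append(index)
    return folds


# Group sizes of the children with complete data for all tracts.
labels = ["VP"] * 116 + ["FT"] * 28
folds = stratified_folds(labels)

for number, test_fold in enumerate(folds, start=1):
    n_ft = sum(1 for i in test_fold if labels[i] == "FT")
    print("fold", number, "VP:", len(test_fold) - n_ft, "FT:", n_ft)

# Candidate values of the tuning parameter C: 10^-3, 10^-2, ..., 10^3.
c_grid = [10.0 ** exponent for exponent in range(-3, 4)]
```

Each fold serves once as the held-out test set while the other four are used for training, which gives the approximate 80%/20% split. In a nested scheme, the C grid would be searched using only the training portion of each split.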

Accuracy and balanced accuracy scores were calculated for each fold, with a final estimate of model accuracy given by the average across the five folds. We calculated balanced accuracy because the proportions of VP and FT children were not equal. Balanced accuracy is the average of the true-positive rate (proportion of VP correctly predicted) and true-negative rate (proportion of FT correctly predicted), and should therefore better reflect the accuracy across both groups.28 To test whether the accuracy scores were significant, we performed permutation testing, with 5000 permutations.
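
The difference between the two scores is easiest to see on a small example. The sketch below implements both definitions given above for a single hypothetical test fold. The fold composition and the predictions are invented for illustration and are not results from the study.

```python
def accuracy(true_labels, predicted_labels):
    """Proportion of participants whose group was predicted correctly."""
    correct = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return correct / len(true_labels)


def balanced_accuracy(true_labels, predicted_labels, positive="VP", negative="FT"):
    """Average of the true-positive rate (VP) and true-negative rate (FT)."""
    n_pos = sum(1 for t in true_labels if t == positive)
    n_neg = sum(1 for t in true_labels if t == negative)
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both groups must be present in the test fold")
    pairs = list(zip(true_labels, predicted_labels))
    true_pos = sum(1 for t, p in pairs if t == positive and p == positive)
    true_neg = sum(1 for t, p in pairs if t == negative and p == negative)
    return 0.5 * (true_pos / n_pos + true_neg / n_neg)


# Hypothetical test fold with 23 VP and 6 FT children.
fold_truth = ["VP"] * 23 + ["FT"] * 6

# A trivial classifier that labels every child as VP.
always_vp = ["VP"] * 29

# A classifier that gets 21 of 23 VP and 3 of 6 FT children right.
mixed = ["VP"] * 21 + ["FT"] * 2 + ["FT"] * 3 + ["VP"] * 3

print(accuracy(fold_truth, always_vp), balanced_accuracy(fold_truth, always_vp))
print(accuracy(fold_truth, mixed), balanced_accuracy(fold_truth, mixed))
```

The trivial classifier reaches an accuracy of 23/29 simply because most children in the fold are VP, yet its balanced accuracy is 0.5, which is the chance level for two groups.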

We performed support vector regression to investigate the ability of the tract measures to predict episodic memory and emotional outcomes in the VP and FT children. Features were standardised as above, and five-fold cross-validation was used to evaluate model generalisation ability. The best value of C was chosen from the values 10⁰, 10¹ and 10² using three-fold nested cross-validation. Model accuracy was determined using the Pearson’s correlation between true episodic memory or emotional scores and predicted scores, averaged over the five folds. Model significance was tested using 5000 permutations.
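
A minimal sketch of the scoring step follows: Pearson's correlation between true and predicted scores, with a permutation p value. For brevity the permutation test here shuffles the true scores against a fixed set of predictions. A full analysis would more likely refit the model on permuted outcomes, so this is a simplification and should not be read as the procedure used in the study. The function names and the example scores are hypothetical.

```python
import math
import random


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(true_scores, predicted_scores, n_permutations=5000, seed=0):
    """One-sided p value: how often a shuffled pairing matches or beats the observed r."""
    rng = random.Random(seed)
    observed = pearson_r(true_scores, predicted_scores)
    shuffled = list(true_scores)
    at_least_as_large = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if pearson_r(shuffled, predicted_scores) >= observed:
            at_least_as_large += 1
    # Add one to numerator and denominator so the p value is never exactly zero.
    return observed, (at_least_as_large + 1) / (n_permutations + 1)


# Hypothetical true and predicted outcome scores for eight children.
true_scores = [8, 10, 12, 9, 11, 7, 13, 10]
predicted_scores = [9.0, 10.5, 11.0, 9.5, 10.0, 8.5, 12.0, 10.0]
r, p = permutation_p_value(true_scores, predicted_scores)
print(round(r, 3), p)
```

With five-fold cross-validation, `pearson_r` would be computed within each test fold and the five values averaged before comparison against the permutation distribution.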

Results

Participant characteristics

Of the 159 VP and 36 FT children who had MRI, we excluded 18 children (15 VP, 3 FT). This was because the diffusion MRI acquisitions of these children were not completed, were incorrect and/or were affected by motion artefact, for the b1200 (n = 3), b3000 (n = 13) or both the b1200 and b3000 (n = 2) sequences. This meant we generated data (tractography, diffusion tensor and NODDI data) for a maximum of 144 VP and 33 FT participants. All of these participants had cingulum and medial forebrain bundle data. We excluded additional participants’ data for other tracts, because certain image artefacts and brain structural abnormalities are not uniform across the brain and affected certain tracts more than others (e.g. cardiac pulsation artefact affected the fornix due to its proximity to the ventricles), and certain tracts are more difficult to delineate than others due to unique characteristics of each tract such as size, thickness and curvature.29 Of the 144 VP and 33 FT participants with cingulum and medial forebrain bundle data, we excluded 22 (20 VP, 2 FT) children for the fornix, 12 (9 VP, 3 FT) children for the uncinate fasciculus, and five (5 VP, 0 FT) children for the anterior thalamic radiation (some of these excluded VP children overlapped). This left 144 VP and 33 FT children with data for the cingulum and medial forebrain bundle, 124 VP and 31 FT children with data for the fornix, 135 VP and 30 FT children with data for the uncinate fasciculus, and 139 VP and 33 FT children with data for the anterior thalamic radiation. All these children were included in each separate linear regression analysis of each tract measure. Because some of the VP children excluded for the fornix, uncinate fasciculus and anterior thalamic radiation overlapped, we were left with a total of 116 VP and 28 FT children who had complete data for all tracts, and these data were included in the machine learning analyses.

Rates of major brain injuries (cystic periventricular leukomalacia and intraventricular haemorrhage) were low in the VP group, although as expected the VP group had more medical complications than the FT group (Table 1). Age at MRI and the proportion of males and females were similar between groups. The VP group had poorer verbal and visual episodic memory scores, and trended towards having greater emotional symptoms, at age 7 years compared with the FT group (Table 1).

Table 1 Perinatal medical characteristics and 7-year neurodevelopmental outcomes of the participants included in the current study, contrasted between the very preterm and full-term groups.

Fewer VP participants had postnatal corticosteroids, and more VP participants were from multiple births, compared with VP non-participants. Otherwise, characteristics were similar between participants and non-participants (data not shown).

VP children compared with FT children

Linear regression

Bilaterally, the fornix had significantly higher MD, AD, RD and neurite orientation dispersion, and lower volume in the VP children compared with the FT children. In addition, the left fornix had significantly lower neurite density in the VP children compared with the FT children (Fig. 2). The left superior-lateral medial forebrain bundle had significantly higher neurite orientation dispersion in the VP children compared with the FT children (Fig. 2). Volume of the anterior thalamic radiation was significantly higher in VP children compared with FT children (Fig. 2). There were no significant group-by-sex interactions.

Fig. 2: Comparisons of tract measures between very preterm and full-term children, from linear regression models.

Points represent mean differences between groups, and error bars represent 95% CI. Circle points indicate analyses with left hemisphere (LH) tract measures; triangle points indicate analyses with right hemisphere (RH) tract measures. Analyses are adjusted for age at magnetic resonance imaging and sex (and intracranial volume for tract volume measures only). Asterisks indicate the mean difference was significant at p < 0.05, false discovery rate-corrected for the number of tracts and the number of tract measures. Units of axial, radial and mean diffusivities: ×10⁻³ mm²/s; units of tract volumes: cm³. ATR anterior thalamic radiation, IM-MFB inferior-medial medial forebrain bundle, SL-MFB superior-lateral medial forebrain bundle.

Support vector classification

After regressing age at MRI, sex and intracranial volume from the features, the tract measures significantly predicted group status with a cross-validated mean (95% CI) accuracy of 72.9 (56.2, 89.5)%, p = 0.009, and a cross-validated mean (95% CI) balanced accuracy of 62.7 (37.7, 87.6)%, p = 0.009. Fewer FT children [16/28 (57.1%)] were correctly classified compared with VP children [103/116 (88.8%)].

Relationships with neurodevelopmental outcomes

Linear regression

Higher FA and lower MD and RD in the left uncinate fasciculus, lower MD and RD in the left cingulum, lower MD and RD in the right superior-lateral medial forebrain bundle and lower RD in the right anterior thalamic radiation were significantly associated with better verbal episodic memory and learning in both groups, independent of GA at birth and neonatal brain abnormality score (Fig. 3). After adjustment for the potentially confounding variables and multiple comparison correction, associations between tract measures and visual episodic memory and learning and emotional symptoms were not significant (data not shown). None of the tract measure–outcome associations differed significantly between the VP and FT groups, except for the relationship between left superior-lateral medial forebrain bundle AD and emotional symptoms (stronger positive association in the FT group versus weaker negative relationship in the VP group; data not shown). None of the tract measure–outcome associations differed significantly between males and females.

Fig. 3: Associations between tract measures and verbal episodic memory in very preterm and full-term children, from linear regression models.

Points are regression coefficients (change in the verbal episodic memory score per unit change in the tract measure), and error bars are 95% CI. Circle points indicate analyses with left hemisphere (LH) tract measures; triangle points indicate analyses with right hemisphere (RH) tract measures. Analyses are adjusted for age at neurodevelopmental assessment, sex, gestational age at birth and neonatal brain abnormality score, as well as intracranial volume for the tract volume plot only. Asterisks indicate that the association was significant at p < 0.05, false discovery rate-corrected for the number of tracts and the number of tract measures. Units of axial, radial and mean diffusivities: ×10⁻³ mm²/s; units of tract volumes: cm³. ATR anterior thalamic radiation, IM-MFB inferior-medial medial forebrain bundle, SL-MFB superior-lateral medial forebrain bundle.

Support vector regression

The support vector regression model using all tract measures did not significantly predict episodic memory and emotional scores in the VP and FT children (all p values for correlations between true and predicted scores were >0.3; Supplementary Fig. 1).

Discussion

Effect of VP birth on white matter tracts

The structure of major white matter tracts traditionally related to episodic memory and emotion, the fornix and medial forebrain bundle, was altered in VP children compared with FT children at age 7 years. This could be caused by ischaemic or inflammatory insults, or secondary alterations in white matter development.30 Understanding the underlying cellular changes is challenging given the indirect nature of the diffusion measures. The NODDI measures used in the current study provided increased specificity to cellular properties (neurite density and orientation dispersion) compared with diffusion tensor measures.21 Our results are consistent with many previous studies that have reported volume and microstructural differences in various white matter tracts between preterm and FT children, including in the tracts examined in the current paper.6,7,8,9 Our findings for the fornix are largely consistent with previous studies, which found lower fornix volume in preterm children compared with FT children,6,7 as well as altered fornix microstructure in preterm children compared with FT children, although our differences were in the AD, RD, MD, neurite density and neurite orientation dispersion parameters, whereas the previous studies found no difference in RD, but differences in an alternative microstructure measure, hindrance modulated orientational anisotropy.6,7 This difference in findings could be related to the differences in scanners, sequences and image processing pipelines between studies. In line with our study, previous studies found no microstructural differences in the cingulum between preterm and FT groups,6,7,9 although in contrast to our study, previous studies found lower cingulum volume in the preterm group compared with the FT group.6,7 There are also some conflicting findings in studies on the uncinate fasciculus. 
We found no significant differences in uncinate fasciculus volume and microstructure between VP and FT children, whereas other studies found lower FA in the uncinate fasciculus in preterm children compared with FT controls.8,9 In addition, a previous study found higher FA in the anterior thalamic radiation in preterm children compared with FT controls,9 whereas we found no microstructural differences in this tract between VP and FT children, although we did find higher anterior thalamic radiation volume in the VP group compared with the FT group. These differences in results could be related to differing analysis methods and sample characteristics between studies. In particular, all previous studies were based on older participants than the current sample of 7-year-old children (19-year-olds,6 30-year-olds,7 12-year-olds,8 and 9–17-year-olds9). Tracts such as the cingulum, uncinate fasciculus and anterior thalamic radiation demonstrate ongoing development over adolescence and young adulthood, such as increasing myelination and axon density.31 Hence, group differences in tract measures are likely to change over time, and developmental delays related to VP birth may become more apparent with age in these developmentally sensitive tracts.

Our support vector classification model used tract data to classify VP and FT children with accuracy significantly above chance. This suggests that VP-born individuals have distinct white matter tract development from FT-born individuals, and that this can be detected by machine learning techniques, even 7 years after birth. While this analysis was purely for research purposes because the GA of an individual is already known at birth, accurate classification is important as it demonstrates that the support vector machine model is able to predict, based on the tract measures, whether a child has white matter structure that is more indicative of a VP or FT individual.11 Our classification performance is similar to previous studies.11,12,13 These previous studies used grey matter data from structural MRI to achieve 86–93% accuracy in classifying preterm and FT 12–17-year-old children,11 and used functional MRI data to achieve >80% accuracy in classifying preterm and FT infants.12,13 Hence, data from multiple MRI modalities and brain regions appear able to classify VP from FT children using machine learning. This success justifies further investigation into whether machine learning algorithms can be used to predict the neurodevelopmental outcomes of VP individuals.

Relationship between white matter tracts and episodic memory and emotion

Although weaker than the differences in tract structure between the VP and FT children, the linear regression results provided some evidence for associations between the tracts and verbal episodic memory and learning outcome in VP and FT children, independent of several key perinatal risk factors known to influence long-term neurodevelopmental outcomes. Our findings provide further support that these tracts have a role in episodic memory,3 and that alterations to these tracts may contribute to episodic memory difficulties in VP children.6,7 Interestingly, fornix microstructure differed between VP and FT children, but was not related to episodic memory outcome, whereas other tracts that did not differ between VP and FT children (cingulum and uncinate fasciculus) were related to episodic memory. Additionally, the medial forebrain bundle (superior-lateral aspect) microstructure both differed between VP and FT children and was related to episodic memory outcome. The mechanisms behind these associations can only be speculated, but may reflect importance of the cingulate and orbitofrontal cortices in episodic memory in VP and FT children. The cingulum, anterior thalamic radiation, uncinate fasciculus and superior-lateral medial forebrain bundle have direct connections to these regions, whereas the fornix does not.3,32

Despite linear regressions revealing some associations between the tract measures and verbal episodic memory outcome, support vector regression results suggested that the ability of the tract measures as a whole to predict concurrent episodic memory outcome is limited. This discrepancy is an important finding. The effect sizes from linear regression were modest, and there were few strong associations from linear regression compared with the number of tests performed. In addition, the statistical power of neuroimaging studies in general tends to be low, which can lead to overestimated effect sizes.33 Consequently, the strength of the reported associations from linear regression may be upwardly biased. This could explain why these associations perform poorly when testing in small, out-of-sample datasets using support vector regression. In addition, there may be little extra information to be gained by including all the tracts in one machine learning model, due to shared variance between the tracts, as investigated previously using principal component analysis.34 Unlike the current findings, previous machine learning studies have found that MRI data can predict neurodevelopmental outcomes of preterm children.14,15,16 These studies used whole brain structural and diffusion MRI data from neonates, which may have better predictive ability compared with our specific white matter tracts studied later in childhood. However, our aim was not to achieve better predictive ability than previous studies, but rather to expand the use of machine learning to improve our understanding of the effects of VP birth on tracts that are traditionally related to episodic memory and emotion in childhood.

Strengths and limitations

Strengths of our study include the large sample of children followed up until age 7 years, and the use of advanced statistical analyses in addition to traditional regression analyses. However, loss of participants to follow-up or not having useable MRI data meant that a maximum of 144 (64%) of the original cohort of VP children and 33 (72%) of the original cohort of FT children could be included in the current study, and numbers were even smaller for the support vector machine analyses, which required complete data from all participants. This reduced our power to predict neurodevelopmental outcomes and limits the generalisability of our results. The number of VP children included in our study is generally similar or larger than other studies in the field, and the number of FT controls included in our study is generally similar to other studies in the field;6,7,8,9 however, we acknowledge that we had fewer FT controls than VP children and that the unbalanced groups could affect the support vector classification. We attempted to address the unbalanced groups by performing balanced class weighting within the support vector classification, and calculating balanced accuracy, which is a better reflection of accuracy for both groups. Future studies could recruit more FT controls to further minimise any bias of unbalanced samples on classification. In addition, while we used K-fold cross-validation with a partially nested cross-validation for parameter tuning, future studies could use nested cross-validation to reduce bias on support vector machine performance estimates.35 Given our interest in examining the neural correlates of emotional outcomes, we selected the Emotional Symptoms subscale of the Strengths and Difficulties Questionnaire. However, we acknowledge that this is a brief screening measure and more comprehensive measures of emotional outcome are required to replicate our findings. 
Many of the tracts delineated, particularly the fornix, are located close to the lateral ventricles, exacerbating the risk of cerebrospinal fluid contamination of the microstructural measures. We tested whether cerebrospinal fluid contamination could be influencing our group differences from linear regression by adjusting for cerebrospinal fluid volume obtained using FreeSurfer, and results were similar (data not shown). Future studies could improve tractography further by using advanced methods for estimating the fibre orientation distribution, such as Multi-Shell Multi-Tissue constrained spherical deconvolution or Single-Shell 3-Tissue constrained spherical deconvolution.36,37

Conclusions

The volume and microstructure of some tracts traditionally related to episodic memory and emotion are altered in VP children compared with FT children. This altered structure can be detected by machine learning techniques. Specific tracts, including the cingulum, uncinate fasciculus, anterior thalamic radiation and medial forebrain bundle, were associated with episodic memory outcome in VP and FT children, independent of several important perinatal risk factors. Despite this, the ability of tract structure to predict the episodic memory outcome of individual children was limited.

This study contributes important information towards understanding the long-term neural changes following VP birth, and how these might contribute to adverse episodic memory outcomes. Studies applying machine learning methods are critical for improving knowledge in the field. More broadly, such studies are a necessary first step towards understanding whether MRI data can aid in the identification of individual VP children who are at risk of adverse neurodevelopmental outcomes, and who may benefit from interventions to improve their outcomes. Future studies using larger datasets acquired as early as possible and testing a broader range of machine learning algorithms would be worthwhile for advancing knowledge in the field.