Introduction

Parkinson’s disease (PD) is a progressive disease that is characterized by degeneration of mesencephalic dopamine neurons. The prevalence of PD is increasing. The number of patients with PD has more than doubled globally from 1990 to 2015 [1], and worldwide in 2016, 6.1 million individuals had PD [2]. Although MRI is used for the diagnosis of many neurological disorders, PD does not show abnormal findings on conventional MRI. Patients with PD show extrapyramidal symptoms such as resting tremor, bradykinesia, rigidity, and postural reflex disturbance, and diagnosis of PD is made based on the clinical time course, symptoms, and physical findings.

Advancements in diffusion MRI techniques have allowed researchers to capture imaging findings of PD [3] by using diffusion tensor imaging (DTI) [4], diffusion kurtosis imaging (DKI) [5], and neurite orientation dispersion and density imaging (NODDI) [6]. Parameters derived from g-ratio analysis [7] may also capture microstructural changes in PD [8]. The structural connectome matrix, or number of streamlines (NOS)-based connectome matrix, is calculated from diffusion-weighted MRI, represents neural connections of the whole brain [9], and also has potential to reveal abnormalities in neural connections in some neurological and psychological disorders [10] including PD [11]. While NOS-based matrices represent only microstructural fiber counts, elements of the connectome matrix can also be weighted using parameters derived from advanced diffusion MRI techniques. Such parameter-weighted matrices have a potential to represent tissue properties and to have higher sensitivity to capture the characteristics of neurological diseases. For example, the usefulness of the g-ratio-weighted connectome matrix over the NOS-based matrix was reported in the diagnosis of multiple sclerosis [12]. Use of a parameter-weighted connectome matrix from DTI, DKI, NODDI, and g-ratio analyses has the potential to diagnose PD better than the conventional NOS-based matrix; however, to the best of our knowledge, no investigations have used such parameter-weighted connectome matrices to evaluate PD.

Deep learning, which is an artificial intelligence approach, allows automated analyses of large-volume data such as images without explicitly teaching knowledge to computers [13]. Convolutional neural network (CNN), a deep learning strategy, has shown high performance in image recognition tasks [14]. Therefore, application of CNN to radiological imaging diagnosis has been gaining wide attention [15,16,17], and researchers have successfully applied this technique to several imaging diagnosis tasks with radiography [18, 19], CT [20, 21], and MRI [22, 23]. Deep learning would also have a potential to handle connectome matrix which has a large data volume comprising thousands of independent elements. One of the weaknesses of deep learning is how it makes decisions has been difficult to interpret for us [15]. However, visualization of regions of images where deep learning models focus has become possible with gradient-weighted class activation mapping (Grad-CAM) [24]. This technique may allow visualizing alterations of neural circuits in PD patients when it is applied to CNNs which is trained with connectome matrices to diagnose PD.

The aim of this study was to investigate whether patients with PD can be differentiated from healthy controls by applying a deep learning technique to a parameter-weighted connectome matrix with higher performance than a NOS-based structural connectome matrix calculated from MRI obtained with a 3-Tesla MRI unit and to find neural circuit disorders associated with PD by using Grad-CAM techniques.

Methods

In this prospective study, which was approved by our institutional review board, MRI data of patients with PD and healthy controls were analyzed. There is subject overlap (64 out of 230) with previous studies [25, 26], which used different analyzing methods (tract-of-interest, tract-based spatial statistics, and gray matter-based spatial statistics analyses) from the current study. Those studies did not use deep learning technique or connectome matrices in the evaluations of PD. Written informed consent was obtained from each participant.

Participants

Patients with PD and healthy controls who visited Juntendo University Hospital (blinded for the review process) between February 2017 and October 2018 were enrolled in this study. Specialized neurologists made the diagnosis of PD according to clinical diagnostic criteria for PD of the Movement Disorder Society [27]. Patients showed parkinsonism and responded to antiparkinsonian therapy. Patients with apparent signs of other diseases which shows parkinsonism were not included. Disease duration from onset to MRI examination, disease severity, and dominant side are shown in Table 1. Healthy controls had no history of neurological diseases. Participants with a brain infarction or incomplete MRI examination were excluded. Age- and gender-matched participants in the PD and healthy control groups were selected before the deep learning and main analyses.

Table 1 Patient demographics data

MRI data acquisition

MRI examinations were performed with a 3-T MRI unit (MAGNETOM Prisma; Siemens Healthcare) using a 64-channel head coil. Multi-shell diffusion-weighted MRI, magnetization transfer saturation images, and T1-weighted images were obtained.

Multi-shell diffusion-weighted MRIs were obtained with b-values of 0, 1000, and 2000 s/mm2 along 64 uniformly distributed directions for each shell with spin-echo echo-planar imaging. We also acquired standard and reverse phase-encoded blipped images with no diffusion weighting (blip up and blip down) to correct for the magnetic susceptibility-induced distortions related to the echo-planar imaging acquisitions [28]. The data were corrected for eddy currents, susceptibility-induced geometric distortions, and inter-volume motion using the EDDY and TOPUP toolboxes [28]. Acquisition parameters for the multi-shell diffusion-weighted MRIs were the following: repetition time (TR) of 3300 ms, echo time (TE) of 70 ms, field of view (FOV) of 229 × 229 mm, voxel size of 1.8 × 1.8 × 1.8 mm3, matrix of 130 × 130, 65 slices, number of excitations of 1, and acquisition time of 7.29 min.

To calculate the magnetization transfer (MT) saturation index, dual excitation three-dimensional multi-echo fast low-angle shot sequences were performed with predominant T1, proton density, and MT weighting. The excitation of MT-weighted images was preceded by an off-resonance Gaussian-shaped radiofrequency pulse under the following conditions: frequency offset from water resonance, 1.2 kHz; pulse duration, 9.984 ms; and nominal flip angle, 500°. Other acquisition parameters for MT saturation imaging were as follows: MT-off and MT-on scanning (TR/TE = 24/2.53 ms, flip angle = 5°) and T1-weighted imaging (TR/TE = 10/2.53 ms, flip angle = 13°); parallel imaging using a generalized auto-calibrating partially parallel acquisition factor in the phase-encoding direction, 2; 7/8 partial Fourier acquisition in the partition direction; bandwidth, 260 Hz/pixel; slice thickness, 1.8 mm; number of slices, 104; FOV, 224 × 224 mm, and matrix, 128 × 128. The total acquisition time was 6.25 min.

As structural data, three-dimensional T1-weighted images using magnetization-prepared 180° radiofrequency pulses and rapid gradient-echo sequences were also acquired. Acquisition parameters for the T1-weighted images were the following: TR/TE = 15/3.54 ms, inversion time = 1100 ms, voxel size of 0.86 × 0.86 × 0.86 mm3, and acquisition time, 5.14 min.

Diffusion analyses (DTI, DKI, NODDI, and g-ratio analyses)

An ordinary least square was applied to the diffusion-weighted MRI data with b = 0 and 1000 s/mm2 to produce FA, mean diffusivity, axial diffusivity, and radial diffusivity based on standard formulae [4].

The diffusional kurtosis estimator [29] was implemented in MATLAB (Math-Works, Natick, MA, USA) to generate AK, MK, and RK maps.

The resulting diffusion-weighted MRIs were fitted to the NODDI model [6] using the NODDI Matlab Toolbox5 (http://www.nitrc.org/projects/noddi_toolbox) and Accelerated Microstructure Imaging via Convex Optimization [30]. Maps of the ICVF, orientation dispersion index, and isotropic volume fraction were generated.

Using an in-house MATLAB script, MT saturation data were analyzed to calculate the myelin volume fraction (MVF). To estimate MVF maps, a calibration factor of 0.1 was subsequently used to obtain a g-ratio of 0.7 in the corpus callosum as previously suggested [31]. The AVF was calculated based on parameters of the NODDI model (ICVF and isotropic volume fraction) and MT saturation (myelin volume fraction), as follows [7]:

$$ \mathrm{AVF}=\left(1-\mathrm{MVF}\right)\left(1-\mathrm{ISOVF}\right)\mathrm{ICVF} $$

where ISOVF denotes isotropic volume fraction. Finally, the g-ratio was calculated using the myelin volume fraction and AVF for each voxel using the following equation [7]:

$$ g-\mathrm{ratio}=\sqrt{\frac{\mathrm{AVF}}{\left(\mathrm{MVF}+\mathrm{AVF}\right)}} $$

Connectome pre-processing and construction

By using the functional MRI of the Brain Software Library, Version 5.0.9 [32], pre-processing (registration, removing non-brain tissue from three-dimensional T1-weighted images, segmentation [estimation of partial volume fractions of white matter/cortical gray matter/deep gray matter/cerebrospinal fluid], and obtaining gray matter-white matter interface mask) was performed. Then, nodes were obtained according to Desikan-Killiany cortical atlas segmentation [33]. With the MRtrix software package (Brain Research Institute, http://www.brain.org.au/software/), whole-brain tractograms were generated using probabilistic multi-shell, multi-tissue constrained spherical deconvolution tracking from multi-shell diffusion-weighted MRI data. For fiber tracking, data with b-values of 1000 and 2000 s/mm2 were used, and 5 × 107 streamlines were seeded from the white matter fiber orientation distribution function.

To map the NOS-based connectome for each individual, the total NOS interconnecting each pair of regions was enumerated and stored in a connectivity matrix. Then, the NOS interconnecting each pair of nodes was enumerated, resulting in a connectome matrix with 84 × 84 components (Fig. 1a). Furthermore, DTI-, DKI-, NODDI-, and g-ratio-weighted connectome matrices were obtained by multiplying the NOS by the tract-averaged values of each parameter [12]. Therefore, from each patient, a total of 14 connectome matrices (one NOS-based, four DTI parameter-weighted [fractional anisotropy (FA), mean diffusivity, axial diffusivity, and radial diffusivity], three DKI parameter-weighted [mean kurtosis (MK), axial kurtosis (AK), and radial kurtosis (RK)], three NODDI parameter-weighted [intracellular volume fraction (ICVF), orientation dispersion index, isotropic volume fraction], and three g-ratio parameter-weighted [axon volume fraction (AVF), myelin volume fraction, and g-ratio] matrices) were generated.

Fig. 1
figure 1

a NOS-based connectome matrix, which is presented as a color map, of a 68-year-old woman with PD. b Preparation of the input data for deep learning with the convolutional neural network (CNN). Numbers of elements are shown in parentheses. c Input data presented as a color map for the CNN of the same patient as in a. Red and blue colors indicate higher and lower values, respectively

Training of the CNN

Deep learning was performed with a computer equipped with the graphical processing unit of Quadro P5000 (NVIDIA), 64-GB random access memory, and a Core i9-9900K (Intel). To perform deep learning, the programming language, Python 3.6.4, and the deep learning framework, Chainer 4.0.0, were used.

Input data were preprocessed. To reduce the dimension of input data, the connectome matrices were preprocessed. From the original connectome matrix (84 × 84 elements) (Fig. 1a), diagonal elements that represent self-connection were excluded (84 × 84–84 elements) (Fig. 1b). We also extracted only the independent 3486 elements ((84 × 84–84)/2 = 3486). After adding 114 zeros (3486 + 114 = 3600), the data were transformed to 60 × 60 matrix data (Fig. 1c). This 60 × 60 matrix was used as input data for deep learning with CNN. The input data were normalized by dividing by the maximum value within the matrix.

As teaching data, two-element vector data (PD [0, 1] vs. healthy control [1, 0]) were used. The structure of the CNN is illustrated in Fig. 2. Errors between output data and teacher data (PD vs. healthy control) were calculated with softmax cross-entropy. The CNN was updated with a minibatch size of 15 and with an optimizer of AdaGrad [34] so that the error became small. The number of epochs for the training was 20. In the training phase, data augmentation was performed by adding Gaussian noise with a mean of 0 and a standard deviation of 0.002, 0.004, 0.006, 0.008, and 0.010 to the input data. Therefore, data were augmented 6-fold in the training phase of the CNN. The hyperparameters were determined before the main analyses by using NOS-based matrices of PD patients and healthy controls who were excluded from the study due to age and gender matching.

Fig. 2
figure 2

The CNN comprised four convolutional layers with channels of 8, 16, 16, and 32. Because the distance between two elements within the connectome matrix does not represent spatial distance as in general images, we customized the structure of the CNN: a a filter size of 1 was used for all convolutional layers and b max-pooling layers were omitted from the CNN. The processed data were further processed at three fully connected layers with the number of units of 4096, 4096, and 2. Conv, convolutional layer; FC, fully connected layer; ker, numbers of kernels of the convolutional layer; out, numbers of output values of the fully connected layer

Evaluating performance of the trained models

To evaluate the performance of the model, 5-fold cross-validation was performed. Data of patients with PD and healthy controls were randomly divided into five groups. Training and validation were repeated 10 times (trials) in each fold. The performance of the model that showed the best among the 10 trials was recorded for each fold. Then, by pooling the data across 5-fold, the performance of the model was evaluated.

Grad-CAM

To visualize the regions where the developed CNN focused in diagnosing PD, the Grad-CAM technique [24] was applied to the connectome matrices of patients with PD who were diagnosed correctly in the validation. The target layer was set at the 4th convolutional layer. The data for each patient with PD were normalized and then pooled. To assess whether the relations between the dominant side and the neural circuit alteration, subanalyses were also performed by separately pooling the data for patients according to the dominant side (left or right).

Statistics

Statistical analyses were performed with EZR version 1.37 (http://www.jichi.ac.jp/saitama-sct/SaitamaHP.files/statmedEN.html), which is a graphical user interface of R 2.4.0 (https://www.r-project.org/). Continuous and binominal data were compared with the Student t test and chi-square test, respectively. The performance of the CNN model for differentiating patients with PD from healthy controls was evaluated with receiver operating characteristic analyses. Areas under the receiver operating characteristic curve (AUCs) of the model that showed the best performance among each evaluation category (NOS, DTI, DKI, NODDI, g-ratio) were compared across categories with DeLong’s test. Because of multiple comparisons across five categories, a p value less than 0.005 (= 0.05/10, Bonferroni correction) was considered to indicate statistical significance. Sensitivity, specificity, and accuracy in diagnosing PD were calculated using CNN’s output value of 0.5 as a cut-off.

Results

Background of study participants

In this study, 115 patients with PD (mean age with standard deviation, 68.9 ± 6.9 years, 52 men and 63 women) and 115 healthy controls (mean age with standard deviation, 69.8 ± 3.0 years, 61 men and 54 women) were enrolled. Other background information is summarized in Table 1.

Performance of models for diagnosing PD

The diagnostic performance of the trained CNN models for differentiating patients with PD from healthy controls is summarized in Table 2. AK-weighted, MK-weighted, RK-weighted, ICVF-weighted, and AVF-weighted matrices were found to show AUCs of over 0.800 in diagnosing PD (AUC = 0.891, 0.878, 0.895, 0.801, and 0.836, respectively), while NOS-based matrix showed AUC of 0.761. Among the DTI, DKI, NODDI, and g-ratio parameters, FA, RK, ICVF, and AVF performed the best, respectively (AUCs = 0.733, 0.895, 0.801, and 0.836, respectively) (Fig. 3). We compared the AUCs of the model that showed the best performance among each evaluation category (NOS, DTI, DKI, NODDI, g-ratio) (Table 3). The RK-weighted connectome matrix (best among DKI) showed significantly better performance compared with ICVF-weighted (best among NODDI), NOS-based, and FA-weighted (best among DTI) matrices (p = 0.0004, p < 0.0001, and p < 0.0001, respectively). AVF-weighted connectome matrix (best among g-ratio) performed significantly better than the FA-weighted connectome matrix (best among DTI) for diagnosing PD (p = 0.0022).

Table 2 Areas under the receiver operating characteristic curve for diagnosing Parkinson’s disease with the convolutional neural network
Fig. 3
figure 3

Receiver operating characteristic curves to differentiate PD from healthy controls with a NOS-based, b FA-weighted, c RK-weighted, d ICVF-weighted, and e AVF-weighted connectome matrices

Table 3 Comparisons of the pooled areas under the receiver operating characteristic curve for diagnosing Parkinson’s disease with the convolutional neural network

Sensitivity, specificity, and accuracy of models that showed the best performance in each category are shown in Table 4.

Table 4 Sensitivity, specificity, and accuracy of the models (pooled across 5-fold) for diagnosing Parkinson’s disease

Grad-CAM results

Regions in which the trained CNN focused, which were revealed with Grad-CAM, are shown in Fig. 4. Generally, Grad-CAM result images are blurred due to the use of kernel size of more than 1 in convolutional layers and due to the use of max-pooling layers. In this study, the customized structure of our CNN (Fig. 2) allowed pin-point visualization in Grad-CAM analysis. The trained CNN model focused on many connections on the RK-weighted connectome matrix (Fig. 4c) and no specific trend in areas of focus for neural connections was found. On the contrary, ICVF-weighted and AVF-weighted connectome matrices focused on some specific neural connections, which are indicated as reddish color dots in Fig. 4d, e. These neural connections are summarized in Fig. 5 and Table 5. For these two matrices, connections between the basal ganglia and the cerebellum played important roles in discriminating patients with PD from healthy controls.

Fig. 4
figure 4

Results of Grad-CAM, which was reorganized to 84 × 84, for a NOS-based, b FA-weighted, c RK-weighted, d ICVF-weighted, and e AVF-weighted connectome matrices are shown. Left/top and right/bottom halves of the matrix indicate left and right brain regions, respectively. Red and blue colors indicate higher and lower values, respectively, obtained from Grad-CAM analysis

Fig. 5
figure 5

Top 30 neural connections focused on by Grad-CAM analysis for differentiating patients with PD from healthy controls with a ICVF-weighted and b AVF-weighted connectome matrices. The more intensely focused connections are represented in a more reddish color

Table 5 Neural circuit alterations indicated by Grad-CAM analysis in AVF-weighted and ICVF-weighted matrices

Discussion

In this study, by applying the deep learning technique to DKI-, NODDI-, and g-ratio-weighted structural connectome matrices obtained with a 3-T MRI unit, we found that PD could be diagnosed with moderate performance. Deep learning models that were trained with the DKI-weighted connectome matrix performed significantly better than those trained with the NOS-based connectome matrix. By using the Grad-CAM technique, we found that the trained CNN focused on connections between the basal ganglia and the cerebellum for ICVF- and AVF-weighted connectome matrices, whereas no specific trend was found in the areas of focus for neural connections for the RK-weighted connectome matrix for differentiating patients with PD from healthy controls.

Parameter-weighted connectome matrix has been gaining attention recently in the evaluations of neurological diseases [12]. While NOS-based connectome matrix represents only the strength of the connectivity between brain regions, the parameter-weighted matrix can represent other tissue properties. In the current study, DKI parameters-weighted, ICVF-weighted, and AVF-weighted matrices were found to show AUCs of over 0.800 in diagnosing PD, while NOS-based matrix showed AUC of 0.761. The RK-weighted connectome matrix (best among DKI) was significantly superior to the NOS-based connectome matrix. This means that parameter-weighted connectome matrix would be superior to conventional NOS-based matrix in diagnosing PD.

The RK-weighted connectome matrix (best among DKI) was also significantly superior to the FA-weighted (best among DTI) connectome matrices. This finding matches a previous study that reported that parameters derived from DKI analyses were significantly better than those from DTI analyses in diagnosing PD [35]. NODDI parameters represent the density and direction of neurites (axons and dendrites), and they are more specific markers of brain tissue microstructure [6]. There is a previous study which reported that both the DKI and NODDI parameters were useful for diagnosing PD [36]. However, the DKI parameter-weighted matrix performed significantly better than the NODDI parameter-weighted matrix in our study. The difference might have come from the difference in the evaluating methods; in that previous study regions of interest were placed on gray matter in measuring these values. In our study, the trained CNN based on RK-weighted matrix took into consideration many of the neural circuits comprehensively, which was indicated by the Grad-CAM result. According to Braak et al., as the disease progresses from stage 1 to stage 6 in PD patients, the deposition of Lewy bodies is thought to be spread to the upper brainstem and forebrain [37]. Because the PD patients included in this study had relatively long clinical course, we assume that such pathological conditions were present widely in the brain in many PD patients. DKI model, which quantifies non-Gaussian water diffusion [5], might be better to capture such pathological changes of PD than NODDI model when parameters derived from these models are used to weigh connectome matrix.

Little is known regarding g-ratio analysis for diagnosing PD. With g-ratio analysis, AVF, myelin volume fraction, and the ratio of the inner to outer myelinated axon diameter (g-ratio) can be derived [7]. AVF is an indicator of altered axons like ICVF in NODDI model [6, 7]. In this study, only ICVF and AVF showed performance with AUC of over 0.80 among the NODDI and g-ratio analysis, respectively. This does not conflict with a previous study [25], which reported reduction of ICVF in white matter indicating axonal degeneration in PD.

Traditionally, PD is characterized by degeneration of mesencephalic dopamine neurons. However, it has become known that several other brain regions and connections are known to be affected in PD [3]. A connection between basal ganglia and cerebellum is one of these connections in PD patients [38]. In previous studies, by using functional MRI in patients with PD, alterations in connectivity strength between basal ganglia and cerebellum have been suggested [39, 40]. In the current study, we found that neural connections between the basal ganglia and the cerebellum were focused on by ICVF and AVF-weighted connectome matrices with the Grad-CAM technique. The alterations of axons along those neural circuits might possibly be associated with alterations of function.

In this study, we customized the structure of the CNN, because we thought that the connectome matrix is different from general images in that distances between elements of matrices do not represent spatial distances. We used a filter size of 1 in all the convolutional layers. In addition, we avoided implementing max-pooling layers, which extract max values within H × H pixels (H = 2 or more). The customized CNN was found to be applicable to the imaging diagnosis with connectome matrices. While Grad-CAM result images are blurred generally [24], the customized structure of the CNN enabled pin-point visualization in Grad-CAM as shown in Fig. 4. The developed method would have a potential to identify neural circuit disorders in other neurological and psychiatric diseases.

This study has some limitations. First, although the Grad-CAM results indicated abnormalities in some of the neural circuits, the abnormalities were not proven histopathologically. Future analyses are required to consolidate the Grad-CAM findings obtained in this study. Second, because data from a large number of patients with PD were required to manage the overfitting problem associated with deep learning, patients with PD with variable disease durations (mean disease duration was over 10 years) were included in this study. Therefore, early diagnosis of PD is not possible with our models. It would also be possible that midbrain atrophy, which can be seen in some late-stage cases of PD, might have affected the performance of the model. Third, the model’s performance was not externally validated. Our study indicates that once the model is developed, it can be used to diagnose PD in the same institution. However, the model’s performance may not be as high as we reported when the method is applied to data obtained in other institutions. Fourth, connectome matrix can be constructed by using other data, such as functional MRI. However, we focused on analyzing connectome matrices based on diffusion MRI in this study. Application of this technique to connectome matrix based on functional MRI of PD patients’ needs to be investigated in the future. And finally, because we aimed to investigate the abnormalities in the neural circuits of PD compared with healthy controls, other diseases that show parkinsonism were not included in this study. Therefore, differentiation of PD from such diseases is not possible with our models. Diseases which show parkinsonism still needs to be differentiated by integrating clinical information and some test findings [41].

In conclusion, PD can be diagnosed by applying deep learning to DKI-, NODDI-, and g-ratio-weighted connectome matrices with moderate performance. Deep learning models trained with the DKI-weighted connectome matrix performed significantly better than those trained with the NOS-based connectome matrix. Alterations in neural connections between the basal ganglia on one side and the cerebellum on the contralateral side in PD patients were indicated by applying Grad-CAM technique to g-ratio-weighted and NODDI-weighted matrices.