Deep learning–based denoising of low-dose SPECT myocardial perfusion images: quantitative assessment and clinical performance

Aghakhan Olia, Narges; Kamali-Asl, Alireza; Hariri Tabrizi, Sanaz; Geramifar, Parham; Sheikhzadeh, Peyman; Farzanefar, Saeed; Arabi, Hossein; Zaidi, Habib

doi:10.1007/s00259-021-05614-7

Original Article
Open access
Published: 15 November 2021

Volume 49, pages 1508–1522, (2022)

European Journal of Nuclear Medicine and Molecular Imaging

Narges Aghakhan Olia¹,
Alireza Kamali-Asl¹,
Sanaz Hariri Tabrizi¹,
Parham Geramifar²,
Peyman Sheikhzadeh³,
Saeed Farzanefar³,
Hossein Arabi⁴ &
…
Habib Zaidi ORCID: orcid.org/0000-0001-7559-5297^4,5,6,7

Abstract

Purpose

This work set out to investigate the feasibility of dose reduction in SPECT myocardial perfusion imaging (MPI) without sacrificing diagnostic accuracy. A deep learning approach was proposed to synthesize full-dose images from the corresponding low-dose images at different dose reduction levels in the projection space.

Methods

Clinical SPECT-MPI images of 345 patients acquired on a dedicated cardiac SPECT camera in list-mode format were retrospectively employed to predict standard-dose from low-dose images at half-, quarter-, and one-eighth-dose levels. To simulate realistic low-dose projections, 50%, 25%, and 12.5% of the events were randomly selected from the list-mode data using binomial subsampling. A generative adversarial network was implemented to predict non-gated standard-dose SPECT images in the projection space at the different dose reduction levels. Well-established metrics, including peak signal-to-noise ratio (PSNR), root mean square error (RMSE), and structural similarity index measure (SSIM), in addition to Pearson correlation coefficient analysis and clinical parameters derived from Cedars-Sinai software, were used to quantitatively assess the predicted standard-dose images. For clinical evaluation, the quality of the predicted standard-dose images was evaluated by a nuclear medicine specialist using a seven-point (−3 to +3) grading scheme.

Results

The highest PSNR (42.49 ± 2.37) and SSIM (0.99 ± 0.01) and the lowest RMSE (1.99 ± 0.63) were achieved at a half-dose level. Pearson correlation coefficients were 0.997 ± 0.001, 0.994 ± 0.003, and 0.987 ± 0.004 for the predicted standard-dose images at half-, quarter-, and one-eighth-dose levels, respectively. Using the standard-dose images as reference, the Bland–Altman plots sketched for the Cedars-Sinai selected parameters exhibited remarkably less bias and variance in the predicted standard-dose images compared with the low-dose images at all reduced dose levels. Overall, considering the clinical assessment performed by a nuclear medicine specialist, 100%, 80%, and 11% of the predicted standard-dose images were clinically acceptable at half-, quarter-, and one-eighth-dose levels, respectively.

Conclusion

The noise was effectively suppressed by the proposed network, and the predicted standard-dose images were comparable to reference standard-dose images at half- and quarter-dose levels. However, recovery of the underlying signals/information in low-dose images beyond a quarter of the standard dose would not be feasible (due to very poor signal-to-noise ratio) which will adversely affect the clinical interpretation of the resulting images.

Introduction

Single-photon emission computed tomography (SPECT) is a widely used molecular imaging modality in various clinical domains, including the assessment of cardiovascular diseases [1]. SPECT myocardial perfusion imaging (MPI) is an effective non-invasive method for the diagnosis of coronary artery disease, predicting disease progression, and evaluating acute coronary artery syndromes [2, 3]. To achieve high-quality images in nuclear medicine, a sufficient dose of radiopharmaceuticals should be injected. Reducing the injected dose beyond the prescribed limit would lead to poor signal-to-noise ratio (SNR) and low-quality images, thus hampering diagnostic performance [4, 5].

Since SPECT is considered the second leading contributor to radiation dose among medical imaging modalities (with approximately 90% stress imaging studies performed annually in the USA), concerns about the radiation risks of this imaging modality have increased [6,7,8]. Multiple studies have been conducted to cope with the challenge of reducing the injected activity of radiopharmaceuticals in nuclear medicine imaging without sacrificing the diagnostic/clinical value. The proposed strategies fall into four categories: statistical iterative image reconstruction, post-reconstruction filtering or post-processing, recent advances in hardware, and machine learning techniques [8, 9].

Iterative image reconstruction algorithms formulate low-dose image reconstruction as a convex optimization problem and suppress noise through statistical modeling of the signal formation and noise. Advanced iterative image reconstruction algorithms have shown that the injected dose or acquisition time could be decreased by a factor of two or higher in SPECT-MPI imaging [10,11,12,13,14]. In this regard, Ramon et al. quantified the accuracy of perfusion-defect detection in SPECT-MPI images as a function of the injected dose to minimize the administered dose without sacrificing diagnostic performance [12]. Other approaches rely on post-processing and/or post-reconstruction denoising techniques, including non-local means (NLM) or bilateral filters, to suppress the noise in low-dose images [15,16,17].

Recently, innovative collimator and SPECT camera designs, as well as novel algorithms, have been developed mainly to reduce scanning time or injected activity while preserving the underlying information and clinical value. Scintillation crystals equipped with PMTs and parallel-hole collimators employed on conventional dual-head SPECT systems have limited performance owing to low resolution and sensitivity, and commonly require long acquisition times and high administered doses. Dedicated cardiac SPECT instrumentation has witnessed tremendous improvements over the last few years. New dedicated commercial ultrafast solid-state cardiac cameras (D-SPECT and GE 530c/570c) enable low-dose diagnostic quality imaging [8, 18,19,20].

In addition to the aforementioned methods, which to some extent enable the recovery of the underlying signals/structures in low-dose images, deep learning algorithms have exhibited promising performance in directly predicting high-quality standard-dose images from the corresponding low-dose images [21].

It has been shown that various types of deep neural networks are capable of suppressing the noise in low-dose computed tomography (CT) as well as positron emission tomography (PET) images leading to dependable estimation of the standard-dose images [9, 22,23,24,25,26,27,28,29]. Likewise, a number of studies have been conducted in the field of low-dose SPECT-MPI. In this regard, Ramon et al. demonstrated the feasibility of using several 3D convolutional denoising networks for SPECT-MPI denoising in the image domain at 1/2, 1/4, 1/8, and 1/16 of standard clinical dose levels [30]. Song et al. investigated a 3D residual convolutional neural network (CNN) model to predict standard-dose images from 1/4-dose gated SPECT-MPI images [31]. Shiri et al. evaluated the potential of acquisition time reduction in SPECT-MPI using a residual network (ResNet) [32]. They followed two different approaches, namely, reducing the number of projections and reducing the acquisition time per projection.

The aim of this study is to reduce the administered activity while preserving the underlying structures, without losing the diagnostic accuracy and clinical value of SPECT-MPI images. Taking advantage of the remarkable success of deep neural networks in the field of image processing/synthesis [33], we propose an end-to-end image translation approach to denoise low-dose SPECT-MPI in the projection domain. This work employs a deep generative adversarial network (GAN) model to estimate standard-dose images from the corresponding 1/2, 1/4, and 1/8 low-dose images in an attempt to determine which reduced dose level could be recovered by the GAN model with minimal loss of image quality and clinical value. Moreover, a comprehensive clinical assessment is conducted to assess the clinical value of deep learning–assisted prediction of standard-dose from corresponding low-dose SPECT-MPI.

Materials and methods

Data acquisition

This retrospective single-institution study was approved by the institutional ethics committee, and all patients gave written informed consent. SPECT-MPI data were acquired for 345 patients (193 female and 152 male) scanned on the ProSPECT (Parto Negar Persia, Iran), a dedicated cardiac SPECT camera with dual-head detectors fixed at a 90° angle. Each head in the ProSPECT camera consists of a 40 × 25 cm² thallium-activated sodium iodide (NaI(Tl)) scintillation crystal with 9.5-mm thickness and a lightweight low-energy high-resolution (LEHR) collimator with 35-mm thickness. The scintillation detector is coupled to a square array of 24 photomultiplier tubes (76 × 76 mm) which are optically connected to a fused-quartz light guide with a thickness of 20 mm. A silicon-based curing compound is employed as optical glue. Based on NEMA standards, the system spatial resolution without scatter with the LEHR collimator at 10 cm from the surface of the detector, the energy resolution within the useful field-of-view (UFOV), and the sensitivity are 7.6 mm, 9.5%, and 79 cps/MBq, respectively [34]. To avoid re-injecting the radiopharmaceutical, data were acquired in list-mode format, from which the corresponding low-dose images were simulated. Using a 2-day rest/stress acquisition protocol, image acquisition was conducted approximately 1 h after injection of 814 ± 111 MBq of ⁹⁹ᵐTc-sestamibi. To reduce breast tissue and diaphragm attenuation, women and men underwent supine and prone imaging, respectively. The acquisition protocol consisted of 32 projections with 20 to 25 s per projection from the right anterior oblique (RAO) to the left posterior oblique (LPO). According to the synchronized electrocardiography (ECG) signal collected during acquisition, the detected photons were split into 8 gate intervals during a cardiac cycle.

To simulate half-dose, quarter-dose, and one-eighth-dose acquisitions, regardless of the temporal information, the number of detected photons was reduced by applying a binomial subsampling. In this subsampling method, each registered photon in the projection space would be either kept or rejected through a probability function mimicking the different low-dose levels.
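The subsampling step can be illustrated with a minimal NumPy sketch. It is not the authors' code: the function name `thin_projection`, the synthetic Poisson projection, and the random seed are assumptions made for illustration. Keeping each detected event independently with probability p is equivalent to drawing, for every projection bin holding N counts, a binomial sample with parameters (N, p).

```python
import numpy as np

def thin_projection(counts, keep_fraction, rng):
    """Binomially thin integer counts: each event is kept with probability keep_fraction."""
    return rng.binomial(counts, keep_fraction)

rng = np.random.default_rng(seed=0)

# Hypothetical standard-dose projection (synthetic Poisson counts, not patient data).
standard = rng.poisson(lam=40.0, size=(64, 64))

# Half-, quarter-, and one-eighth-dose realizations of the same projection.
low_dose = {f: thin_projection(standard, f, rng) for f in (0.5, 0.25, 0.125)}

for fraction, projection in low_dose.items():
    print(fraction, projection.sum() / standard.sum())
```

Because thinning a Poisson process yields another Poisson process with a proportionally lower mean, the subsampled projections retain realistic count statistics for the reduced dose.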

The software provided with the ProSPECT camera was employed to convert the list-mode data to non-gated projection data (64 × 64 × 32 voxels) and gated projection data (64 × 64 × 256 voxels) with a voxel size of 6.4 × 6.4 × 6.4 mm³.

Data preparation

Since the count rate from the liver absorption in SPECT-MPI is relatively high, projection images were manually cropped by a nuclear medicine physician to exclude the liver from cardiac images. Fifteen patients were excluded from the dataset since it was not possible to distinguish between the heart and liver. Projection data of 295 patients were randomly selected as training dataset, whereas the remaining 35 patients were used as an external test dataset to assess the performance of the GAN model. According to the clinical indication and reporting of SPECT-MPI, the patients were divided into four groups: healthy, low-risk, intermediate-risk, and severe-risk. In this light, the test dataset included 8, 16, 6, and 5 samples from these groups, respectively, to fairly evaluate the network performance.

Deep network architecture

The GAN architecture is composed of a generator network to predict/estimate standard-dose images and a discriminator network that classifies the synthesized images as real or fake [35]. These networks are trained concurrently in an adversarial process to compete with each other. The discriminator weights are updated independently, while the generator model is updated via the discriminator feedback (Supplemental Figure 1).

Generator network

The generator network in this architecture is an encoder-decoder model (U-Net) (Fig. 1). This model utilizes low-dose images as input to estimate standard-dose images; it encodes the input image to the bottleneck layer, then decodes the data from the bottleneck layer to synthesize the output image. In this network, skip connections are used between the corresponding encoder and decoder layers.

In the encoding path, the input layer is followed by six encoder blocks. The numbers of 4 × 4 kernels with stride 2 in the encoder blocks are 64, 128, 256, 512, 512, and 512. In the second to fourth encoder blocks, a batch normalization layer is used after the convolutional layer. These layers are followed by the Leaky ReLU activation function (with a slope of 0.2) in the first five encoder blocks, whereas the ReLU activation function is used in the sixth encoder block.

After the bottleneck layer, six decoder blocks are used in the decoding path. In these blocks, the number of feature maps decreases from 512 to 1 according to the defined encoders. Each block consists of a deconvolution layer with 4 × 4 kernels and a stride of 2 in each direction, followed by a batch normalization layer. In the first five decoders, skip connections are used to concatenate the data from each layer in the encoder path to the corresponding layer in the decoder path. These shortcut connections aim to mitigate the vanishing gradient problem that may occur in deep neural networks. Finally, the concatenated results are passed through a ReLU activation function. In the last decoder, the deconvolutional layer is followed by the sigmoid activation function. Empirically, we use a drop-out layer in the first decoder block to prevent overfitting. Because pooling layers reduce the spatial resolution of the input images, they were not used in this architecture to avoid any feature/information loss throughout the synthesis process.

The generator is updated via a weighted sum of both the adversarial loss and the L2-norm loss. The update of the trainable parameters is carried out to minimize the L2-norm loss calculated between the predicted standard-dose and the reference standard-dose images. The L2-norm loss was selected as it provided high-quality synthesis of the standard-dose SPECT images. Besides, through using adversarial loss, the generator weights are updated to minimize the loss of the discriminator (to better distinguish between real or fake samples) leading to overall better performance of the GAN model to produce more realistic images. Within the training process, a weighting factor of 100/1 was optimized in favor of the L2-norm loss, leading to overall peak performance of the GAN model.
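As a rough illustration of the weighted objective, the following NumPy sketch combines an adversarial term with an L2 term at the 100/1 weighting. Two choices here are assumptions rather than details given in the text: the adversarial term is written as binary cross-entropy of the discriminator output against "real" labels, and the L2-norm loss is implemented as the mean squared error. The function names and the synthetic arrays are hypothetical.

```python
import numpy as np

L2_WEIGHT = 100.0  # 100/1 weighting in favor of the L2 term

def binary_cross_entropy(targets, probs, eps=1e-7):
    """Mean binary cross-entropy between target labels and predicted probabilities."""
    probs = np.clip(probs, eps, 1.0 - eps)
    return float(-np.mean(targets * np.log(probs)
                          + (1.0 - targets) * np.log(1.0 - probs)))

def generator_loss(disc_probs_on_fake, predicted, reference):
    """Return (total, adversarial, l2) for one batch.

    Assumption: the generator is rewarded when the discriminator labels its
    output as real, so the adversarial target is an array of ones.
    """
    adversarial = binary_cross_entropy(np.ones_like(disc_probs_on_fake),
                                       disc_probs_on_fake)
    l2 = float(np.mean((predicted - reference) ** 2))
    return adversarial + L2_WEIGHT * l2, adversarial, l2

# Synthetic example: an undecided discriminator and a small uniform prediction error.
rng = np.random.default_rng(0)
reference = rng.random((64, 64))
predicted = reference + 0.01
total, adv, l2 = generator_loss(np.full((4, 4), 0.5), predicted, reference)
print(total, adv, l2)
```

With such a large weight, the pixel-wise term dominates the update, while the adversarial term acts mainly as a regularizer that discourages over-smoothed outputs.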

Discriminator network

The discriminator network, serving as an image classifier, takes low-dose and standard-dose images (both reference and synthesized) as inputs to determine whether the input standard-dose image is a real or fake translation of the low-dose image. Figure 2 illustrates the architecture of the discriminator. The network consists of a concatenate layer and five convolutional blocks. The number of 4 × 4 kernels with stride 2 applied in the first convolutional block is 48, and this number is doubled in each of the three following convolutional blocks, while the stride in the fourth convolutional block becomes 1. The 2D convolutional layer is followed by a batch normalization layer and the Leaky ReLU activation function (with a slope of 0.2) in each of these four convolutional blocks. Finally, the data is passed through a 1 × 1 single-filter convolutional layer, a batch normalization layer, and a sigmoid activation function. The model was trained for about 50 epochs using the binary cross-entropy loss function.

The network was implemented using the Keras deep learning framework based on the TensorFlow libraries in Python 3.7. All the experiments were carried out on NVIDIA GeForce GTX 1060 with a 6 GB memory graphical processing unit. Adaptive moment estimation (Adam) optimizer with a learning rate of 0.001 was used to minimize the loss functions.

Image reconstruction

The low-dose projection data (for 1/2, 1/4, and 1/8 levels) obtained from random sampling of the list-mode acquisition were employed for the training of the GAN model considering the standard-dose projection data as a reference. The model was trained and evaluated separately for non-gated half-dose to standard-dose, non-gated quarter-dose to standard-dose, non-gated one-eighth-dose to standard-dose, and gated half-dose to standard-dose.

Standard-dose, low-dose, and predicted standard-dose projection data from the test dataset were reconstructed using the OSEM algorithm (8 iterations and 2 subsets), and the Cedars-Sinai software was used to orient the images in three standard cardiac planes: short-axis (SA), vertical long-axis (VLA), and horizontal long-axis (HLA). Furthermore, we applied a post-smoothing Butterworth filter with order = 10 and cutoff = 0.45.
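For readers unfamiliar with Butterworth post-smoothing, a minimal frequency-domain sketch is given below. The filter convention used by the clinical software is not stated in the text, so the following are assumptions: the amplitude response is taken as 1/sqrt(1 + (f/fc)^(2n)), the cutoff is interpreted in cycles per pixel, and the filter is applied to a single 2D slice. The function name `butterworth_lowpass` is hypothetical.

```python
import numpy as np

def butterworth_lowpass(image, order=10, cutoff=0.45):
    """Low-pass filter a 2D image with a radially symmetric Butterworth response.

    Assumptions: cutoff is in cycles/pixel and the amplitude response is
    1 / sqrt(1 + (f / cutoff) ** (2 * order)).
    """
    fy = np.fft.fftfreq(image.shape[0])  # vertical spatial frequencies
    fx = np.fft.fftfreq(image.shape[1])  # horizontal spatial frequencies
    radius = np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)
    response = 1.0 / np.sqrt(1.0 + (radius / cutoff) ** (2 * order))
    return np.real(np.fft.ifft2(np.fft.fft2(image) * response))

# Synthetic noisy slice (not patient data) smoothed with the parameters quoted above.
rng = np.random.default_rng(1)
noisy_slice = rng.poisson(lam=20.0, size=(64, 64)).astype(float)
smoothed = butterworth_lowpass(noisy_slice, order=10, cutoff=0.45)
```

Since the response equals 1 at zero frequency, the filter preserves the mean count level and only attenuates components near and above the cutoff.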

Assessment strategy

Quantitative analysis

The quality of the predicted standard-dose data, in both the projection and image space, was assessed using standard quantitative metrics, including peak signal-to-noise ratio (PSNR), root mean square error (RMSE), and structural similarity index measure (SSIM), given in Eqs. 1, 2, and 3, respectively, considering the standard-dose data as a reference. These metrics were also calculated for the low-dose images to provide a baseline for performance assessment of the GAN model.

$$\mathrm{PSNR}\left(\mathrm{dB}\right)=20 {\mathrm{log}}_{10}\left(\frac{\mathrm{Peak}}{\sqrt{\mathrm{MSE}}}\right)$$

(1)

$$\mathrm{RMSE}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}{({y}_{i}-{\tilde{y }}_{i})}^{2}}$$

(2)

$$\mathrm{SSIM}=\frac{(2{\mu }_{y}{\mu }_{\tilde{y }}+{C}_{1})(2{\delta }_{y.\tilde{y }}+{C}_{2})}{({\mu }_{y}^{2}+{\mu }_{\tilde{y }}^{2}+{C}_{1})({\delta }_{y}^{2}+{\delta }_{\tilde{y }}^{2}+{C}_{2})}$$

(3)

In Eq. (1), Peak indicates the maximum count of either predicted standard-dose or low-dose data, and MSE stands for mean squared error. In Eq. (2), n and i denote the total number of voxels and the voxel index, respectively; y indicates the standard-dose data and ỹ is either the synthetic or low-dose data. ${\mu }_{y}$ and ${\mu }_{\tilde{y }}$ in Eq. (3) denote the mean values of the reference and synthetic/low-dose images, respectively. ${\delta }_{y.\tilde{y }}$ indicates the covariance between the two images, whereas ${\delta }_{y}^{2}$ and ${\delta }_{\tilde{y }}^{2}$ represent the variances of the standard-dose and predicted standard-dose/low-dose images, respectively. The constant parameters C₁ and C₂ (C₁ = 0.01 and C₂ = 0.02) were set to avoid division by very small values.
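The three metrics can be sketched in a few lines of NumPy. This is an illustrative implementation under stated assumptions, not the authors' code: PSNR uses the root of the MSE in the denominator (the usual convention, equivalent to 10 log₁₀(Peak²/MSE)), the peak is taken from the evaluated image as described above, and SSIM is computed once over the whole image because the window size is not specified in the text. The function names are hypothetical.

```python
import numpy as np

def rmse(reference, evaluated):
    """Root mean square error between the reference and the evaluated image."""
    return float(np.sqrt(np.mean((reference - evaluated) ** 2)))

def psnr(reference, evaluated):
    """PSNR in dB; the peak is the maximum count of the evaluated image."""
    peak = float(evaluated.max())
    return float(20.0 * np.log10(peak / rmse(reference, evaluated)))

def ssim_global(reference, evaluated, c1=0.01, c2=0.02):
    """Single-window SSIM using whole-image means, variances, and covariance."""
    mu_r, mu_e = reference.mean(), evaluated.mean()
    var_r, var_e = reference.var(), evaluated.var()
    cov = np.mean((reference - mu_r) * (evaluated - mu_e))
    numerator = (2.0 * mu_r * mu_e + c1) * (2.0 * cov + c2)
    denominator = (mu_r ** 2 + mu_e ** 2 + c1) * (var_r + var_e + c2)
    return float(numerator / denominator)

# Synthetic example: a reference image and a copy with a uniform offset of one count.
reference = np.arange(16.0).reshape(4, 4)
evaluated = reference + 1.0
print(rmse(reference, evaluated), psnr(reference, evaluated),
      ssim_global(reference, evaluated))
```

A windowed SSIM, as provided by common image-processing libraries, would generally give different values from this global variant, so results are only comparable when the same convention is used throughout.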

Cedars-Sinai quantitative analysis

Extent, summed stress percent (SS%) or summed rest percent (SR%), summed stress score (SSS) or summed rest score (SRS), total perfusion deficit (TPD%), volume, wall, shape eccentricity, and shape index were calculated using the quantitative perfusion SPECT (QPS) package implemented in Cedars-Sinai software. These metrics were calculated on the reconstructed reference, low-dose, and predicted standard-dose SPECT images using the standard reconstruction settings used in clinical routine. Bland–Altman plots were drawn to describe the agreement between the predicted standard-dose/low-dose and reference standard-dose images. Finally, the Pearson correlation coefficient was computed for the derived parameters according to Eq. (4).

$$\rho =\frac{\sum_{i=1}^{n}({y}_{i}-{\mu }_{y})({\tilde{y }}_{i}-{\mu }_{\tilde{y }})}{\sqrt{\sum_{i=1}^{n}{({y}_{i}-{\mu }_{y})}^{2}}\sqrt{\sum_{i=1}^{n}{({\tilde{y }}_{i}-{\mu }_{\tilde{y }})}^{2}}}$$

(4)
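The agreement analysis can be sketched as follows. The Pearson coefficient follows Eq. (4) term by term; the Bland–Altman summary reports the bias and the conventional 95% limits of agreement (bias ± 1.96 standard deviations of the differences), which is an assumption about how the plots were summarized. The parameter values in the example are invented for illustration and are not study data.

```python
import numpy as np

def pearson(y, y_tilde):
    """Pearson correlation coefficient, written as in Eq. (4)."""
    dy = y - y.mean()
    dt = y_tilde - y_tilde.mean()
    return float(np.sum(dy * dt)
                 / (np.sqrt(np.sum(dy ** 2)) * np.sqrt(np.sum(dt ** 2))))

def bland_altman(reference, estimate):
    """Return (bias, lower limit, upper limit) of agreement at the 95% level."""
    diff = estimate - reference
    bias = float(diff.mean())
    sd = float(diff.std(ddof=1))  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Hypothetical TPD% values for five patients (made up for this example).
reference_tpd = np.array([2.0, 5.0, 11.0, 18.0, 7.0])
predicted_tpd = np.array([3.0, 5.0, 10.0, 19.0, 8.0])

print(pearson(reference_tpd, predicted_tpd))
print(bland_altman(reference_tpd, predicted_tpd))
```

A narrow interval centered near zero indicates that a derived parameter changes little between the reference and the evaluated images, which is the pattern the Bland–Altman plots are meant to reveal.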

Clinical evaluation

The summed score (SS) parameter was calculated for the low-dose, predicted standard-dose, and reference standard-dose reconstructed images in the test dataset by a nuclear medicine physician. Subsequently, a scoring scheme ranging from −3 to +3 was employed to express diagnostic differences in the predicted standard-dose/low-dose SPECT images with respect to the standard-dose ground truth, wherein 0 is equivalent to no diagnostic changes, and ±3 is equivalent to considerable changes compared to the reference standard-dose data. Positive numbers indicate higher tracer uptake, whereas negative numbers indicate lower tracer uptake compared to the reference standard-dose images. Finally, the Pearson correlation coefficient was calculated between the reference standard-dose and the predicted standard-dose/low-dose images.

Results

Qualitative assessment

The predicted standard-dose SPECT-MPI in both projection and image domains exhibited considerable improvement in image quality compared to the low-dose images. Figure 3 depicts the predicted non-gated standard-dose projections for the different low-dose levels. Visual inspection revealed that, at the half-dose level, the GAN model achieved image quality nearly equivalent to that of the reference standard-dose images, more so than at the quarter- and one-eighth-dose levels. Image quality improvement is apparent for the predicted projections at the quarter-dose level. However, increased signal loss is observed in the projections predicted from one-eighth-dose data. Figure 4 displays the SA, VLA, and HLA views of the reconstructed non-gated SPECT-MPI, including reference standard-dose, low-dose, and predicted standard-dose images, for a representative patient with a severe-risk diagnosis. The noise is appropriately suppressed at the different reduced dose levels, and the LV wall appears more uniform/natural. Overall, the predicted SPECT images exhibited good agreement with the reference standard-dose images, whereas notable signal loss and/or noise-induced pseudo-signals were observed in the low-dose images. The reconstructed non-gated images for patients diagnosed with normal perfusion, low-risk, and intermediate-risk are presented in Supplemental Figures 2-4.