1 Introduction

The coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2, originated in China in December 2019 and rapidly spread to 1,000 people in only 48 days [1]. As of June 2021, there are 175 million global cases and 3.75 million deaths spanning more than 192 countries/regions (http://coronavirus.jhu.edu/map.html).

Reverse transcription-polymerase chain reaction (RT-PCR) is the gold standard diagnostic method for COVID-19 [2]. However, RT-PCR may take several days to generate results [3], and in initial disease presentation and asymptomatic cases [4, 5], has shown relatively low sensitivity (\(71\%\)) [6]. Because of these challenges, researchers have explored other methods to classify COVID-19 patients. Chest computed tomography (CT) plays a key role in evaluating the extent of pulmonary involvement and prognostication [7]. Using visual inspection of 51 COVID-19 patients who underwent chest CT and RT-PCR testing within 3 days, Fang et al. [8] achieved classification sensitivity of \(98\%\) from chest CT. In another study [9] of 1014 patients suspected of COVID-19, \(59\%\) had positive RT-PCR results, while \(88\%\) had positive chest CT scans, determined by radiologists. The sensitivity of chest CT to classify COVID-19 was \(97\%\). Thus, CT may exhibit higher sensitivity than RT-PCR. Additionally, CT is widely accessible and already routinely used in hospitals for high-resolution views of lung features [10]. Given the many advantages, CT has become a promising complementary tool for identifying and monitoring COVID-19.

Numerous COVID-19 studies [9, 11,12,13,14] have proven that chest CT characteristics accurately correlate with patient conditions. Some CT features detected in COVID-19 cases are ground-glass opacities (GGO), consolidative opacities, crazy paving pattern, linear consolidations, and reverse halo sign. In advanced stages, the number of lesions increases significantly with newly formed GGO and superimposed consolidation. Also, predominant consolidation and crazy paving patterns can be suggestive of late/advanced stages [15] and GGOs with consolidation decrease in the recovery stage. Furthermore, in severe phases infected areas may appear in all segments of the lungs, expressing as “white lung” [16]. Since distinct chest CT features and extent of involvement relate to COVID-19 severity stage, quantitative assessment of patient severity may be useful for accurate diagnosis. However, to achieve applicable results, region of interest (ROI) segmentations must be precise and easily computed.

Several COVID-19 studies have used ROI segmentation to classify COVID-19 and determine COVID-19 severity [11,12,13] with deep learning (DL)-based systems [17]. Recently introduced DL methods for classification and segmentation have achieved outstanding results. Fan et al. [18] developed Semi-Inf-Net, pre-trained on 1600 CT images with pseudolabels and fine-tuned on 50 CT images with ground-truth (GT) labels, to segment infected regions from chest CT, yielding a dice similarity coefficient of \(73.9\%\) and a specificity rate of \(96.0\%\). MiniSeg [19], another DL method for automatic segmentation of ROI, was evaluated on 100 COVID-19 CT slices of 60 patients and achieved a dice similarity coefficient of \(75.91\%\) and a specificity rate of \(97.72\%\). Even though these DL methods show impressive advances in artificial intelligence-based systems, they are limited because GT lung ROI segmentations are often lacking: manual annotation and segmentation are time-consuming and expertise-dependent tasks and are susceptible to subjectivity driven by individual methodologies [17, 20]. As a result, the analysis of COVID-19 CT findings has been mainly limited to qualitative or semi-quantitative evaluations [14]. In addition, DL algorithms are often trained on datasets acquired by the same CT machine and are annotated by the same radiologists [21]. This limits the model’s usage because it becomes dataset-specific and lacks generalizability.

For these reasons, we propose a threshold-based semiautomatic segmentation method to generate ROI segmentations from lung CT available on COVID-19 Data Archive (COVID-ARC) (https://covid-arc.loni.usc.edu) which curates and disseminates multimodal and longitudinal datasets related to COVID-19 [22]. This segmentation method is used to calculate the percentage of lung abnormality (PLA) to determine COVID-19 severity and improve analysis of disease progression in follow-up CT. It is primarily based on the image segmentation method of thresholding. Using Yen [23], IJ Isodata (ImageJ’s [24] default thresholding method adapted from Ridler et al. [25]), and Region Adjacency Graph (RAG) methods in combination with gray-level binary thresholding, we developed an efficient image processing pipeline for ROI segmentation in lung CT. We compared our method with seven existing thresholding methods to determine the best fit methods.

Fig. 1
figure 1

Diagram of the pipeline: first a preprocessed lung segmentation of a CT image is inputted, and then, Yen thresholding is implemented on the preprocessed input image. After that the lung mask of the CT image is used to generate the percentage of lung abnormality (PLA). If the PLA is \(100\%\) then IJ Isodata thresholding is implemented on the lung segmentation image, else if PLA is less than \(100\%\) the Yen thresholded binary image is used for next steps. Then the Region Adjacency Graph method and gray-level binary thresholding are implemented to finally generate the ROI segmentation

2 Materials and Methods

2.1 Dataset

We applied our segmentation methodology to heterogeneous datasets to assess generalizability. We combined 2 datasets containing CT images of COVID-19-positive patients and segmented ROI images which we used as GT images to evaluate the results. These datasets were accessed from COVID-ARC (https://covid-arc.loni.usc.edu/). Dataset 1 was collected by the Italian Society of Medical Radiology and included 9 CT images, 9 lung masks, and 9 ROI masks (http://medicalsegmentation.com/covid19). Dataset 2 was collected from Wenzhou Medical University in Wenzhou, China, and Netanya, Israel: 19 CT images, 19 lung masks, and 19 ROI segmentation images [26]. Considering the two datasets, we used a total of 84 images from 28 patients: 28 CT images, 28 lung masks, and 28 ROI segmentation images. There are still limited publicly available annotations despite the drive for publicizing datasets; therefore, we evaluated the pipeline on a small, annotated dataset. Additionally, all GT ROI segmentations were performed by radiologists from each source.

2.2 Lung Segmentation and Preprocessing

Lung segmentation is an essential preprocessing step because the thoracic tissues surrounding the lungs and other anatomical structures on a CT image must be removed to allow for accurate analysis of lung features. To generate lung segmentation of each CT image, we first created lung segmentation masks using the lungmask command line tool that provided the trained U-net (R231CovidWeb) model [27]. The lung masks were then used to extract the lung areas from the CT image to process the lung segmentation. As shown in Fig. 1, which represents the different steps to obtain ROI segmentation for each image, the lung segmentation image is first inputted into the pipeline.

2.3 Image Thresholding

A threshold value is used to classify pixel values as either 0 or 1, and in our case, respectively, the normal lung tissue or ROI. We explored the intensity pixel distributions of the lung segmentations identifying which images had a unimodal intensity histogram and which ones had a multimodal intensity histogram. Therefore, we differentiated the images into two groups depending on their intensity histograms: images of type I (IM1) that have unimodal histograms and images of type II (IM2) that have multimodal histograms. To determine the best fit thresholding method for each image type (IM1 and IM2), we evaluated seven thresholding methods and compared the generated PLA with the corresponding GT PLA. The methods we evaluated are Huang [28], Ridler [25], Li [29], Kapur [30], Otsu [31], Yen, and IJ Isodata. These were chosen based on their outstanding performances in converting gray-scale images to binary, and more detailed information about each algorithm is discussed by Sezgin et al. [32]. In the first binarization, all seven thresholding methods are implemented onto the image using ImageJ. First in ImageJ, we uploaded an image in 8-bit format (Image/Types/8-bit) and used the Process/Binary/Options command to select Dark Background. Next to generate a threshold-based binary image, we used the threshold tool (Image/Adjust/Threshold) to select Black Background and then chose a thresholding method and selected Apply to generate the binarized image implemented with the automatically generated optimal method-based threshold value.

2.4 Region Adjacency Graph

After the first binarization, segmentation of the ROI is partially completed since there are other features of the lungs present along with the ROI, which would affect the calculation of PLA or other analyses of the ROI segmentation. To generate an accurate segmentation of only the ROI, RAG thresholding is implemented onto the binary image. This is a region connecting algorithm, which uses graph representation to describe relationships between different regions [33].

The graph connects different superpixels through an edge, and each superpixel region is considered a vertex. The edges between superpixels are colored depending on their weights corresponding to similarities between the regions [34, 35]. The regions covered with dark edges have similar pixel characteristics, while others covered with light-colored edges have different pixel characteristics. Each region \(R_i\) is characterized by two parameters given by Eqs. (1,2): \(\mu _{i}\) is the mean intensity of the set of pixels in the region and \(\sigma _{i}\) is the standard deviation:

$$\begin{aligned}&\mu = {\frac{1}{N} \sum _{i=1}^{N} x_i} \end{aligned}$$
(1)
$$\begin{aligned}&\sigma = \sqrt{\frac{1}{N} \sum _{i=1}^{N} (x_i - \mu )^2} \end{aligned}$$
(2)

At each edge \(e_{ij}\) corresponds to a pair of adjacent regions (\(R_i\),\(R_j\)) and an intensity distance \(d^2\)(\(R_i\),\(R_j\)), given by the Fisher distance in Eq. (3), which is used to measure the similarity between the intensity distributions of two regions and to define the weight of \(e_{ij}\) connecting two adjacent regions.

$$\begin{aligned} d^2(R_i, R_j)\! = d^2(\overrightarrow{\mu _{i}}\!,\overrightarrow{\mu _{j}}\!) = \frac{(N_{Ri} + N_{Rj}) \Vert \overrightarrow{\mu _{i}},\overrightarrow{\mu _{j}}\Vert ^2}{N_{Ri} \sigma _{i}^2 + N_{Rj} \sigma _{j}^2}\nonumber \\ \text{ if } \sigma _i \ne 0 \text{ and } \sigma _j \ne 0 \end{aligned}$$
(3)

where \(N_{Ri}\) and \(N_{Rj}\) border pixels that were removed are converted to gray-scale pixels. Since the gray-scaled image may cause pixel miscalculations, a gray-level binarization is used to generate accurate segmentations.

2.5 Calculations

Pixel densities are calculated using the image processing program, ImageJ [24], which calculates pixel density without manually segmenting the ROI. The lung mask is also used to calculate the pixel density of the total lung. After calculating the pixel densities, we used the mathematical expression shown in Eq. (4) to calculate COVID-19 PLA:

$$\begin{aligned} \text{ PLA } = \frac{\text{ ROI } \text{ pixel } \text{ density }}{\text{ total } \text{ lung } \text{ pixel } \text{ density }} \times {100} \end{aligned}$$
(4)

Using this equation, we calculated the PLA for each CT image in the dataset.

Several evaluation metrics were computed to evaluate the pixels correctly and not correctly classified in terms of true positive (TP), true negative (TN), false positive (FP), and false negative (FN), where positive pixels are the pixels affected by the infection and negative pixels are uninfected pixels. The Dice similarity coefficient is the overlap index which represents the similarity between the GT and the prediction maps [36]. The Matthews correlation coefficient (MCC) measures the similarity of the GT PLA and the calculated PLA with values between -1 and +1 [37]. Positive values indicate strong similarities between the GT PLA and the calculated PLA, thus corresponding to a good model, and negative values indicate weak similarities as the result of a poor model. Precision measures the reproducibility of the method and is the number of correct positive results divided by the number of positive results predicted by the method. Sensitivity is the proportion of positive pixels that are correctly considered positive with respect to all positive pixels. Finally, specificity is the proportion of negative pixels that are correctly considered negative with respect to all negative pixels.

3 Results

3.1 Evaluation of Best Fit ImageJ Thresholding Methods

Table 1 shows the optimal thresholds for accurate binarization results and the corresponding PLA difference from GT for the different thresholding methods and for the two types of images (IM1 and IM2) analyzed in this study. As shown in Fig. 2, when only observing the generated ROI segmentations, five distinct methods (Ridler, Kapur, Otsu, Yen, and IJ Isodata) seem to give images with indistinguishable physical characteristics from the others, as a result of the similar threshold values for IM1s. Indeed, as listed in Table 1, both Ridler and Otsu shared a threshold value of 87, while Kapur generated 97, Yen was 86, and IJ Isodata generated 117. After comparing the calculated PLA with the GT PLA for each method, we found that out of the seven methods, the Yen thresholding method generated the most accurate segmentations for IM1 images with a difference of \(1.53\%\) from the GT segmentations and IJ Isodata generated the most accurate segmentations for IM2 images with a difference of \(3.24\%\) from the GT segmentations. Therefore, to generate the initial binary images in the proposed method, Yen for IM1 images and IJ Isodata for IM2 images are the best fit methods generating the most accurate threshold value.

Table 1 Comparison of thresholding methods using ImageJ
Fig. 2
figure 2

Illustration of the performance of seven thresholding methods on image types I and II with threshold-based binary images in the first row and ROI segmentation images in the second row. On the far right column is the corresponding ground truth ROI segmentation

3.2 Quantitative Segmentation Results

To evaluate the performance of our method, we compared our segmentation method with three commonly used segmentation methods based on three different deep neural network-based models: Inf-Net [18], Semi-Inf-Net [18], MiniSeg [19], nCoVSegNet [38], and Oulefki [39]. Table 2 shows the statistics of the evaluations. Oulefki was only evaluated with 19 CT images from the dataset, and the other four methods were evaluated on all images from the dataset. Compared with the five models, the proposed method achieved the highest precision (\(47.49\%\)) and specificity (\(98.40\%\)) scores.

Table 2 Evaluation metrics (\(\%\))

3.3 Qualitative Segmentation Results

To further evaluate the performance of our method, we compared the qualitative ROI segmentation results of our method to the three other DL methods. Figure 3 shows six representative examples of segmentation results, and our method consistently generates results that outperform other DL methods’.

Fig. 3
figure 3

Comparison of ROI segmentation results

Fig. 4
figure 4

Graph with percentage of lung abnormality computed with ground truth segmentation images (blue) and segmentation images generated using the proposed method (orange) (Color figure online)

Fig. 5
figure 5

Display of three representative patients from the dataset. The first column shows a lung CT image and the second column shows the lung segmentations used to generate the ROI segmentations in the third column

3.4 Quantitative Validation

To validate the performance, we compared the PLA of the single-slice dataset computed from the GT-infected region masks and the ROI segmentations generated by the proposed method. In Fig. 4, the PLA of the GT images and our segmentation images are displayed for each patient. On average, the percentage difference between the PLA obtained with the proposed method and the PLA obtained with GT images is \(\pm 3.89\%\).

Additionally, throughout the dataset IM1s tended to have lower PLAs and IM2s had higher PLAs. To demonstrate the method’s effectiveness in analysis of severity, Fig. 5 shows a diagram of three cases of COVID-19 patients: Fig. 5a IM1 with a low PLA of \(3.33\%\), Fig. 5b IM1 with a medium PLA of \(11.64\%\), and Fig. 5c IM2 with a high PLA of \(67.96\%\). Beginning with the first column, each axial lung CT image of a COVID-19 patient from the dataset is followed by its lung mask in the second column. Lastly, in the third column, the ROI segmentations generated by our method showing the increasing progression of the infection are highlighted in red.

4 Discussion

In this study, we proposed a method for the quantification of lung abnormalities from COVID-19-patients CT images. Based on our analysis, the proposed method is highly effective in calculating accurate PLA for most cases of COVID-19-positive patients. On average, the proposed method generated infection rates within \(\pm 3.89\%\) precision from the GT PLAs. As of our knowledge, this study is the first to specifically validate a thresholding method to generate precise ROI segmentations for COVID-19 CT images. In this work, by evaluating seven thresholding methods: Huang, Ridler, Li, Kapur, Otsu, Yen, and IJ Isodata, we found that the Yen thresholding method generates the most accurate results compared with the other seven methods for IM1 images. In other words, the Yen method performed exceedingly well for the IM1 images by yielding infection rates with the lowest difference of \(1.53\%\) from the GT infection rate out of all the evaluated methods. Additionally, we found that IJ Isodata yields the best segmentations for IM2 images which generate multimodal intensity histograms. The fact that we obtain higher PLA for the IM2 images and lower PLA for the IM1 images may suggest that the two types of pixel intensity distributions (unimodal and multimodal) correspond to two different severity levels of COVID-19 infection. In this view, while the IJ Isodata method seems to be more appropriate for severe infection cases, the Yen method appears to be more appropriate for mild cases. However, further clinical investigations must be done to assess whether different image types correspond to different infection severity rates.

Several methods performed poorly due to large pixel intensities and physical variances of the ROIs. Since the cross-entropies of the images are not convex, Li generated under-segmentations. Ridler and Otsu are cluster-based methods that distinguish natural clusters, and the sharp valley of the gray-level histogram degrades when applied to the dataset. Huang generated several local maximas that resulted in under-segmentations. Kapur did not generate the best threshold value due to the large intensity differences of ROI and background and presence of outlying pixels. Conversely, Yen used the entropic correlation to obtain the maximizing threshold and IJ Isodata used a variation of the iterative intermeans method to compute the best thresholding values.

Additionally, to design a consistent pipeline to apply to both types of COVID-19 CT images we included a system to effectively implement either Yen or IJ Isodata thresholding on a lung segmentation image, as shown in Fig. 1. Yen thresholding mostly applies to mild cases and generates only lung masks, which corresponds to a PLA of 100, when implemented on IM2, as shown in Fig. 2. For this reason the user must first implement Yen thresholding and then either continue to the next step or implement RAG on the binary image if the calculated PLA is less than 100 or implement IJ Isodata thresholding on the lung segmentation CT image if PLA is equal to 100. This system ensures an objective technique to choose between implementing either Yen or IJ Isodata thresholding methods on lung CT images.

Particularly, we were interested to develop an unsupervised, image processing method to facilitate rapid COVID-19 diagnosis. Since annotated medical images of lung segmentation are lacking, unsupervised and semi-supervised methods are in high demand for COVID-19 studies [17]. After further analysis of the method’s potential, as discussed in introduction and results sections, we found that it is also effective in quantitatively analyzing the progression of the disease with information available in lung CT. To further explain, in addition to assisting medical professionals in the rapid diagnosis of COVID-19-positive patients, the proposed method has the potential to facilitate radiologists to quantify severity and visualize disease progression or remission.

A number of studies have established scoring systems useful for the diagnosis and follow-up of COVID-19 patients. For example, Li et al. [12] reviewed the clinical and CT features of 58 mild/ordinary COVID-19 cases and 25 severe/critical cases using a 0–5 thin-section CT scoring system for infection region involvement in each lobe. They found that a common CT finding associated with severe/critical patients has significantly more frequent consolidation compared with mild/ordinary groups. This results in the CT scores of severe/critical patients to be significantly higher than those of mild/ordinary patients. In essence, since the severity rate computed from a chest CT accurately corresponds to the patients’ condition, it may be useful in assisting level-of-care decisions for COVID-19 positive patients.

Additionally, Lessmann et al. [11] developed and validated an Artificial Intelligence algorithm (CORADS-AI) to score the likelihood and extent of COVID-19 on chest CT scans using the COVID-19 Reporting and Data System (CO-RADS) [40] and CT severity score (CTSS). The assigned standardized scores of 367 patients were found to be in good agreement with eight independent observers and generalized well to external data. In another study, Lieveld et al. [13] validated CO-RADS and evaluated its corresponding CTSS with 741 adult patients suspected COVID-19 positive from two tertiary centers in The Netherlands. Since there was a significant association between CTSS and hospital admission (stage 1), ICU admission (stage 2), and 30-day mortality (stage 3), the study findings support the use of CTSS in triage, diagnosis, and management decisions of potential COVID-19-positive patients at hospital emergency departments.

As proven by these studies, the quantified severity of COVID-19 lung abnormalities based on chest CT accurately correlates to patient conditions; therefore, the PLA estimated by our proposed method can potentially help in evaluating the prognosis and assessing treatment methods for rapid care. Indeed, our method gives outstanding performance in generating precise segmentations of lung abnormalities due to COVID-19 that correlate well with the GT PLA. As proven by various studies [12, 13], COVID-19 lung infection severity correlates with the patient condition; similarly, in our study each PLA directly describes the ROI segmentation. Our method could also be used to quantify follow-up CT images of patients since it generates precise PLA from accurate ROI segmentation. Also, when compared with other state-of-the-art methods, the proposed method performed significantly better to generate ROI segmentations. The method produced the highest precision and specificity scores.

Even though the proposed method achieved promising results, there are limitations to this study. First, the lack of COVID-19 datasets including ROI segmentation to use as GT restricted us from further validating this method on a variety of datasets instead of computing data-specific results varying two datasets. Second, since patient demographic information was not provided for either dataset, we were not aware of any underlying conditions which might have contributed to the patients’ lung CT characteristics. Finally, as future work, we consider extending this pipeline to quantify other types of pneumonia lung infections.

5 Conclusion

In this study, we proposed a semi-automatic threshold-based segmentation method to generate lung ROI segmentations from COVID-19 positive patients’ lung CT images. Our results demonstrated the method’ effectiveness in generating accurate ROI segmentations, hence calculating precise PLA to determine COVID-19 severity. Also, given the accurate ROI segmentations obtained, the proposed method may potentially be useful to assess the severity of pneumonia in initial imaging and disease progression or remission in follow-up CT which could assist medical professionals with patient management and prognosis of the disease course. In future experiments, this method could be utilized to develop a fully automatic model with advanced features to assist medical professionals in rapid patient care.