INTRODUCTION

COVID-19, a coronavirus disease that affects the human respiratory tract, was first identified in Wuhan, China, in December 2019. As of July 20, 2020, WHO reported 14 348 858 confirmed COVID-19 cases worldwide, including 603 691 deaths. The disease spreads easily from person to person through respiratory droplets expelled when an infected person sneezes, coughs, or exhales. Almost all countries are struggling to slow down the rate of COVID-19 infection. Slowing the infection rate through preventive action is possible only if the actual number of COVID-19 cases in a geographical region is known, which in turn requires that a sufficient number of COVID-19 tests be performed.

The important factors to consider when selecting a COVID-19 test are its accuracy in correctly identifying positive cases, the time needed to obtain the result, and the cost of the test. WHO has recommended the RT-PCR test as the gold standard for COVID-19 testing. The advantage of this test is its very high accuracy in disease detection; its drawbacks are the cost and the time taken to obtain the result. Computed tomography (CT) of the chest can be used for initial screening of COVID-19 patients [1]. The advantage of this method is its rapid diagnostic rate, but its accuracy is low compared to the RT-PCR test. The loop-mediated isothermal amplification (LAMP) method [2] works on the principle of amplifying the target genetic material and can produce a result within an hour, but its lower accuracy has limited its use. Antibody tests are simple: the antibodies developed by the immune system are used to diagnose COVID cases. Their problem is low accuracy; they are good at identifying persons who were previously infected with COVID, but not those who are currently infected.

Deep learning was used by [3] to differentiate COVID-19 from other Pneumonia cases with an AUC of 0.87. Social mimic optimization was used by [4] to extract efficient features from X-ray images, with a CNN model for initial feature extraction, in order to detect COVID cases. A customized deep convolutional neural network called COVID-Net was developed by [5] to detect the disease from X-ray images. X-ray and CT scan images were used by [6] to detect the disease using a convolutional neural network with a transfer learning approach; the results show that accuracy was higher when CT scan images were used. A DarkNet model was used in [7] to detect corona and non-corona cases from X-ray images; the accuracy obtained by the DarkNet model was 87% when applied to multiple classes (COVID, no findings, and Pneumonia).

The Fractional Multichannel Exponent Moments (FrMEMs) method was used by [8] for feature extraction from chest X-ray images and classification of COVID and non-COVID cases using machine learning techniques. A deep convolutional neural network based transfer learning model was used by [9] to detect COVID patients from chest X-ray images. A deep learning based system that uses a VB-Net neural network was developed by [10] for identification and quantification of infected regions in CT images. 2D and 3D deep learning models were used by [11] to categorize coronavirus and non-coronavirus cases from CT images. A transfer learning approach, in which several pre-trained CNN models were used, was adopted by [12] for classification of COVID and non-COVID cases on a chest X-ray dataset.

Some of the important problems faced by existing approaches in detecting COVID-19 patients are listed below.

1. Respiratory tract specimens have to be collected by healthcare professionals. Nasopharyngeal swab sampling is one of the commonly used methods, in which the nurse has to be in close contact with the patient [13]. This may increase the risk of cross infection.

2. The RT-PCR kits recommended by WHO for COVID testing are costly, and the quantity of kits available in developing countries is not sufficient for their large populations. Hence there is a need for developing cost-effective testing solutions [14].

3. The sensitivity of rapid antigen tests is not high enough for them to be used alone for initial screening [15].

4. The delay in obtaining the test result leads to a delay in tracing the contacts between the affected person and healthy persons [16].

The objective of the proposed research work is to perform a comparative analysis of the effectiveness of various deep learning and machine learning approaches in detecting COVID-affected patients from chest X-ray images. A comprehensive outline of the various tests used to identify COVID-positive cases, together with their merits and demerits, is given in this introduction. The dataset used and the methodology followed are described in Section 1, followed by the experimental results in Section 2. The performance of the considered models is then discussed with the help of evaluation metrics in the Discussion section, and the conclusions derived from the research work are given in the Conclusions section.

1 MATERIALS AND METHODS

a. Dataset

The data used for this research work were taken from Kaggle (https://www.kaggle.com/prashant268/chest-xray-covid19-pneumonia). The dataset contains train and test directories with a total of 5144 and 1288 chest X-ray images, respectively. The train directory contains chest X-ray images of 1266 healthy persons, 3418 Pneumonia patients, and 460 COVID patients. The test directory contains chest X-ray images of 317 healthy persons, 855 Pneumonia patients, and 116 COVID patients. Sample chest X-ray images of a healthy person, a Pneumonia patient, and a COVID patient are shown in Fig. 1.
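A minimal sketch of loading such a directory-structured dataset with Keras is shown below; the paths, class sub-folder names, and use of image_dataset_from_directory are illustrative assumptions, not details stated in the paper.

```python
import tensorflow as tf

# Assumed directory layout: Data/train/<class_name>/*.jpg and Data/test/<class_name>/*.jpg
train_ds = tf.keras.utils.image_dataset_from_directory(
    "Data/train",                 # hypothetical path to the unpacked Kaggle data
    image_size=(224, 224),        # resize on load (see the pre-processing step below)
    label_mode="categorical",     # one-hot labels for the 3 classes
    batch_size=32)

test_ds = tf.keras.utils.image_dataset_from_directory(
    "Data/test",
    image_size=(224, 224),
    label_mode="categorical",
    batch_size=32)

print(train_ds.class_names)       # e.g. ['COVID19', 'NORMAL', 'PNEUMONIA'] (assumed folder names)
```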

Fig. 1. Sample images of healthy, Pneumonia, and COVID patient.

b. Methodology

A total of 6432 chest X-ray images were used for the research work. Jupyter notebooks hosted by Google Colab (Google Colaboratory offers a 12 GB NVIDIA Tesla K80 GPU that can be used continuously for up to 12 h for free) were used for the implementation of the deep learning and machine learning algorithms. The proposed research requires analysis of chest X-ray images, which can be done faster on a GPU than on a CPU. The flow diagram of the proposed work is shown in Fig. 2.

Fig. 2. Flow diagram of proposed work.

(i) Pre-processing. The images taken from the dataset are of different sizes and hence cannot be processed effectively by the subsequent algorithms. Therefore, all the images are resized to a dimension of 224 × 224. Moreover, all the RGB images read are normalized by dividing them by 255, so that all the image intensity values are brought to a common range of [0, 1].
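As a minimal sketch of this pre-processing step (the use of OpenCV and the file path are illustrative assumptions):

```python
import cv2

def preprocess(path):
    """Read an image, resize it to 224 x 224, and scale intensities to [0, 1]."""
    img = cv2.imread(path)                      # image read as uint8 in BGR order
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # convert to RGB channel order
    img = cv2.resize(img, (224, 224))           # common input size for all models here
    return img.astype("float32") / 255.0        # normalize to [0, 1]

# x = preprocess("Data/train/COVID19/img_001.jpg")   # hypothetical file name
```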

(ii) Train, validation, and test split. In order to validate the constructed model, the training dataset was split, with 80% used for constructing the model and the remaining 20% for validation. The test dataset was not shown to the model during the training process and was used to verify the actual performance of the constructed model. The 80% training split was further divided into 5 subsets in order to perform fivefold cross-validation.
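A sketch of such a split using scikit-learn (the array names and placeholder data are assumptions):

```python
import numpy as np
from sklearn.model_selection import KFold

# X: array of pre-processed training images, y: integer class labels (assumed)
X = np.random.rand(100, 224, 224, 3).astype("float32")   # placeholder data for illustration
y = np.random.randint(0, 3, size=100)                     # 3 classes: healthy, Pneumonia, COVID

kfold = KFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_idx, val_idx) in enumerate(kfold.split(X), start=1):
    X_train, X_val = X[train_idx], X[val_idx]             # 80% train, 20% validation per fold
    y_train, y_val = y[train_idx], y[val_idx]
    # ...build and train the model on (X_train, y_train), validate on (X_val, y_val)
```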

(iii) Construction of the Convolutional Neural Network (CNN) model. The CNN model was constructed as a sequence of the following layers:

• input layer,

• convolutional layer,

• pooling layer,

• dense/fully connected layer.

Input layer. The image given to the input layer is of size 224 × 224 × 3, where 224 × 224 represents the X and Y dimensions and ‘3’ represents the RGB channels.

Convolutional layer. The CNN performs feature extraction from the input with the help of a kernel/filter. The filter is made to stride over the entire image, and a dot product is computed between the portion of the image covered by the filter and the filter itself. The multiplied values are summed up to produce a single output value at each stride. When a single filter is used, this process yields a 2D output called a feature map. The Rectified Linear Unit (ReLU) activation function was used to non-linearly transform the input received at a node. The feature map values are calculated according to the following formula, where the input image is denoted by f and the filter by h; the row and column indices of the result matrix are denoted by m and n, respectively:

$$G[m,n] = (f * h)[m,n] = \sum\limits_j \sum\limits_k h[j,k]\, f[m - j,\, n - k].$$
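To make the formula concrete, the following NumPy sketch implements a plain 2D convolution loop (written for illustration only; the function name, the 'valid' output size, and the example values are assumptions):

```python
import numpy as np

def conv2d_valid(f, h):
    """2D convolution of image f with filter h (no padding, stride 1).

    The kernel is flipped so that each output value is a sum of
    h[j, k] * f[. - j, . - k] terms, matching the convolution formula above
    (up to the usual index offset of a 'valid' convolution).
    """
    h_flipped = h[::-1, ::-1]
    H, W = f.shape
    kH, kW = h.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for m in range(out.shape[0]):
        for n in range(out.shape[1]):
            out[m, n] = np.sum(f[m:m + kH, n:n + kW] * h_flipped)
    return out

# Tiny example: a 3 x 3 vertical-edge-like filter applied to a 5 x 5 image
f = np.arange(25, dtype=float).reshape(5, 5)
h = np.array([[1.0, 0.0, -1.0]] * 3)
print(conv2d_valid(f, h))   # 3 x 3 feature map
```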

Pooling layer. This layer is added after the convolutional layer in order to reduce the variance that may appear in the feature map when the input is slightly shifted or rotated. Max pooling was used in the research work, in which the maximum value covered by the pooling filter in the feature map is retained. The pooling layer also helps to minimize the number of parameters and computations required. A dropout layer was added after pooling to avoid overfitting during training.

Dense/fully connected layer. The output of the pooling layer was flattened to a 1D vector and given as input to the fully connected layer. This layer updates its weights in such a way that it is able to predict the probability of each class to which the input may belong. Softmax activation was used at the final dense layer, since the probability of each class needs to be found. The Softmax activation function is given by

$$\mathrm{Softmax}(x_i) = \frac{\exp(x_i)}{\sum\limits_j \exp(x_j)},$$

where \(x_i\) is the ith element of the input vector x, and the sum in the denominator runs over all elements \(x_j\) of x. A summary of the CNN model used in the research work is given in Table 1.

Table 1. Summary of CNN model

The CNN model was compiled with the categorical cross-entropy loss using the Adam optimizer. The Adam version of stochastic gradient descent was used to fit the deep learning model, since its adaptive learning rates help the model converge fast. The model was trained with a batch size of 128. The trained model was validated with the data allotted for validation. Finally, the constructed model was evaluated on the test dataset.
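A minimal Keras sketch consistent with this description is given below; only the layer sequence, loss, optimizer, batch size, and number of epochs are taken from the text, while the filter counts and layer sizes are assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Simple CNN: input -> convolution -> pooling -> dropout -> dense layers (softmax over 3 classes)
model = models.Sequential([
    layers.Input(shape=(224, 224, 3)),
    layers.Conv2D(32, (3, 3), activation="relu"),   # filter count/size are assumptions
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.25),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(3, activation="softmax"),          # healthy, Pneumonia, COVID
])

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])

# history = model.fit(X_train, y_train_onehot, batch_size=128, epochs=50,
#                     validation_data=(X_val, y_val_onehot))
```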

(iv) Transfer learning model using VGG16. Transfer learning is a technique in which a deep learning model trained on a different dataset is used to solve the classification task of a new problem. VGG16 is a deep learning model that was trained with more than a million images to categorize 1000 objects. The proposed research work made use of the pre-trained VGG16 deep learning model and its corresponding weights for the classification of healthy, Pneumonia, and COVID patients. The advantage of this approach is that it reduces the time required to construct a new model from scratch, since the weights are readily available in the pre-trained model. In the proposed research work, the last fully connected layer of VGG16, which contains 1000 nodes to categorize 1000 objects, was replaced with 3 nodes in order to classify the input X-ray image as a healthy, Pneumonia, or COVID patient. The model summary of VGG16 is shown in Table 2.

Table 2. Summary of VGG16 model
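A minimal Keras sketch of this transfer-learning setup (freezing the pre-trained convolutional weights and the size of the added dense layer are assumptions not stated above):

```python
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

# Load VGG16 pre-trained on ImageNet, without its original 1000-way classifier
base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False                      # keep pre-trained weights fixed (assumption)

vgg_model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),   # size of this head is an assumption
    layers.Dense(3, activation="softmax"),  # healthy, Pneumonia, COVID
])

vgg_model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
```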

(v) Machine learning techniques with CNN as feature extractor. A convolutional neural network was used to extract features from the input image. The extracted features were used to classify the input image into three classes (healthy, Pneumonia, and COVID) using the machine learning techniques described below; a code sketch of the full pipeline follows their descriptions.

Support Vector Machine (SVM). SVM is a simple and powerful machine learning technique whose objective is to find a hyperplane that best separates the different classes. The separating hyperplane should be positioned so that it maximizes the margin between the different classes. If the classes are not linearly separable, SVM transforms the lower-dimensional data to a higher dimension using a non-linear kernel function, such as the radial basis function, polynomial, or sigmoid kernel, and then performs the classification.

Gradient Boosted Decision Tree (XGBoost). This ensemble learning algorithm works by adding new models to the existing ensemble so that the residual errors made by the previous models are corrected. Decision trees are considered weak learners because they work in a greedy manner to find the best feature split with the help of the Gini or entropy measures. A new decision tree is added sequentially to correct the classification errors made previously by the existing trees.
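A minimal sketch of this pipeline, reusing the `model` and data splits from the sketches above (the layer chosen for feature extraction and all classifier hyperparameters are assumptions):

```python
from tensorflow.keras import models
from sklearn.svm import SVC
from xgboost import XGBClassifier

# Truncate the CNN at its flatten layer and use it as a fixed feature extractor
feature_extractor = models.Model(inputs=model.input,
                                 outputs=model.get_layer(index=-3).output)  # flatten layer (assumed index)

train_feats = feature_extractor.predict(X_train)   # shape: (n_samples, n_features)
val_feats = feature_extractor.predict(X_val)

# SVM with an RBF kernel on the CNN features
svm = SVC(kernel="rbf")
svm.fit(train_feats, y_train)                      # y_train: integer labels 0/1/2
print("SVM accuracy:", svm.score(val_feats, y_val))

# Gradient boosted decision trees (XGBoost) on the same features
xgb = XGBClassifier(n_estimators=100, eval_metric="mlogloss")
xgb.fit(train_feats, y_train)
print("XGBoost accuracy:", xgb.score(val_feats, y_val))
```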

(vi) Evaluation of the constructed models. The models constructed using the deep learning and machine learning algorithms were evaluated based on the following metrics, using the test dataset that was not seen by the models before. The evaluation metrics computed with the help of the confusion matrix are:

$${\text{Accuracy}},\quad Acc = \frac{{True~Positive + True~Negative}}{{True~Positive + True~Negative + False~Positive + False~Negative}};$$
$${\text{Precision}},\quad P = \frac{{True~Positive}}{{True~Positive + False~Positive~}};$$
$${\text{Recall}},\quad R = \frac{{True~Positive}}{{True~Positive + False~Negative~}};$$
$${\text{F1-Score}},\quad F1 = 2\frac{{P \times R}}{{P + R}}.$$

Fivefold cross-validation was performed in order to estimate the actual performance of the models.
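These metrics can be computed from the test-set predictions, for example with scikit-learn (a sketch; `y_test`, `y_pred`, and the macro-averaging of the per-class scores are assumptions):

```python
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score, confusion_matrix)

# y_test: true labels of the test set, y_pred: labels predicted by a model (assumed)
print("Accuracy :", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred, average="macro"))  # macro-average over the 3 classes
print("Recall   :", recall_score(y_test, y_pred, average="macro"))
print("F1-score :", f1_score(y_test, y_pred, average="macro"))
print(confusion_matrix(y_test, y_pred))
```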

2 EXPERIMENTAL RESULTS

Experiments were carried out to classify the input images into three categories: healthy, Pneumonia, and COVID. The images obtained from the dataset were pre-processed, as shown in Fig. 3. The training dataset was divided into 5 folds such that, in each fold, 20% of the training data was used for validation and the remaining 80% for training the model. This step was repeated five times with different sets of image samples for training and validation.

Fig. 3. Pre-processed chest X-ray images.

The validation and training accuracy obtained for the Convolutional Neural Network (CNN) model over 50 epochs is shown in Fig. 4. The validation and training loss for the CNN model is shown in Fig. 5. The results show that as the number of epochs increases, the training accuracy also increases. From Figs. 4 and 5 it can also be seen that there may be a risk of overfitting, since the training loss continues to decrease and the training accuracy continues to increase, while there is no significant improvement in the validation accuracy after 15 epochs.
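Curves such as those in Figs. 4 and 5 can be drawn from the Keras training history (a sketch assuming a `history` object returned by `model.fit`, as in the commented call of the CNN sketch above):

```python
import matplotlib.pyplot as plt

# history = model.fit(...) as in the CNN sketch above
plt.figure(figsize=(10, 4))

plt.subplot(1, 2, 1)
plt.plot(history.history["accuracy"], label="training accuracy")
plt.plot(history.history["val_accuracy"], label="validation accuracy")
plt.xlabel("epoch"); plt.legend()

plt.subplot(1, 2, 2)
plt.plot(history.history["loss"], label="training loss")
plt.plot(history.history["val_loss"], label="validation loss")
plt.xlabel("epoch"); plt.legend()

plt.tight_layout()
plt.show()
```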

Fig. 4. Validation and training accuracy for CNN model.

Fig. 5. Validation and training loss for CNN model.

The validation and training accuracy obtained for the transfer learning model (VGG Net) for classification of healthy, Pneumonia, and COVID cases is shown in Fig. 6. The validation and training loss for the VGG Net model is shown in Fig. 7. A comparative graph of the validation and training loss and accuracy of the CNN and VGG Net models is shown in Fig. 8.

Fig. 6. Validation and training accuracy for transfer learning model.

Fig. 7. Validation and training loss for transfer learning model.

Fig. 8. Comparison of validation and training loss and accuracy of CNN and VGG model.

The normalized confusion matrices for fold 3 of the CNN model, the transfer learning model, the SVM model with CNN feature extractor, and the XGBoost model with CNN feature extractor are shown in Figs. 9–12, respectively. The confusion matrices of the above-mentioned models without normalization are shown in Figs. 13–16.
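A sketch of how such confusion matrices can be produced with scikit-learn (the display helper, label order, and normalization option are assumptions about tooling, not the authors' exact procedure):

```python
from sklearn.metrics import ConfusionMatrixDisplay
import matplotlib.pyplot as plt

class_names = ["COVID", "Healthy", "Pneumonia"]   # assumed label order

# Normalized confusion matrix (rows sum to 1)
ConfusionMatrixDisplay.from_predictions(
    y_test, y_pred, display_labels=class_names, normalize="true")
plt.show()

# Raw counts
ConfusionMatrixDisplay.from_predictions(
    y_test, y_pred, display_labels=class_names)
plt.show()
```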

Fig. 9. Normalized confusion matrix of CNN model.

Fig. 10. Normalized confusion matrix of transfer learning model.

Fig. 11. Normalized confusion matrix of CNN + SVM model.

Fig. 12. Normalized confusion matrix of CNN + XGBoost model.

Fig. 13. Confusion matrix of CNN model.

Fig. 14. Confusion matrix of transfer learning model.

Fig. 15. Confusion matrix of CNN + SVM model.

Fig. 16. Confusion matrix of CNN + XGBoost model.

The K-fold cross-validation results (K = 5) for classification of an input chest X-ray image into one of the categories healthy, Pneumonia, or COVID using the CNN model are shown in Table 3. Similarly, the cross-validation results for the transfer learning model, the CNN + SVM model, and the CNN + XGBoost model are shown in Tables 4–6, respectively. The average precision, recall, F1-score, and accuracy of the four constructed models are also shown in the aforementioned tables.

Table 3. Precision, recall, F1-score, and accuracy of CNN model
Table 4. Precision, recall, F1-score, and accuracy of transfer learning model
Table 5. Precision, recall, F1-score, and accuracy of CNN + SVM model
Table 6. Precision, recall, F1-score, and accuracy of CNN + XGBoost model

DISCUSSION

Experimental results show that the accuracy of correctly predicting an input image as healthy, Pneumonia, or COVID is similar for the constructed models, i.e., around 95%, except for the transfer learning model VGG Net (92.51%). The reason may be that the VGG Net model was originally developed to categorize 1000 different objects, and hence its weights might not be tuned to classify chest X-ray images. To improve the performance of the transfer learning model (VGG Net), the weights could be made trainable while retaining only the network structure. In identifying whether a person is affected by COVID or not, recall is the most important evaluation measure, because recall indicates the percentage of actual positive cases identified by the model. High precision indicates a low false positive rate, and high recall indicates a low false negative rate. The XGBoost model, which uses features extracted by the CNN model, gives the best recall compared to the other models. The SVM model with features extracted by the CNN model gives the best precision, F1-score, and accuracy. In comparison with the DarkNet model developed by [7], which achieved 87.02% for multi-class classification of healthy, Pneumonia, and COVID cases, the experimental results show that machine learning models with a deep learning algorithm for feature extraction can perform better. A comparison of the performance metrics of the CNN, VGG Net, CNN + SVM, and CNN + XGBoost models is shown in Table 7.

Table 7. Comparison of performance metrics of CNN, VGGnet, CNN + SVM, and CNN + XGBoost models

CONCLUSIONS

A comparative analysis of the precision, recall, F1-score, and accuracy of four different models was performed to classify the input chest X-ray image as healthy, Pneumonia, or COVID: a convolutional neural network (CNN) model, a transfer learning model (VGG Net), a Support Vector Machine with features extracted using the CNN, and XGBoost with CNN features. The results show that machine learning models with features automatically extracted by a deep learning model such as a CNN perform better, with CNN + SVM giving an average precision, recall, F1-score, and accuracy of 95.27, 94.52, 94.94, and 95.81%, respectively. Hence, this method of identifying COVID patients from chest X-ray images can be used for initial screening. The performance of the constructed models can be further improved by increasing the size of the dataset, performing data augmentation, and incorporating methods to handle data imbalance.