1 Introduction

The Wuhan Municipal Health Commission, China, reported the emergence of a novel coronavirus on Dec 31st, 2019. On Jan 12th, 2020, the World Health Organization (WHO) referred to the disease as the 2019 novel coronavirus (2019-nCoV), and on Jan 30th, 2020, WHO declared a public health emergency. Upon subsequent discussion of the outbreak, the disease was named coronavirus disease 2019 (COVID-19) on Feb 11th, 2020, and the causative virus was designated SARS-CoV-2. The COVID-19 pandemic has had a tremendous impact worldwide, and it poses an incredible threat to public health, food systems, psychology, and workplace safety.

COVID-19 is caused by the SARS-CoV-2 virus, which spreads from person to person, especially through close contact. The virus can also spread from an infected person to people nearby when they cough, sneeze, speak, sing, or breathe heavily. To deal with this critical pandemic situation, governments promoted physical distancing by limiting close face-to-face contact. To further reduce the spread of the disease, governments established containment zones where positive cases had considerably increased. It is therefore essential to alert social and government organisations so that the disease does not spread to regions that are not yet affected. Social media has played an active role in connecting people across various sectors around the globe. Especially in critical times, Twitter content lets individuals interact with each other during the lockdown period, update their knowledge about the disease, and take the necessary steps to curb the outbreak. During the lockdown era, precautions such as physical separation, wearing a mask, keeping rooms adequately aired, avoiding crowds, washing hands, and coughing into a tissue or bent elbow were adopted, and this information was consistently relayed to the public through Twitter posts.

Table 1 Different COVID-19 disease related tweets

The COVID-19 pandemic has negatively affected the world in a variety of areas, including public health, tourism, business, economics, politics, education, and people's lifestyles. In the last two years, researchers have paid increasing attention to COVID-19. Some researchers have concentrated on natural language processing [1,2,3], which covers disease symptoms, medical reports of COVID patients, patient health conditions, information about pandemic prevention and precautions, and social media messages/tweets, among other things. Other researchers concentrate on image processing [4,5,6], which includes analysing patient X-rays to confirm whether COVID-19 is positive or negative. During the COVID-19 outbreak, respiratory-sound analysis research became popular [7,8,9,10]; in these studies, deep learning models were used to categorise patients' respiratory sounds, yielding good results. Mathematical researchers focus more on COVID-19 statistical reports [11,12,13,14], such as the number of cases identified, the number of deaths, and the number of patients recovered.

Table 2 Summary for text classification based papers
Table 3 COVID-19 fake tweets detection papers summary

Twitter posts contain both fake and real news (source: COVID-19 FakeNews dataset), as shown in Table 1. In a real sense, not all real news is informative. For example, consider an accurate report containing some predictable content along with COVID-19 disease information: only the COVID-19-related content draws public attention to the tweet, and hence it is considered informative. The objective of our proposed work is to highlight such informative content in the tweets and predict the severity of the disease in a particular location based on geolocation, age, gender, and time. In detail, we identify which gender and which locations experience a serious outbreak within a particular period.

The following are the highlights of this paper:

  1. An ensemble transformer model with a fusion vector multiplication technique is presented.

  2. The CT-BERT and RoBERTa transformers are utilised in combination.

  3. The FBEDL paradigm produces significant outcomes.

  4. The dataset is the most recent collection of labelled English COVID-19 fake tweets.

  5. The model identifies fake tweets with a 98.93% F1-score and 98.88% accuracy.

Following the discussion of related work in Sect. 2, Sect. 3 delves into the methodology and data. Section 4 presents the experimental results, examines them, and analyses the errors, and Sect. 5 brings the paper to a conclusion.

2 Related work

Easwaramoorthy et al. [15] compared and predicted the epidemic curves of the first and second waves of the COVID-19 pandemic to illustrate the transmission rate in both periods. Kavitha et al. [16] investigated the duration of the second and third waves in India and forecast the outbreak's future trend using SIR and fractal models. Gowrisankar et al. [17] applied multifractal formalism to COVID-19 data, under the assumption that country-specific infection rates exhibit power-law growth.

Minaee et al. [18] present a detailed quantitative analysis of over 100 deep learning models evaluated on more than 16 popular text classification datasets. Kadhim [19] automatically classified a collection of documents into one or more known categories, discussing term-weighting methods and comparing different classification techniques. Aggarwal and Zhai [20] presented a survey of a broad range of text classification algorithms, covering classification in the database, machine learning, data mining, and information retrieval communities, as well as applications such as target marketing, medical diagnosis, newsgroup filtering, and document organisation. Kowsari et al. [21] discussed different text feature extraction methods, dimensionality reduction methods, existing algorithms and techniques, and evaluation methods, along with real-world problems. De Beer and Matthee [22] pointed out various approaches such as topic-agnostic, machine learning, and knowledge-based methods.

Uysal and Gunal [23] discussed the impact of preprocessing on text classification in terms of classification accuracy, text domain, dimension reduction, and text language. Wen et al. [24] employ a clarity map using a two-channel convolutional network and morphological filtering; the fused image is created by combining the clear parts of the source images. Castillo Ossa et al. [25] developed a hybrid model that combines the population dynamics of the SIR differential-equation model with recurrent neural network extrapolations. Wiysobunri et al. [26] presented an ensemble deep learning system based on a majority (max) voting scheme over VGG-19, DenseNet201, MobileNet-V2, ResNet34, and ResNet50 for the automatic detection of COVID-19 from chest X-ray images.

Table 2 summarises papers on the identification and classification of tweets related to disasters and the current COVID-19 pandemic, covering text mining and sentiment analysis papers based on AI techniques. The table lists the publication year, the author with the cited article, the paper's main content, and the model or approach used. The machine learning papers applied Naive Bayes and k-NN classifiers to text classification using the most frequent word features and low-level lexical features. Pre-trained transformer deep learning models such as CT-BERT, BERTweet, and RoBERTa outperformed traditional machine learning models and convolutional neural networks (CNNs).

Table 3 summarises COVID-19 fake news detection papers. As shown in the table, the authors discussed automatic fake news detection AI models (on different datasets) with F1-score and accuracy as performance metrics. Transformer-based papers achieved better results than other artificial intelligence models.

Fig. 1

Overview of the proposed (FBEDL) ensemble deep learning model

3 Framework methodology

During the COVID-19 epidemic, the FBEDL model detects fake COVID-19 tweets with an accuracy of 98.88% and an F1-score of 98.93%. Figure 1 depicts a high-level overview of the FBEDL model. The following subsections describe the FBEDL model in greater depth: data collection and preprocessing are described first, followed by the pre-trained deep learning classifiers, and finally the fusion multiplication technique.

3.1 Tweets collection and data preprocessing

During the COVID-19 pandemic (2020), the organizers of the Constraint@AAAI2021 workshop [39] provided the COVID-19 fake news English dataset [38], containing an id, tweet, and label ("Fake" or "Real") in TSV format. The organisers considered only English textual content and captured a generic corpus linked to the coronavirus epidemic using a predetermined list of ten keywords, including: COVID-19, cases, coronavirus, deaths, tests, new, people, number and total. The collected tweets are preprocessed using the methods described below.

3.2 Preprocessing of data

Twitter data contains a lot of noise; as a result, pre-trained models may benefit from data preparation. The following data preprocessing steps were inspired primarily by [40].

  1. Remove all English stop words and non-alphanumeric characters.

  2. Remove tabs, newlines, and unnecessary spaces.

  3. Replace all links in the tweets (shown as HTTPURL) with the token URL.

Because the user handles in the tweets had already been replaced by @USER, no further processing was required.
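The preprocessing steps above can be sketched as follows. This is a minimal illustration rather than the exact script used in the paper; in particular, the stop-word set shown is a small illustrative subset of a full English list (e.g. NLTK's).

```python
import re

# Illustrative subset of English stop words; a full list is assumed
# in the actual pipeline.
STOP_WORDS = {"the", "a", "an", "is", "are", "to", "of", "and", "in"}

def preprocess_tweet(text: str) -> str:
    # Step 3: replace all links (shown as HTTPURL) with the token URL.
    text = re.sub(r"HTTPURL|https?://\S+", "URL", text)
    # Step 1: drop non-alphanumeric characters (the @ of user handles is kept).
    text = re.sub(r"[^A-Za-z0-9@\s]", " ", text)
    # Step 1 (continued): remove English stop words.
    tokens = [t for t in text.split() if t.lower() not in STOP_WORDS]
    # Step 2: joining on single spaces also removes tabs, newlines,
    # and unnecessary spaces.
    return " ".join(tokens)

print(preprocess_tweet("The total cases\tare rising!\nSee https://t.co/xyz"))
# → total cases rising See URL
```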

3.3 RoBERTa

RoBERTa [41] improves on BERT by removing the next-sentence prediction pretraining objective, training with considerably larger learning rates and mini-batches, and modifying key hyperparameters. Google introduced the Transformer architecture, which improved NLP (Natural Language Processing) systems using encoder representations. RoBERTa is more effective than BERT and increases the benefit of the masked language modelling objective. Furthermore, compared to the base BERT model, RoBERTa is trained on an order of magnitude more data.

RoBERTa is a retraining of BERT with an improved training methodology, roughly ten times more data, and more compute, so it outperforms both BERT and XLNet. Its pretraining text, however, is drawn from general sources, not only tweets.

For the given COVID-19 fake dataset, the model was trained using various hyperparameter combinations (learning rate and batch size). The four metrics used to evaluate the results for each combination are accuracy, recall, precision, and F1-score. The model was trained on the COVID-19 English fake dataset with batch sizes of 8, 16, and 32; it performs best with a batch size of 8 and a learning rate of 1.12e−05, as shown in Table 4. These results may vary from dataset to dataset. Finally, RoBERTa's performance measures are an accuracy of 98.55, an F1-score of 98.62, a recall of 98.84, and a precision of 98.40, all of which improve the proposed FBEDL model's performance.
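A hedged sketch of this fine-tuning setup, using the ktrain package [44] that the experiments rely on (Sect. 4). The batch size, learning rate, and maximum length follow the values reported in the paper; the epoch count, class names, and dataset variables are illustrative assumptions, not the authors' exact script.

```python
# Best hyperparameters reported for RoBERTa on the COVID-19 fake dataset.
BEST_BATCH_SIZE = 8
BEST_LR = 1.12e-05
MODEL_NAME = "roberta-base"

def fine_tune(x_train, y_train, x_val, y_val, epochs=3):
    """Fine-tune RoBERTa on (tweet, label) data; returns a ktrain predictor."""
    # Imported lazily so the sketch can be read without ktrain installed.
    import ktrain
    from ktrain import text

    t = text.Transformer(MODEL_NAME, maxlen=143,
                         class_names=["fake", "real"])
    trn = t.preprocess_train(x_train, y_train)
    val = t.preprocess_test(x_val, y_val)
    model = t.get_classifier()
    learner = ktrain.get_learner(model, train_data=trn, val_data=val,
                                 batch_size=BEST_BATCH_SIZE)
    # One-cycle policy at the best-performing learning rate.
    learner.fit_onecycle(BEST_LR, epochs)
    return ktrain.get_predictor(learner.model, preproc=t)
```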

Table 4 RoBERTa results have obtained using the test data set

3.4 CT-BERT

CT-BERT (COVID-Twitter-BERT) [40] is a recent transformer-based model trained on a massive corpus of tweets about the ongoing COVID-19 outbreak. It shows an improvement of 5–10% compared to its base model, BERT-LARGE, with the most substantial improvements on the target domain. CT-BERT, like other pretrained transformer models trained on a specific target domain, can be used for a variety of NLP tasks, such as mining and analysis. CT-BERT was designed with COVID-19 content in mind.

COVID-Twitter-BERT incorporates domain (COVID-19) knowledge as well as Twitter-specific information, so it can better handle noisy texts like tweets. CT-BERT performs similarly well on other classification problems on COVID-19-related data sources, particularly on text derived from social media platforms.

For the given COVID-19 fake dataset, this model was trained using various hyperparameter combinations (batch size and learning rate). As indicated in Table 5, the best results were obtained with a batch size of 8 and a learning rate of 1.02e−06. The CT-BERT model's results may vary from dataset to dataset. Finally, CT-BERT's performance metrics are an accuracy of 98.22, an F1-score of 98.32, a recall of 99.02, and a precision of 97.62, all of which improve the performance of the proposed FBEDL model.
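A hedged sketch of obtaining a tweet's (fake, real) probability vector from CT-BERT with the Hugging Face package [43]; this vector is what the fusion step (Sect. 3.5) consumes. The checkpoint name is the publicly released CT-BERT model, and a classification head fine-tuned as described above is assumed.

```python
# Public CT-BERT checkpoint on the Hugging Face Hub.
CT_BERT = "digitalepidemiologylab/covid-twitter-bert-v2"

def tweet_probabilities(tweet: str, model_name: str = CT_BERT):
    """Return [p_fake, p_real] for one tweet from a sequence classifier."""
    # Imported lazily so the sketch reads without the libraries installed.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name, num_labels=2)  # two labels: fake, real
    inputs = tokenizer(tweet, truncation=True, max_length=143,
                       return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    # Softmax converts logits into the probability vector fed to the fusion.
    return torch.softmax(logits, dim=-1).squeeze().tolist()
```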

3.5 Fusion vector multiplication

To overcome the disadvantages of the individual CT-BERT and RoBERTa models, an ensemble model is introduced. Fusion techniques are popular for combining the outputs of internal models; they include taking the max, min, mean, sum, difference, or product of the probability values.

The probability vector of a tweet is calculated using the fine-tuned RoBERTa model and the CT-BERT model. The multiplicative fusion technique [42] performs element-wise multiplication to combine both (array of the last layer) probability vectors into a single vector [27]. The predicted tweet label is based on the generated vector.

$$A = \begin{bmatrix} a_1 &{} b_1 \\ a_2 &{} b_2 \\ \vdots &{} \vdots \\ a_n &{} b_n \end{bmatrix}$$
(1)
$$B = \begin{bmatrix} c_1 &{} d_1 \\ c_2 &{} d_2 \\ \vdots &{} \vdots \\ c_n &{} d_n \end{bmatrix}$$
(2)

where \(a_i + b_i = 1\) and \(c_i + d_i = 1\). Here \(a_i\) and \(b_i\) are the probabilities that the \(i\mathrm{th}\) tweet is fake and real, respectively, according to the RoBERTa model, and \(c_i\) and \(d_i\) are the corresponding probabilities from the CT-BERT model. \(A\) is the probability vector (last layer) of RoBERTa and \(B\) is the probability vector (last layer) of CT-BERT. The fusion vector multiplication of \(A\) and \(B\) is their element-wise product, \(\mathrm{FVM}(A, B)\), a single vector:

$$\mathrm{FVM}(A, B) = \begin{bmatrix} a_1*c_1 &{} b_1*d_1 \\ a_2*c_2 &{} b_2*d_2 \\ \vdots &{} \vdots \\ a_n*c_n &{} b_n*d_n \end{bmatrix}$$
(3)
$$\mathrm{FBEDL}\_\mathrm{Test}(\mathrm{tweet}_i) = \begin{cases} \mathrm{Fake}, &{} \text{if } a_i*c_i > b_i*d_i \\ \mathrm{Real}, &{} \text{if } a_i*c_i < b_i*d_i \\ \mathrm{Neutral}, &{} \text{otherwise} \end{cases}$$
(4)

where \(a_i\), \(c_i\) are the \(i\mathrm{th}\) elements of the first columns of \(A\) and \(B\), respectively, and \(b_i\), \(d_i\) are the \(i\mathrm{th}\) elements of the second columns.

From Eq. (4), the following observations can be made:

  1. If \(a_i*c_i > b_i*d_i\), the proposed model predicts the tweet as "Fake".

  2. If \(a_i*c_i < b_i*d_i\), the proposed model predicts the tweet as "Real".

  3. The proposed model is trained and tested on the Fake News COVID-19 dataset, whose labels are binary (Fake, Real); the "Neutral" case (an exact tie) rarely occurs in practice.
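Equations (1)–(4) amount to an element-wise (Hadamard) product of the two probability matrices followed by a row-wise comparison. A minimal NumPy sketch, with illustrative probability values rather than the models' actual outputs:

```python
import numpy as np

def fvm_predict(A, B):
    """A, B: (n, 2) arrays of [fake, real] probabilities from the
    RoBERTa and CT-BERT models, respectively."""
    A, B = np.asarray(A), np.asarray(B)
    fused = A * B                          # Eq. (3): FVM(A, B)
    labels = []
    for fake_score, real_score in fused:   # Eq. (4): decision rule
        if fake_score > real_score:
            labels.append("Fake")
        elif fake_score < real_score:
            labels.append("Real")
        else:
            labels.append("Neutral")       # exact ties are rare in practice
    return fused, labels

# Both models lean "Fake" on tweet 1 and "Real" on tweet 2.
_, labels = fvm_predict([[0.9, 0.1], [0.3, 0.7]],
                        [[0.8, 0.2], [0.4, 0.6]])
print(labels)  # → ['Fake', 'Real']
```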

Table 5 CT-BERT results from the test dataset

4 Results and analysis

All experiments in this paper were carried out in the Google Colaboratory (Colab) environment using the Chrome browser. This section covers the dataset, model parameter explanations, and performance evaluations. Furthermore, the proposed solution is compared against existing methods. The Hugging Face package [43] was used in the Python implementation, and the ktrain package [44] was used to fine-tune our baseline models.

4.1 Fake news COVID-19 dataset

During the COVID-19 outbreak (2020), the Constraint@AAAI2021 workshop organizers provided the COVID-19 fake news English dataset [38], containing an id, tweet, and label ("Fake" or "Real") in TSV form. The dataset, which contains fake news collected from tweets, Instagram posts, Facebook posts, press releases, and other popular media content, has 10,700 records. Real news was gathered from potentially genuine tweets using the Twitter API; such tweets come from official accounts like the Indian Council of Medical Research (ICMR), the World Health Organization (WHO), the Centers for Disease Control and Prevention (CDC), and Covid India Seva, which give valuable COVID-19 information such as vaccine progress, dates, hotspots, and government policies.

Table 6 COVID-19 fake English dataset details

The dataset is divided into three sections: 60% for training, 20% for validation, and 20% for testing. Table 6 illustrates the distribution of all data splits by class: 52.34% of the samples contain legitimate news and 47.66% contain fraudulent news.
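The 60/20/20 split can be sketched as follows; the shuffle seed and record format are assumptions for illustration, since the organisers' exact split procedure is not restated here.

```python
import random

def split_dataset(records, seed=42):
    """Shuffle (tweet, label) records and split 60% / 20% / 20%."""
    records = list(records)
    random.Random(seed).shuffle(records)  # deterministic shuffle
    n = len(records)
    n_train, n_val = int(0.6 * n), int(0.2 * n)
    train = records[:n_train]
    val = records[n_train:n_train + n_val]
    test = records[n_train + n_val:]
    return train, val, test

# The full dataset has 10,700 records.
train, val, test = split_dataset([(f"tweet {i}", i % 2) for i in range(10700)])
print(len(train), len(val), len(test))  # → 6420 2140 2140
```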

Table 7 Machine learning models: results from the test data set
Table 8 Deep learning models: results from the test data set

4.2 Experiment setup

The outcome of the model is dependent on the use of a classifier. As a result, the following classifiers are used to conduct various tests.

  1. CT-BERT transformer model.

  2. RoBERTa transformer model.

  3. Fusion vector multiplication technique.

4.3 Performance measures

The model's performance is evaluated using the following measures: precision, F1-score, accuracy, and recall. These metrics are derived from the confusion matrix.

4.3.1 Confusion matrix

The performance of a classification model is evaluated with an \(N \times N\) matrix, where N indicates the number of target classes. For binary classification, N equals 2, giving a \(2 \times 2\) matrix containing four values, as shown below.

True Positive (TP): the predicted and actual values are identical; the actual value is positive and the model predicted a positive value. True Negative (TN): the predicted value matches the actual value; the actual value is negative and the model predicted a negative value. False Positive (FP): the value is incorrectly predicted; although the actual value is negative, the model predicted it to be positive.

False Negative (FN): the value is incorrectly predicted; although the actual value is positive, the model predicted it to be negative.
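The four measures used throughout the paper follow directly from these confusion-matrix entries; the counts below are illustrative, not the paper's.

```python
def classification_metrics(tp, tn, fp, fn):
    """Standard metrics computed from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

acc, prec, rec, f1 = classification_metrics(tp=95, tn=90, fp=5, fn=10)
print(round(acc, 3), round(prec, 3), round(rec, 3), round(f1, 3))
# → 0.925 0.95 0.905 0.927
```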

4.4 Performance analysis

This section has three subsections. The first compares the performance of the machine learning (ML) models, the second compares the performance of the deep learning models, and the third compares the proposed model's performance with existing approaches.

Fig. 2

Deep learning models performance in terms of evaluation metrics

4.4.1 Performance metrics in machine learning models

The Constraint@AAAI2021 workshop organisers provided baseline results for the English COVID-19 fake dataset. Logistic regression, decision tree, gradient boosting, and SVM were considered as baselines for predicting fake news tweets. The Support Vector Machine (SVM) classifier achieved an accuracy of 93.32%, an F1-score of 93.32%, a precision of 93.33%, and a recall of 93.32%; as shown in Table 7, the SVM classifier outperformed the other baselines on all metrics.

4.4.2 Deep Learning models performance metrics evaluation

Pretrained transformer deep learning models, namely DistilBERT, ALBERT, BERT, BERTweet, RoBERTa, and CT-BERT, are considered in this subsection. MAX_LENGTH(tweet) was fixed to 143 to train the models better on the English-language corpus; the tweets being tested are in English. The models were trained with learning rates of \(1\mathrm {e}{-4}\), \(1\mathrm {e}{-5}\), \(1\mathrm {e}{-6}\), \(1\mathrm {e}{-7}\), and \(1\mathrm {e}{-8}\) and tested with batch sizes of 8, 16, and 32.

As shown in Table 8, CT-BERT and RoBERTa occupied the first two places, ahead of the BERTweet, BERT, DistilBERT, and ALBERT models, as exhibited in Fig. 2a–d. According to the experimental results, they outperformed the other models because they had higher true positive (TP) and true negative (TN) counts. CT-BERT performed well because it was pre-trained on a large corpus of COVID-19-related Twitter messages.

Fig. 3

Performance: proposed model versus state-of-art models

4.4.3 Ensemble deep learning models performance metrics

This segment examines modern ensemble deep learning models. The ensemble of BiLSTM + SVM + logistic regression + Naive Bayes + a combination of LR + NB obtained an F1-score of 94% and an accuracy of 93.90%. The combination of XLNet and the LDA technique gave an F1-score of 96.70% and an accuracy of 96.60%. The ensemble model using CT-BERT with a hard-voting technique performed better than the other ensemble models.

4.5 Performance comparison: proposed model versus ensemble deep learning techniques

The proposed model (FBEDL) is compared in terms of accuracy and F1-score with the machine learning models, deep learning models, and ensemble models. In comparison to existing models, our FBEDL model attained an F1-score of 98.93% and an accuracy of 98.88%, as shown in Tables 9 and 10 as well as Fig. 3. This indicates that the model successfully distinguishes fake tweets/news about the COVID-19 disease outbreak.

Table 9 Performance comparison: proposed model versus existing models
Table 10 FBEDL model results from the test dataset

5 Conclusion

The principal goal of this work is to demonstrate how a novel NLP application can detect real or fake COVID-19 tweets. The conclusions of the paper help individuals avoid hysteria caused by COVID-19 tweets. Our findings may also aid in improving COVID-19 therapies and public health measures.

In this study, a fusion-technique-based ensemble deep learning model is used to detect fraudulent tweets during the ongoing COVID-19 epidemic. The fusion vector multiplication is designed to make our model more robust. We tried various deep learning model combinations to improve performance, and the COVID-Twitter-BERT and RoBERTa deep learning models achieved state-of-the-art performance. With 98.88% accuracy and a 98.93% F1-score, the proposed model outperforms traditional machine learning and deep learning models.

One disadvantage of our proposed model is that RoBERTa and CT-BERT are pre-trained models that require a lot of memory for the training corpus (657 MB and 1.47 GB, respectively). Compared to machine learning models, their time complexity is also relatively high. To boost model performance, we plan to apply data compression techniques.

For the time being, this research focuses on English fake tweets about the COVID-19 pandemic. Our method may be able to predict fake tweets about similar diseases in the future. We can improve our results by training other combinations of alternative transformer-based models on a sizeable COVID-19 dataset.