Abstract
The COVID-19 pandemic has had a huge impact on many areas of human life, so the pandemic and its consequences are actively discussed on social media. However, not all social media posts are truthful. Many of them spread fake news that causes panic among readers, misinforms people, and thus exacerbates the effects of the pandemic. In this paper, we present our results at the Constraint@AAAI2021 Shared Task: COVID-19 Fake News Detection in English. In particular, we propose an approach based on a transformer-based ensemble of COVID-Twitter-BERT (CT-BERT) models. We describe the models used, the text preprocessing methods, and the addition of extra data. Our best model achieved a weighted F1-score of 98.69 on the test set, taking first place on the leaderboard of this shared task, which attracted 166 submitting teams in total.
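The abstract names two technical ingredients, an ensemble of classifiers and the weighted F1-score, without saying how the ensemble members are combined. The following is a minimal sketch only, assuming soft voting (averaging class probabilities across models) as the combination rule. The probability values, the label encoding (0 = real, 1 = fake), and the function names `soft_vote` and `weighted_f1` are hypothetical and are not taken from the paper.

```python
from collections import Counter


def soft_vote(prob_sets):
    """Average class probabilities across models, then take the argmax per example.

    prob_sets: list of per-model outputs, each a list of per-example
    probability lists (one probability per class).
    """
    n_models = len(prob_sets)
    preds = []
    for i in range(len(prob_sets[0])):
        n_classes = len(prob_sets[0][i])
        avg = [sum(m[i][c] for m in prob_sets) / n_models for c in range(n_classes)]
        preds.append(max(range(n_classes), key=avg.__getitem__))
    return preds


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights equal to each class's support in y_true."""
    support = Counter(y_true)
    total = 0.0
    for cls, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += n * f1
    return total / len(y_true)


# Hypothetical outputs of three fine-tuned classifiers on four posts
# (assumed encoding: column 0 = "real", column 1 = "fake").
probs_a = [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8], [0.6, 0.4]]
probs_b = [[0.8, 0.2], [0.6, 0.4], [0.3, 0.7], [0.3, 0.7]]
probs_c = [[0.7, 0.3], [0.3, 0.7], [0.1, 0.9], [0.7, 0.3]]

y_true = [0, 1, 1, 1]
y_pred = soft_vote([probs_a, probs_b, probs_c])
score = weighted_f1(y_true, y_pred)
```

The score returned here is a fraction between 0 and 1, whereas the abstract reports the metric on a percentage scale (98.69). Weighting by class support means that a majority class contributes proportionally more to the final score than it would under an unweighted (macro) average.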
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Glazkova, A., Glazkov, M., Trifonov, T. (2021). g2tmn at Constraint@AAAI2021: Exploiting CT-BERT and Ensembling Learning for COVID-19 Fake News Detection. In: Chakraborty, T., Shu, K., Bernard, H.R., Liu, H., Akhtar, M.S. (eds) Combating Online Hostile Posts in Regional Languages during Emergency Situation. CONSTRAINT 2021. Communications in Computer and Information Science, vol 1402. Springer, Cham. https://doi.org/10.1007/978-3-030-73696-5_12
DOI: https://doi.org/10.1007/978-3-030-73696-5_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-73695-8
Online ISBN: 978-3-030-73696-5
eBook Packages: Computer Science, Computer Science (R0)