Skip to main content

g2tmn at Constraint@AAAI2021: Exploiting CT-BERT and Ensembling Learning for COVID-19 Fake News Detection

  • Conference paper
  • First Online:
Combating Online Hostile Posts in Regional Languages during Emergency Situation (CONSTRAINT 2021)

Abstract

The COVID-19 pandemic has had a huge impact on various areas of human life. Hence, the coronavirus pandemic and its consequences are being actively discussed on social media. However, not all social media posts are truthful. Many of them spread fake news that cause panic among readers, misinform people and thus exacerbate the effect of the pandemic. In this paper, we present our results at the Constraint@AAAI2021 Shared Task: COVID-19 Fake News Detection in English. In particular, we propose our approach using the transformer-based ensemble of COVID-Twitter-BERT (CT-BERT) models. We describe the models used, the ways of text preprocessing and adding extra data. As a result, our best model achieved the weighted F1-score of 98.69 on the test set (the first place in the leaderboard) of this shared task that attracted 166 submitted teams in total.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

  1. Alam, F., et al.: Fighting the COVID-19 infodemic: modeling the perspective of journalists, fact-checkers, social media platforms, policy makers, and the society. arXiv preprint arXiv:2005.00033 (2020)

  2. Alkhalifa, R. et al.: QMUL-SDS at CheckThat! 2020: determining COVID-19 tweet check-worthiness using an enhanced CT-BERT with numeric expressions. arXiv preprint arXiv:2008.13160 (2020)

  3. Apuke, O.D., Omar, B.: Fake news and COVID-19: modelling the predictors of fake news sharing among social media users. Telematics Inform. 56, 101475 (2020)

    Article  Google Scholar 

  4. Elsayed, T., et al.: Overview of the CLEF-2019 CheckThat! lab: automatic identification and verification of claims. In: Crestani, F., et al. (eds.) CLEF 2019. LNCS, vol. 11696, pp. 301–321. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-28577-7_25

    Chapter  Google Scholar 

  5. Buda, J., Bolonyai, F.: An ensemble model using N-grams and statistical features to identify fake news spreaders on Twitter. In: CLEF (2020)

    Google Scholar 

  6. Chernyaev, A., Spryiskov, A., Ivashko, A., Bidulya, Y.: A rumor detection in Russian tweets. In: Karpov, A., Potapova, R. (eds.) SPECOM 2020. LNCS (LNAI), vol. 12335, pp. 108–118. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-60276-5_11

    Chapter  Google Scholar 

  7. Cui, L., Lee, D.: CoAID: COVID-19 healthcare misinformation dataset. arXiv preprint arXiv:2006.00885 (2020)

  8. Da San Martino, G. et al.: SemEval-2020 task 11: detection of propaganda techniques in news articles. In: Proceedings of the Fourteenth Workshop on Semantic Evaluation, pp. 1377–1414 (2020)

    Google Scholar 

  9. Devlin, J., et al.: BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)

  10. Elhadad, M.K., Li, K.F., Gebali, F.: COVID-19-FAKES: a Twitter (Arabic/English) dataset for detecting misleading information on COVID-19. In: Barolli, L., Li, K.F., Miwa, H. (eds.) INCoS 2020. AISC, vol. 1263, pp. 256–268. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-57796-4_25

    Chapter  Google Scholar 

  11. emoji 0.6.0. https://pypi.org/project/tweet-emoji/. Accessed 14 Dec 2020

  12. g2tmn at Constraint@AAAI2021 - COVID19 fake news detection in English. https://github.com/oldaandozerskaya/covid_news. Accessed 14 Dec 2020

  13. Jwa, H., et al.: exBAKE: automatic fake news detection model based on bidirectional encoder representations from transformers (BERT). Appl. Sci. 919, 4062 (2019)

    Article  Google Scholar 

  14. Kar, D. et al.: No rumours please! a multi-indic-lingual approach for COVID fake-tweet detection. arXiv preprint arXiv:2010.06906 (2020)

  15. Kim, D., Graham, T., Wan, Z., Rizoiu, M.-A.: Analysing user identity via time-sensitive semantic edit distance (t-SED): a case study of Russian trolls on Twitter. J. Comput. Soc. Sci. 2(2), 331–351 (2019). https://doi.org/10.1007/s42001-019-00051-x

    Article  Google Scholar 

  16. Kruspe, A. et al.: Cross-language sentiment analysis of European Twitter messages during the COVID-19 pandemic. arXiv preprint arXiv:2008.12172 (2020)

  17. Kula, S., Choraś, M., Kozik, R.: Application of the BERT-based architecture in fake news detection. In: Herrero, Á., Cambra, C., Urda, D., Sedano, J., Quintián, H., Corchado, E. (eds.) CISIS 2019. AISC, vol. 1267, pp. 239–249. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-57805-3_23

    Chapter  Google Scholar 

  18. Kumar, P., Singh, A.: NutCracker at WNUT-2020 Task 2: robustly identifying informative COVID-19 Tweets using ensembling and adversarial training. In: Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020), pp. 404–408 (2020)

    Google Scholar 

  19. Liu, Y. et al.: RoBERTa: a robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692 (2019)

  20. Loper, E., Bird, S.: NLTK: the natural language toolkit. In: Proceedings of the ACL-02 Workshop on Effective Tools and Methodologies for Teaching Natural Language Processing and Computational Linguistics, pp. 63–70 (2002)

    Google Scholar 

  21. Loshchilov I., Hutter F.: Fixing weight decay regularization in Adam. arXiv preprint arXiv:1711.05101 (2017)

  22. Mazza, C., et al.: A nationwide survey of psychological distress among Italian people during the COVID-19 pandemic: immediate psychological responses and associated factors. Int. J. Environ. Res. Public Health 179, 3165 (2020)

    Article  Google Scholar 

  23. Mikhalkova, E., et al.: UTMN at SemEval-2020 Task 11: a kitchen solution to automatic propaganda detection. In: Proceedings of the Fourteenth Workshop on Semantic Evaluation, pp. 1858–1864 (2020)

    Google Scholar 

  24. Morio, G., et al.: Hitachi at SemEval-2020 Task 11: an empirical study of pre-trained transformer family for propaganda detection. In: Proceedings of the Fourteenth Workshop on Semantic Evaluation, pp. 1739–1748 (2020)

    Google Scholar 

  25. Moscadelli, A., et al.: Fake news and COVID-19 in Italy: results of a quantitative observational study. Int. J. Environ. Res. Public Health 1716, 5850 (2020)

    Article  Google Scholar 

  26. Müller, M., Salathé, M., Kummervold, P.E.: COVID-Twitter-BERT: a natural language processing model to analyse COVID-19 content on Twitter. arXiv preprint arXiv:2005.07503 (2020)

  27. Nguyen, D.Q., et al.: WNUT-2020 Task 2: identification of informative COVID-19 English tweets. In: Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020), pp. 314–318 (2020)

    Google Scholar 

  28. Patwa, P., et al.: Fighting an infodemic: COVID-19 fake news dataset. arXiv preprint arXiv:2011.03327 (2020)

  29. Patwa P. et al.: Overview of CONSTRAINT 2021 Shared Tasks: Detecting English COVID-19 Fake News and Hindi Hostile Posts. In: Chakraborty, T., Shu, K., Bernard, R., Liu, H., Akhtar, M.S. (eds.) Proceedings of the First Workshop on Combating Online Hostile Posts in Regional Languages during Emergency Situation, CONSTRAINT 2021, CCIS, vol. 1402, pp. 42–53. Springer, Cham (2021)

    Google Scholar 

  30. Paszke, A., et al.: PyTorch: an imperative style, high-performance deep learning library. In: Advances in Neural Information Processing Systems, pp. 8026–8037 (2019)

    Google Scholar 

  31. Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)

    MathSciNet  MATH  Google Scholar 

  32. Peinelt, N., Nguyen, D., Liakata, M. tBERT: topic models and BERT joining forces for semantic similarity detection. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 7047–7055 (2020)

    Google Scholar 

  33. Pizarro, J.: Using N-grams to detect fake news spreaders on Twitter. In: CLEF (2020)

    Google Scholar 

  34. Rangel, F., et al.: Overview of the 8th author profiling task at PAN 2020: profiling fake news spreaders on Twitter. In: CLEF (2020)

    Google Scholar 

  35. Reis, J.C.S., et al.: Supervised learning for fake news detection. IEEE Intell. Syst. 234, 76–81 (2019)

    Article  Google Scholar 

  36. Shaar, S., et al.: Overview of CheckThat! 2020 English: automatic identification and verification of claims in social media. arXiv preprint arXiv:2007.07997 (2020)

  37. Shahi, G.K., Nandini, D.: FakeCovid-a multilingual cross-domain fact check news dataset for COVID-19. arXiv preprint arXiv:2006.11343 (2020)

  38. Shu, K., et al.: Fake news detection on social media: a data mining perspective. ACM SIGKDD Explor. Newsl. 119, 22–36 (2017)

    Article  Google Scholar 

  39. Tang, L.: UZH at SemEval-2020 task 3: combining BERT with WordNet sense embeddings to predict graded word similarity changes. In: Proceedings of the Fourteenth Workshop on Semantic Evaluation, pp. 166–170 (2020)

    Google Scholar 

  40. Thorne, J., et al.: FEVER: a large-scale dataset for fact extraction and VERification. In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, vol. 1(Long Papers), pp. 809–819 (2018)

    Google Scholar 

  41. Thorne, J., et al.: The FEVER2.0 shared task. In: Proceedings of the Second Workshop on Fact Extraction and VERification (FEVER), pp. 1–6 (2019)

    Google Scholar 

  42. Tran, K.V., et al.: UIT-HSE at WNUT-2020 task 2: exploiting CT-BERT for identifying COVID-19 information on the Twitter social network. In: Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020), pp. 383–387 (2020)

    Google Scholar 

  43. tweet-preprocessor 0.6.0. https://pypi.org/project/tweet-preprocessor/. Accessed 14 Dec 2020

  44. Vijjali, R., et al.: Two stage transformer model for COVID-19 fake news detection and fact checking. arXiv preprint arXiv:2011.13253 (2020)

  45. Williams, E., Rodrigues, P., Novak, V.: Accenture at CheckThat! 2020: if you say so: post-hoc fact-checking of claims using transformer-based models. arXiv preprint arXiv:2009.02431 (2020)

  46. Wolf, T., et al.: Transformers: state-of-the-art natural language processing. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pp. 38–45 (2020)

    Google Scholar 

  47. Wu, S.H., Chien, S.L.: A BERT based two-stage fake news spreaders profiling system. In: CLEF (2020)

    Google Scholar 

  48. Yang, C., Zhou, X., Zafarani, R.: CHECKED: Chinese COVID-19 fake news dataset. arXiv preprint arXiv:2010.09029 (2020)

  49. Zhang, T., et al.: BDANN: BERT-based domain adaptation neural network for multi-modal fake news detection. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE (2020)

    Google Scholar 

  50. Zhou, X., et al.: Fake news: fundamental theories, detection strategies and challenges. In: Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, pp. 836–837 (2019)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Anna Glazkova .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Glazkova, A., Glazkov, M., Trifonov, T. (2021). g2tmn at Constraint@AAAI2021: Exploiting CT-BERT and Ensembling Learning for COVID-19 Fake News Detection. In: Chakraborty, T., Shu, K., Bernard, H.R., Liu, H., Akhtar, M.S. (eds) Combating Online Hostile Posts in Regional Languages during Emergency Situation. CONSTRAINT 2021. Communications in Computer and Information Science, vol 1402. Springer, Cham. https://doi.org/10.1007/978-3-030-73696-5_12

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-73696-5_12

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-73695-8

  • Online ISBN: 978-3-030-73696-5

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics