Overview of the CLEF–2021 CheckThat! Lab on Detecting Check-Worthy Claims, Previously Fact-Checked Claims, and Fake News

Conference paper. In: Experimental IR Meets Multilinguality, Multimodality, and Interaction (CLEF 2021).

Abstract

We describe the fourth edition of the CheckThat! Lab, part of the 2021 Conference and Labs of the Evaluation Forum (CLEF). The lab evaluates technology supporting tasks related to factuality, and covers Arabic, Bulgarian, English, Spanish, and Turkish. Task 1 asks to predict which posts in a Twitter stream are worth fact-checking, focusing on COVID-19 and politics (in all five languages). Task 2 asks to determine whether a claim in a tweet can be verified using a set of previously fact-checked claims (in Arabic and English). Task 3 asks to predict the veracity of a news article and its topical domain (in English). The evaluation is based on mean average precision or precision at rank k for the ranking tasks, and macro-F\(_1\) for the classification tasks. This was the most popular CLEF-2021 lab in terms of team registrations: 132 teams. Nearly one-third of them participated: 15, 5, and 25 teams submitted official runs for tasks 1, 2, and 3, respectively.
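
For concreteness, below is a minimal sketch of the evaluation measures named above (average precision and precision at rank k for the ranking tasks, macro-F\(_1\) for the classification tasks), with toy labels standing in for real submissions; the helper functions are illustrative, not the lab's official scorer:

```python
from sklearn.metrics import f1_score

def average_precision(ranked_labels):
    # ranked_labels[i] == 1 if the item at rank i+1 is relevant (check-worthy).
    hits, ap = 0, 0.0
    for rank, rel in enumerate(ranked_labels, start=1):
        if rel:
            hits += 1
            ap += hits / rank
    return ap / hits if hits else 0.0  # over a full ranking, hits == total relevant

def precision_at_k(ranked_labels, k):
    return sum(ranked_labels[:k]) / k

queries = [[1, 0, 1, 0], [0, 1, 1]]                               # toy rankings for two queries
print(sum(average_precision(q) for q in queries) / len(queries))  # MAP
print(precision_at_k(queries[0], k=2))                            # P@2
print(f1_score(["true", "false", "true"],
               ["true", "true", "true"], average="macro"))        # macro-F1
```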

B. Hamdan—Independent Researcher.


Notes

  1. http://huggingface.co/dbmdz/bert-base-turkish-cased

  2. http://huggingface.co/dccuchile/bert-base-spanish-wwm-cased

  3. http://huggingface.co/iarfmoose/roberta-base-bulgarian

References

  1. Abumansour, A., Zubiaga, A.: QMUL-SDS at CheckThat! 2021: enriching pre-trained language models for the estimation of check-worthiness of Arabic tweets. In: Faggioli et al. [33]

  2. Agirre, E., et al.: SemEval-2016 task 1: semantic textual similarity, monolingual and cross-lingual evaluation. In: Proceedings of the 10th International Workshop on Semantic Evaluation, SemEval 2016, pp. 497–511 (2016)

  3. Alam, F., et al.: Fighting the COVID-19 infodemic in social media: a holistic perspective and a call to arms. In: Proceedings of the International AAAI Conference on Web and Social Media. ICWSM 2021, vol. 15, pp. 913–922 (2021)

  4. Alam, F., et al.: Fighting the COVID-19 infodemic: modeling the perspective of journalists, fact-checkers, social media platforms, policy makers, and the society. arXiv preprint arXiv:2005.00033 (2020)

  5. Ali, Z.S., Mansour, W., Elsayed, T., Al-Ali, A.: AraFacts: the first large Arabic dataset of naturally occurring claims. In: Proceedings of the Sixth Arabic Natural Language Processing Workshop, ANLP 2021, pp. 231–236 (2021)

  6. Althabiti, S., Alsalka, M., Atwell, E.: An AraBERT model for check-worthiness of Arabic tweets. In: Faggioli et al. [33]

  7. Ashik, S.S., Apu, A.R., Marjana, N.J., Hasan, M.A., Islam, M.S.: M82B at CheckThat! 2021: multiclass fake news detection using BiLSTM based RNN model. In: Faggioli et al. [33]

  8. Ashraf, N., Butt, S., Sidorov, G., Gelbukh, A.: Fake news detection using machine learning and data augmentation - CLEF2021. In: Faggioli et al. [33]

  9. Atanasova, P., et al.: Overview of the CLEF-2018 CheckThat! Lab on automatic identification and verification of political claims. Task 1: check-worthiness. In: Cappellato et al. [21]

  10. Atanasova, P., Nakov, P., Karadzhov, G., Mohtarami, M., Da San Martino, G.: Overview of the CLEF-2019 CheckThat! Lab on automatic identification and verification of claims. Task 1: check-worthiness. In: Cappellato et al. [20]

  11. Ba, M.L., Berti-Equille, L., Shah, K., Hammady, H.M.: VERA: a platform for veracity estimation over web data. In: Proceedings of the 25th International Conference on World Wide Web, WWW 2016, pp. 159–162 (2016)

  12. Balouchzahi, F., Shashirekha, H., Sidorov, G.: MUCIC at CheckThat! 2021: FaDo-fake news detection and domain identification using transformers ensembling. In: Faggioli et al. [33]

  13. Baly, R., et al.: What was written vs. who read it: news media profiling using text analysis and social media context. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, pp. 3364–3374 (2020)

  14. Baris Schlicht, I., Magnossão de Paula, A., Rosso, P.: UPV at CheckThat! 2021: mitigating cultural differences for identifying multilingual check-worthy claims. In: Faggioli et al. [33]

  15. Barrón-Cedeño, A., et al.: Overview of CheckThat! 2020: automatic identification and verification of claims in social media. In: Arampatzis, A., et al. (eds.) CLEF 2020. LNCS, vol. 12260, pp. 215–236. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58219-7_17

  16. Barrón-Cedeño, A., et al.: Overview of CheckThat! 2020: Automatic Identification and Verification of Claims in Social Media. In: Arampatzis, A., et al. (eds.) Experimental IR Meets Multilinguality, Multimodality, and Interaction – 11th International Conference of the CLEF Association, CLEF 2020, Thessaloniki, Greece, 22–25 September 2020, Proceedings. LNCS, vol. 12260, pp. 215–236. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58219-7_17

  17. Barrón-Cedeño, A., et al.: Overview of the CLEF-2018 CheckThat! Lab on automatic identification and verification of political claims. Task 2: factuality. In: Cappellato et al. [21]

  18. Bouziane, M., Perrin, H., Cluzeau, A., Mardas, J., Sadeq, A.: Buster.AI at CheckThat! 2020: insights and recommendations to improve fact-checking. In: Cappellato et al. [19]

  19. Cappellato, L., Eickhoff, C., Ferro, N., Névéol, A. (eds.): CLEF 2020 Working Notes. CEUR Workshop Proceedings. CEUR-WS.org (2020)

  20. Cappellato, L., Ferro, N., Losada, D., Müller, H. (eds.): Working Notes of CLEF 2019 Conference and Labs of the Evaluation Forum. CEUR Workshop Proceedings. CEUR-WS.org (2019)

  21. Cappellato, L., Ferro, N., Nie, J.Y., Soulier, L. (eds.): Working Notes of CLEF 2018-Conference and Labs of the Evaluation Forum. CEUR Workshop Proceedings. CEUR-WS.org (2018)

  22. Carik, B., Yeniterzi, R.: SU-NLP at CheckThat! 2021: check-worthiness of Turkish tweets. In: Faggioli et al. [33]

  23. Cazalens, S., Lamarre, P., Leblay, J., Manolescu, I., Tannier, X.: A content management perspective on fact-checking. In: Proceedings of the International Conference on World Wide Web, WWW 2018, pp. 565–574 (2018)

  24. Cheema, G.S., Hakimov, S., Ewerth, R.: Check_square at CheckThat! 2020: claim detection in social media via fusion of transformer and syntactic features. In: Cappellato et al. [19]

  25. Chernyavskiy, A., Ilvovsky, D., Nakov, P.: Aschern at CLEF CheckThat! 2021: lambda-calculus of fact-checked claims. In: Faggioli et al. [33]

  26. Cusmuliuc, C.G., Amarandei, M.A., Pelin, I., Cociorva, V.I., Iftene, A.: UAICS at CheckThat! 2021: fake news detection. In: Faggioli et al. [33]

  27. Da San Martino, G., Barrón-Cedeño, A., Wachsmuth, H., Petrov, R., Nakov, P.: SemEval-2020 task 11: detection of propaganda techniques in news articles. In: Proceedings of the 14th Workshop on Semantic Evaluation, SemEval 2020, pp. 1377–1414 (2020)

  28. Derczynski, L., Bontcheva, K., Liakata, M., Procter, R., Wong Sak Hoi, G., Zubiaga, A.: SemEval-2017 task 8: RumourEval: determining rumour veracity and support for rumours. In: Proceedings of the 11th International Workshop on Semantic Evaluation, SemEval 2017, pp. 69–76 (2017)

  29. Dimitrov, D., et al.: SemEval-2021 task 6: detection of persuasion techniques in texts and images. In: Proceedings of the International Workshop on Semantic Evaluation, SemEval 2021 (2021)

  30. Dumani, L., Neumann, P.J., Schenkel, R.: A framework for argument retrieval - ranking argument clusters by frequency and specificity. In: Jose, J.M., et al. (eds.) ECIR 2020. LNCS, vol. 12035, pp. 431–445. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-45439-5_29

  31. Elsayed, T., et al.: CheckThat! at CLEF 2019: automatic identification and verification of claims. In: Advances in Information Retrieval, pp. 309–315 (2019)

  32. Elsayed, T., et al.: Overview of the CLEF-2019 CheckThat! lab: automatic identification and verification of claims. In: Crestani, F., et al. (eds.) CLEF 2019. LNCS, vol. 11696, pp. 301–321. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-28577-7_25

  33. Faggioli, G., Ferro, N., Joly, A., Maistro, M., Piroi, F. (eds.): CLEF 2021 Working Notes. Working Notes of CLEF 2021-Conference and Labs of the Evaluation Forum. CEUR-WS.org (2021)

  34. Gencheva, P., Nakov, P., Màrquez, L., Barrón-Cedeño, A., Koychev, I.: A context-aware approach for detecting worth-checking claims in political debates. In: Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017, pp. 267–276 (2017)

  35. Ghanem, B., Glavaš, G., Giachanou, A., Ponzetto, S., Rosso, P., Rangel, F.: UPV-UMA at CheckThat! lab: verifying Arabic claims using cross lingual approach. In: Cappellato et al. [20]

  36. Gorrell, G., et al.: SemEval-2019 task 7: RumourEval, determining rumour veracity and support for rumours. In: Proceedings of the 13th International Workshop on Semantic Evaluation, SemEval 2019, pp. 845–854 (2019)

  37. Gupta, A., Kumaraguru, P., Castillo, C., Meier, P.: TweetCred: real-time credibility assessment of content on Twitter. In: Aiello, L.M., McFarland, D. (eds.) SocInfo 2014. LNCS, vol. 8851, pp. 228–243. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-13734-6_16

  38. Hanselowski, A., et al.: A retrospective analysis of the fake news challenge stance-detection task. In: Proceedings of the 27th International Conference on Computational Linguistics, COLING 2018, pp. 1859–1874 (2018)

  39. Hansen, C., Hansen, C., Simonsen, J., Lioma, C.: The Copenhagen team participation in the check-worthiness task of the competition of automatic identification and verification of claims in political debates of the CLEF-2018 fact checking lab. In: Cappellato et al. [21]

  40. Hansen, C., Hansen, C., Simonsen, J., Lioma, C.: Neural weakly supervised fact check-worthiness detection with contrastive sampling-based ranking loss. In: Cappellato et al. [20]

  41. Hartl, P., Kruschwitz, U.: University of Regensburg at CheckThat! 2021: exploring text summarization for fake news detection. In: Faggioli et al. [33]

  42. Hasanain, M., Elsayed, T.: bigIR at CheckThat! 2020: multilingual BERT for ranking Arabic tweets by check-worthiness. In: Cappellato et al. [19]

  43. Hasanain, M., et al.: Overview of CheckThat! 2020 Arabic: automatic identification and verification of claims in social media. In: Cappellato et al. [19]

  44. Hasanain, M., Suwaileh, R., Elsayed, T., Barrón-Cedeño, A., Nakov, P.: Overview of the CLEF-2019 CheckThat! Lab on automatic identification and verification of claims. Task 2: evidence and factuality. In: Cappellato et al. [20]

  45. Hassan, N., Li, C., Tremayne, M.: Detecting check-worthy factual claims in presidential debates. In: Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, CIKM 2015, pp. 1835–1838 (2015)

  46. Hassan, N., Tremayne, M., Arslan, F., Li, C.: Comparing automated factual claim detection against judgments of journalism organizations. In: Computation Journalism Symposium, pp. 1–5 (2016)

  47. Hassan, N., et al.: ClaimBuster: the first-ever end-to-end fact-checking system. Proc. VLDB Endow. 10(12), 1945–1948 (2017)

  48. Huertas-García, Á., Huertas-Tato, J., Martín, A., Camacho, D.: CIVIC-UPM at CheckThat! 2021: integration of transformers in misinformation detection and topic classification. In: Faggioli et al. [33]

  49. Martinez-Rico, J.R., Martinez-Romo, J., Araujo, L.: NLP&IR@UNED at CheckThat! 2021: check-worthiness estimation and fake news detection using transformer models. In: Faggioli et al. [33]

  50. Kannan, R., R, R.: DLRG@CLEF2021: an ensemble approach for fake news detection on news articles. In: Faggioli et al. [33]

  51. Karadzhov, G., Nakov, P., Màrquez, L., Barrón-Cedeño, A., Koychev, I.: Fully automated fact checking using external sources. In: Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017, pp. 344–353 (2017)

  52. Kartal, Y.S., Kutlu, M.: TOBB ETU at CheckThat! 2020: prioritizing English and Arabic claims based on check-worthiness. In: Cappellato et al. [19]

  53. Kartal, Y.S., Kutlu, M.: TrClaim-19: the first collection for Turkish check-worthy claim detection with annotator rationales. In: Proceedings of the 24th Conference on Computational Natural Language Learning, pp. 386–395 (2020)

  54. Kazemi, A., Garimella, K., Shahi, G.K., Gaffney, D., Hale, S.A.: Tiplines to combat misinformation on encrypted platforms: a case study of the 2019 Indian election on WhatsApp. arXiv:2106.04726 (2021)

  55. Kovachevich, N.: BERT fine-tuning approach to CLEF CheckThat! fake news detection. In: Faggioli et al. [33]

  56. Kumari, S.: NoFake at CheckThat! 2021: fake news detection using BERT. arXiv:2108.05419 (2021)

  57. L, H.R., M, A.: NITK_NLP at CLEF CheckThat! 2021: ensemble transformer model for fake news classification. In: Faggioli et al. [33]

  58. Ma, J., Gao, W., Mitra, P., Kwon, S., Jansen, B.J., Wong, K.F., Cha, M.: Detecting rumors from microblogs with recurrent neural networks. In: Proceedings of the International Joint Conference on Artificial Intelligence, IJCAI 2016, pp. 3818–3824 (2016)

  59. Martinez-Rico, J., Araujo, L., Martinez-Romo, J.: NLP&IR@UNED at CheckThat! 2020: a preliminary approach for check-worthiness and claim retrieval tasks using neural networks and graphs. In: Cappellato et al. [19]

  60. Mihaylova, S., Borisova, I., Chemishanov, D., Hadzhitsanev, P., Hardalov, M., Nakov, P.: DIPS at CheckThat! 2021: verified claim retrieval. In: Faggioli et al. [33]

  61. Mihaylova, T., Karadzhov, G., Atanasova, P., Baly, R., Mohtarami, M., Nakov, P.: SemEval-2019 task 8: fact checking in community question answering forums. In: Proceedings of the 13th International Workshop on Semantic Evaluation, SemEval 2019, pp. 860–869 (2019)

  62. Mitra, T., Gilbert, E.: CREDBANK: a large-scale social media corpus with associated credibility annotations. In: Proceedings of the Ninth International AAAI Conference on Web and Social Media, ICWSM 2015, pp. 258–267 (2015)

  63. Mohammad, S., Kiritchenko, S., Sobhani, P., Zhu, X., Cherry, C.: SemEval-2016 task 6: detecting stance in tweets. In: Proceedings of the 10th International Workshop on Semantic Evaluation, SemEval 2016, pp. 31–41 (2016)

  64. Mukherjee, S., Weikum, G.: Leveraging joint interactions for credibility analysis in news communities. In: Proceedings of the 24th ACM International Conference on Information and Knowledge Management, CIKM 2015, pp. 353–362 (2015)

  65. Nakov, P., et al.: Overview of the CLEF-2018 lab on automatic identification and verification of claims in political debates. In: Working Notes of CLEF 2018 – Conference and Labs of the Evaluation Forum. CLEF 2018 (2018)

  66. Nakov, P., et al.: Automated fact-checking for assisting human fact-checkers. In: Proceedings of the 30th International Joint Conference on Artificial Intelligence, IJCAI 2021 (2021)

  67. Nakov, P., et al.: SemEval-2016 task 3: community question answering. In: Proceedings of the 10th International Workshop on Semantic Evaluation, SemEval 2016, pp. 525–545 (2016)

  68. Nakov, P., et al.: The CLEF-2021 CheckThat! Lab on detecting check-worthy claims, previously fact-checked claims, and fake news. In: Hiemstra, D., Moens, M.-F., Mothe, J., Perego, R., Potthast, M., Sebastiani, F. (eds.) ECIR 2021, Part II. LNCS, vol. 12657, pp. 639–649. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-72240-1_75

  69. Nguyen, V.H., Sugiyama, K., Nakov, P., Kan, M.Y.: FANG: Leveraging social context for fake news detection using graph representation. In: Proceedings of the 29th ACM International Conference on Information & Knowledge Management, CIKM 2020, pp. 1165–1174 (2020)

  70. Nikolov, A., Da San Martino, G., Koychev, I., Nakov, P.: Team_Alex at CheckThat! 2020: identifying check-worthy tweets with transformer models. In: Cappellato et al. [19]

  71. Oshikawa, R., Qian, J., Wang, W.Y.: A survey on natural language processing for fake news detection. In: Proceedings of the 12th Language Resources and Evaluation Conference, LREC 2020, pp. 6086–6093 (2020)

  72. Pogorelov, K., et al.: FakeNews: Corona virus and 5G conspiracy task at MediaEval 2020. In: Proceedings of the MediaEval workshop, MediaEval 2020 (2020)

  73. Popat, K., Mukherjee, S., Strötgen, J., Weikum, G.: Credibility assessment of textual claims on the web. In: Proceedings of the 25th ACM International Conference on Information and Knowledge Management, CIKM 2016, pp. 2173–2178 (2016)

  74. Pritzkau, A.: NLytics at CheckThat! 2021: check-worthiness estimation as a regression problem on transformers. In: Faggioli et al. [33]

  75. Pritzkau, A.: NLytics at CheckThat! 2021: multi-class fake news detection of news articles and domain identification with RoBERTa - a baseline model. In: Faggioli et al. [33]

  76. Sardar, A.A.M., Salma, S.A., Islam, M.S., Hasan, M.A., Bhuiyan, T.: Team Sigmoid at CheckThat! 2021: multiclass fake news detection with machine learning. In: Faggioli et al. [33]

  77. Sepúlveda-Torres, R., Saquete, E.: GPLSI team at CLEF CheckThat! 2021: fine-tuning BETO and RoBERTa. In: Faggioli et al. [33]

  78. Shaar, S., Alam, F., Martino, G.D.S., Nakov, P.: The role of context in detecting previously fact-checked claims. arXiv preprint arXiv:2104.07423 (2021)

  79. Shaar, S., Babulkov, N., Da San Martino, G., Nakov, P.: That is a known lie: detecting previously fact-checked claims. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, pp. 3607–3618 (2020)

  80. Shaar, S., et al.: Overview of the CLEF-2021 CheckThat! Lab task 2 on detecting previously fact-checked claims in tweets and political debates. In: Faggioli et al. [33]

  81. Shaar, S., et al.: Overview of the CLEF-2021 CheckThat! Lab task 1 on check-worthiness estimation in tweets and political debates. In: Faggioli et al. [33]

  82. Shaar, S., et al.: Overview of CheckThat! 2020 English: automatic identification and verification of claims in social media. In: Cappellato et al. [19]

  83. Shahi, G.K.: AMUSED: an annotation framework of multi-modal social media data. arXiv:2010.00502 (2020)

  84. Shahi, G.K., Dirkson, A., Majchrzak, T.A.: An exploratory study of COVID-19 misinformation on Twitter. Online Soc. Netw. Media 22, 100104 (2021). https://doi.org/10.1016/j.osnem.2020.100104. https://www.sciencedirect.com/science/article/pii/S2468696420300458

  85. Shahi, G.K., Majchrzak, T.A.: Exploring the spread of COVID-19 misinformation on Twitter (2021)

  86. Shahi, G.K.: A multilingual domain identification using fact-checked articles: a case study on COVID-19 misinformation. arXiv preprint (2021)

  87. Shahi, G.K., Nandini, D.: FakeCovid – a multilingual cross-domain fact check news dataset for COVID-19. In: Workshop Proceedings of the 14th International AAAI Conference on Web and Social Media (2020)

  88. Shahi, G.K., Struß, J.M., Mandl, T.: CT-FAN-21 corpus: a dataset for fake news detection, April 2021. https://doi.org/10.5281/zenodo.4714517

  89. Shahi, G.K., Struß, J.M., Mandl, T.: Overview of the CLEF-2021 CheckThat! Lab: task 3 on fake news detection. In: Faggioli et al. [33]

  90. Shu, K., Sliva, A., Wang, S., Tang, J., Liu, H.: Fake news detection on social media: a data mining perspective. SIGKDD Explor. Newsl. 19(1), 22–36 (2017)

  91. Skuczyńska, B., Shaar, S., Spenader, J., Nakov, P.: BeaSku at CheckThat! 2021: fine-tuning sentence BERT with triplet loss and limited data. In: Faggioli et al. [33]

  92. Sohan, S., Rajon, H.S., Khusbu, A., Islam, M.S., Hasan, M.A.: Black Ops at CheckThat! 2021: user profiles analyze of intelligent detection on fake tweets notebook in shared task. In: Faggioli et al. [33]

  93. Tchechmedjiev, A., et al.: ClaimsKG: a knowledge graph of fact-checked claims. In: Ghidini, C., et al. (eds.) ISWC 2019. LNCS, vol. 11779, pp. 309–324. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-30796-7_20

  94. Thorne, J., Vlachos, A., Christodoulopoulos, C., Mittal, A.: FEVER: a large-scale dataset for fact extraction and VERification. In: Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2018, pp. 809–819 (2018)

  95. Touahri, I., Mazroui, A.: EvolutionTeam at CheckThat! 2020: integration of linguistic and sentimental features in a fake news detection approach. In: Cappellato et al. [19]

  96. Tsoplefack, W.K.: Classifier for fake news detection and topical domain of news articles. In: Faggioli et al. [33]

  97. Utsha, R.S., Keya, M., Hasan, M.A., Islam, M.S.: Qword at CheckThat! 2021: an extreme gradient boosting approach for multiclass fake news detection. In: Faggioli et al. [33]

  98. Vasileva, S., Atanasova, P., Màrquez, L., Barrón-Cedeño, A., Nakov, P.: It takes nine to smell a rat: neural multi-task learning for check-worthiness prediction. In: Proceedings of the International Conference on Recent Advances in Natural Language Processing, RANLP 2019, pp. 1229–1239 (2019)

  99. Williams, E., Rodrigues, P., Novak, V.: Accenture at CheckThat! 2020: if you say so: post-hoc fact-checking of claims using transformer-based models. In: Cappellato et al. [19]

  100. Williams, E., Rodrigues, P., Tran, S.: Accenture at CheckThat! 2021: interesting claim identification and ranking with contextually sensitive lexical training data augmentation. In: Faggioli et al. [33]

  101. Zengin, M.S., Kartal, Y.S., Kutlu, M.: TOBB ETU at CheckThat! 2021: data engineering for detecting check-worthy claims. In: Faggioli et al. [33]

  102. Zhao, Z., Resnick, P., Mei, Q.: Enquiring minds: early detection of rumors in social media from enquiry posts. In: Proceedings of the 24th International Conference on World Wide Web, WWW 2015, pp. 1395–1405 (2015)

  103. Zhou, X., Wu, B., Fung, P.: Fight for 4230 at CLEF CheckThat! 2021: domain-specific preprocessing and pretrained model for ranking claims by check-worthiness. In: Faggioli et al. [33]

  104. Zubiaga, A., Liakata, M., Procter, R., Hoi, G.W.S., Tolmie, P.: Analysing how people orient to and spread rumours in social media by looking at conversational threads. PLoS One 11(3), e0150989 (2016)

  105. Zuo, C., Karakas, A., Banerjee, R.: A hybrid recognition system for check-worthy claims using heuristics and supervised learning. In: Cappellato et al. [21]

Acknowledgments

The work of Tamer Elsayed and Maram Hasanain is made possible by NPRP grant #NPRP-11S-1204-170060 from the Qatar National Research Fund (a member of Qatar Foundation). The work of Fatima Haouari is supported by GSRA grant #GSRA6-1-0611-19074 from the Qatar National Research Fund. The statements made herein are solely the responsibility of the authors.

This research is also part of the Tanbih mega-project, developed at the Qatar Computing Research Institute, HBKU, which aims to limit the impact of “fake news”, propaganda, and media bias, thus promoting digital literacy and critical thinking.

Author information

Correspondence to Firoj Alam.

Appendix

A Systems for Task 1

The positions in the task ranking appear after each team name. See Tables 5, 6 and 7 for further details.

Team Accenture [100] used BERT and RoBERTa with data augmentation. They generated additional synthetic training data for the positive class via lexical substitution, using BERT-based contextual embeddings to find the most probable substitutes. They further added a mean-pooling layer and a dropout layer on top of the model, before the final classification layer.
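
As an illustration, here is a rough sketch of this kind of masked-LM lexical substitution (the model choice and helper function are our assumptions, not the team's exact setup):

```python
from transformers import pipeline

# Mask one word at a time and keep the model's in-context replacements
# as synthetic positive-class training examples.
fill = pipeline("fill-mask", model="bert-base-uncased")

def augment(sentence, top_k=3):
    words = sentence.split()
    variants = []
    for i in range(len(words)):
        masked = " ".join(words[:i] + [fill.tokenizer.mask_token] + words[i + 1:])
        for cand in fill(masked, top_k=top_k):
            if cand["token_str"].strip().lower() != words[i].lower():
                variants.append(cand["sequence"])
    return variants

print(augment("vaccines are effective against the new variant")[:3])
```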

Team Fight for 4230 [103] focused mostly on two fronts: a pre-processing module to properly normalize the tweets, and data augmentation by means of machine translation and WordNet-based substitutions. The pre-processing included link removal and punctuation cleaning, as well as expansion of quantities and contractions. All hashtags related to COVID-19 were normalized into a single one, and the remaining hashtags were expanded. Their best approach was based on BERTweet with a dropout layer and the above-mentioned pre-processing.
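
A minimal sketch of such tweet normalization (the regular expressions and the hashtag list are our own illustrative choices; the machine-translation augmentation is out of scope here):

```python
import re

COVID_TAGS = re.compile(r"#(covid[_-]?19|corona(virus)?|sarscov2)\b", re.I)

def normalize_tweet(text):
    text = re.sub(r"https?://\S+", " ", text)      # link removal
    text = COVID_TAGS.sub("covid19", text)         # collapse COVID-19 hashtags into one
    text = re.sub(r"#(\w+)", r"\1", text)          # expand remaining hashtags
    text = re.sub(r"(\d+)\s*[kK]\b",
                  lambda m: str(int(m.group(1)) * 1000), text)  # expand quantities
    text = re.sub(r"[^\w\s@']", " ", text)         # punctuation cleaning
    return re.sub(r"\s+", " ", text).strip()

print(normalize_tweet("BREAKING!!! 10k new #Covid-19 cases... https://t.co/x"))
# -> "BREAKING 10000 new covid19 cases"
```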

Team GPLSI [77] applied the RoBERTa and BETO transformers together with different manually engineered features, such as the occurrence of dates and numbers, or words from LIWC. They performed a thorough parameter exploration using weighting and bias techniques. They also tried splitting the four-way classification into two binary classifications plus one three-way classification, and further experimented with oversampling and undersampling.

Team iCompass used several preprocessing steps, including (i) English word removal, (ii) removal of URLs and mentions, and (iii) data normalization: removing tashkeel and the letter madda from the texts, dropping duplicates, and replacing some characters to prevent mixing. They proposed a simple ensemble of two BERT-based models, AraBERT and Arabic-ALBERT.
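
A rough sketch of this sort of Arabic normalization (the Unicode range for tashkeel and the combining madda is our assumption from the description; duplicate removal happens at dataset level and is omitted):

```python
import re

TASHKEEL = re.compile(r"[\u064B-\u0653]")  # harakat (U+064B-U+0652) plus combining madda (U+0653)
LATIN = re.compile(r"[A-Za-z]+")           # English words to drop

def normalize_arabic(text):
    text = re.sub(r"https?://\S+|@\w+", " ", text)  # remove URLs and mentions
    text = TASHKEEL.sub("", text)                   # strip tashkeel / madda
    text = LATIN.sub(" ", text)                     # English word removal
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)      # squeeze exaggerated repetitions
    return re.sub(r"\s+", " ", text).strip()
```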

Team NLP&IR@UNED [49] compared the performance of several transformer models, such as BERT, ALBERT, RoBERTa, DistilBERT, and Funnel-Transformer. For English, they obtained better results using a BERT model trained on tweets. For Spanish, they used Electra.

Team NLytics [74] used RoBERTa with a regression function in the final layer, approaching the problem as a ranking task.

Team QMUL-SDS [1] used the AraBERT preprocessing function to (i) replace URLs, email addresses, and user mentions with standard words, (ii) remove line breaks, HTML markup, repeated characters, and unwanted characters such as emoticons, (iii) handle white space between words and digits (non-Arabic or English) and around brackets, and (iv) remove unnecessary punctuation. They addressed the task as a ranking problem, and fine-tuned an Arabic transformer (AraBERTv0.2-base) on a combination of this year's data and the data from the CheckThat! lab 2020 (the CT20-AR dataset).

Team SCUoL [6] used typical pre-processing steps, including text cleaning, segmentation, and tokenization. Their experiments consisted of fine-tuning different AraBERT models, and their final results were obtained using AraBERTv2-base.

Team SU-NLP [22] also used several pre-processing steps, including (i) removing emojis and hashtags, and (ii) replacing all mentions with a special token (@USER) and all URLs with the respective website's domain. If a URL pointed to a tweet, they replaced it with TWITTER and the respective user account name. They reported that this URL expansion method improved the performance. Subsequently, they used an ensemble of BERTurk models fine-tuned with different seed values.
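
The URL expansion they describe can be sketched as follows (a toy version; in practice shortened t.co links would need to be resolved first, which we skip here):

```python
import re
from urllib.parse import urlparse

def expand_urls(tweet):
    def repl(match):
        parsed = urlparse(match.group(0))
        if parsed.netloc.endswith("twitter.com"):      # link to a tweet
            account = parsed.path.strip("/").split("/")[0]
            return f"TWITTER {account}"
        return parsed.netloc                           # otherwise: the site's domain
    tweet = re.sub(r"@\w+", "@USER", tweet)            # mentions -> special token
    return re.sub(r"https?://\S+", repl, tweet)

print(expand_urls("@someone see https://twitter.com/WHO/status/1 and https://example.org/a"))
# -> "@USER see TWITTER WHO and example.org"
```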

Team TOBB ETU [101] investigated different approaches to fine-tune transformer models, including data augmentation using machine translation, weak supervision, and cross-lingual training. For their submission, they removed URLs and user mentions from the tweets, and fine-tuned a separate BERT-based model for each language. In particular, they fine-tuned BERTurk (Note 1), AraBERT, BETO (Note 2), and the BERT-base model for Turkish, Arabic, Spanish, and English, respectively. For Bulgarian, they fine-tuned a RoBERTa model pre-trained on Bulgarian documents (Note 3).

Team UPV [14] used a multilingual sentence transformer representation (S-BERT) with knowledge distillation, originally intended for question answering. They further introduced an auxiliary language identification task alongside the downstream check-worthiness task.

B Systems for Task 2

Team Aschern [25] used TF.IDF retrieval, a fine-tuned pre-trained S-BERT model, and a LambdaMART re-ranker.

Team BeaSku [91] used triplet loss training to fine-tune S-BERT. Then, they used the scores predicted by the fine-tuned model along with BM25 scores as features to train a rankSVM re-ranker. They further discussed the impact of applying online mining of triplets. They also experimented with data augmentation.
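
A condensed sketch of triplet-loss fine-tuning with the sentence-transformers library (the model name and the single toy triplet are illustrative; BM25 scoring and the rankSVM re-ranker are omitted):

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

# Each example: (anchor tweet, matching verified claim, non-matching claim).
train = [InputExample(texts=[
    "5G towers are spreading the virus",
    "Claim that 5G causes COVID-19 is false",
    "Claim that garlic cures COVID-19 is false",
])]
loader = DataLoader(train, shuffle=True, batch_size=1)

# Pull anchors toward positives and away from negatives in embedding space.
model.fit(train_objectives=[(loader, losses.TripletLoss(model))],
          epochs=1, warmup_steps=0)
```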

Team DIPS [60] calculated S-BERT embeddings for all claims, then computed a cosine similarity for each pair of an input claim and a verified claim. The prediction is made by passing a sorted list of cosine similarities to a neural network.
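
A minimal sketch of that scoring step (the model name is an assumption; DIPS' final neural network over the sorted similarity list is replaced here by simply printing the ranking):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")
verified = [
    "Claim that 5G causes COVID-19 is false",
    "Claim that masks reduce transmission is true",
]
claims_emb = model.encode(verified, convert_to_tensor=True)
query_emb = model.encode("5G towers are spreading the virus", convert_to_tensor=True)

# Cosine similarity between the input claim and every verified claim, best first.
sims = util.cos_sim(query_emb, claims_emb)[0]
for score, text in sorted(zip(sims.tolist(), verified), reverse=True):
    print(f"{score:.3f}  {text}")
```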

Team NLytics approached the problem as a regression task, and used RoBERTa with a regression function in the final layer.

C Systems for Task 3

Team Black Ops [92] performed data pre-processing by removing stop-words and punctuation marks. Then, they experimented with decision trees, random forest, and gradient boosting classifiers for Task 3A, and found the latter to perform best.

Team CIC [8] experimented with logistic regression, multi-layer perceptron, support vector machines, and random forest. Their experiments consisted of using stratified 5-fold cross-validation on the training data. Their best results were obtained using logistic regression for task 3A, and a multi-layer perceptron for task 3B.

Team CIC also experimented with decision tree, random forest, and gradient boosting algorithms, and found the latter to perform best.

Team CIVIC-UPM [48] participated in both subtasks of task 3. They performed pre-processing using a number of tools: (i) ftfy to repair Unicode and emoji errors, (ii) ekphrasis to perform lower-casing and to normalize percentages, times, dates, emails, phone numbers, and other numbers, (iii) contractions for abbreviation expansion, and (iv) NLTK for word tokenization, stop-word removal, punctuation removal, and word lemmatization. Then, they combined doc2vec with transformer representations (Electra base, T5 small and T5 base, Longformer base, RoBERTa base, and DistilRoBERTa base). They further used additional data from Kaggle's Ag News, KDD2020, and Clickbait news detection competitions. Finally, they experimented with a number of classifiers, such as Naïve Bayes, Random Forest, Logistic Regression with L1 and L2 regularization, Elastic Net, and SVMs. Their best system for subtask 3A used DistilRoBERTa-base on the text body with oversampling and a sliding window for dealing with long texts. Their best system for subtask 3B used RoBERTa-base on the title and body with oversampling, but no sliding window.
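
A partial sketch of such a pre-processing chain (the ekphrasis normalization step is omitted; package and resource names otherwise follow the description):

```python
import ftfy          # pip install ftfy
import contractions  # pip install contractions
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

for res in ("punkt", "stopwords", "wordnet"):
    nltk.download(res, quiet=True)

STOP = set(stopwords.words("english"))
LEMMA = WordNetLemmatizer()

def preprocess(text):
    text = ftfy.fix_text(text)        # repair Unicode / emoji errors
    text = contractions.fix(text)     # expand contractions: "isn't" -> "is not"
    tokens = nltk.word_tokenize(text.lower())
    return [LEMMA.lemmatize(t) for t in tokens if t.isalpha() and t not in STOP]

print(preprocess("They donâ€™t believe the article isnâ€™t fake..."))
```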

Team DLRG experimented with a number of traditional approaches like Random Forest, Naïve Bayes and Logistic Regression as well as an online passive-aggressive classifier and different ensembles thereof. The best result was achieved by an ensemble of Naïve Bayes, Logistic Regression, and the Passive Aggressive classifier for task 3A. For task 3B, the Online Passive-Aggressive classifier outperformed all other approaches, including the considered ensembles.

Team GPLSI [77] applied the RoBERTa transformer together with different manually-engineered features, such as the occurrence of dates and numbers or words from LIWC. Both the title and the body were concatenated as a single sequence of words. Rather than going for a single multi-class setting, they used two binary models considering the most frequent classes: false vs. other, and true vs. other, followed by one three-class model.

Team MUCIC [12] used a majority voting ensemble of three BERT variants: they fine-tuned the pre-trained BERT, DistilBERT, and RoBERTa models.

Team NITK_NLP [57] proposed an approach that included pre-processing and tokenization of the news articles, followed by experiments with multiple transformer models. The final prediction was made by an ensemble.

Team NKovachevich [55] created lexical features: they extracted the 500 most frequent word stems in the dataset and calculated their TF.IDF values, which they used in a multinomial Naïve Bayes classifier. Much better performance was achieved with an LSTM model that used GloVe embeddings, and a slightly lower F1 value was obtained with BERT. They further found RoBERTa to perform worse than BERT.
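
The lexical baseline can be sketched with scikit-learn (toy stand-in data; note that TfidfVectorizer(max_features=500) keeps the 500 most frequent terms rather than stems, so a stemming analyzer would be needed to match the description exactly):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["miracle cure suppressed by doctors", "parliament passed the budget today"]
labels = ["false", "true"]

clf = make_pipeline(
    TfidfVectorizer(max_features=500),  # TF.IDF over the most frequent terms
    MultinomialNB(),
)
clf.fit(texts, labels)
print(clf.predict(["doctors suppressed this cure"]))
```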

Team NLP&IR@UNED [49] experimented with four transformer architectures and input sizes of 150 and 200 words. In preliminary tests, the best performance was achieved by ALBERT with 200 words. They also experimented with combining TF.IDF values from the text, all features provided by the LIWC tool, and TF.IDF values over the first 20 domain names returned by a query to a search engine. Unlike on the dev dataset, in the official competition the best results were obtained with the approach based on TF.IDF, LIWC, and domain names.

Team NLytics fine-tuned RoBERTa on the dataset for each of the subtasks. Since the data is imbalanced, they used under-sampling. They also truncated the documents to fit into RoBERTa's 512-token input size.

Team NoFake [56] applied BERT without fine-tuning, but used an extensive amount of additional data for training, downloaded from various fact-checking websites.

Team Pathfinder [96] participated in both subtasks using multinomial Naïve Bayes and random forest, with the former performing better on both. For subtask 3A, they merged the classes false and partially false into one class, which boosted the model performance by 41% (a non-official score mentioned in the paper).

Team Probity addressed the multiclass fake news detection subtask using a simple LSTM architecture with word2vec embeddings to represent the news articles.

Team Qword [97] applied pre-processing that included stop-word removal, punctuation removal, and stemming with the Porter stemmer. They then calculated TF.IDF values for the words and applied four classification algorithms to these features. The best result was obtained with Extreme Gradient Boosting.

Team SAUD used an SVM with TF.IDF. They tried Logistic Regression, Multinomial Naïve Bayes, and Random Forest, and found SVM to work best.

Team Sigmoid [76] experimented with different traditional machine learning approaches, with multinomial Naïve Bayes performing best, and one deep learning approach, namely an LSTM with the Adam optimizer. The latter outperformed the more traditional approaches.

Team Spider applied an LSTM after pre-processing consisting of stop-word removal and stemming.

Team UAICS [26] experimented with various models, including BERT, LSTM, Bi-LSTM, and feature-based models. Their submitted model was a gradient boosting classifier with a weighted combination of three feature groups: bi-grams, POS tags, and lexical categories of words.

Team University of Regensburg [41] used different fine-tuned variants of BERT with a linear layer on top, and explored several ways to work around BERT's maximum sequence length: hierarchical transformer representations as well as extractive summarization (using DistilBERT) and abstractive summarization (using distil-BART-CNN-12-6) before classification with the fine-tuned BERT. They also performed oversampling to address the class imbalance.
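
The summarize-then-classify idea can be sketched as follows (the summarizer checkpoint matches the distil-BART-CNN-12-6 model mentioned above; the classifier is an untrained placeholder standing in for their fine-tuned BERT):

```python
from transformers import pipeline

summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")
classifier = pipeline("text-classification", model="bert-base-uncased")  # placeholder

article = "Very long article body ... " * 200
# Compress the article below BERT's 512-token limit, then classify the summary.
summary = summarizer(article, max_length=128, min_length=32,
                     truncation=True)[0]["summary_text"]
print(classifier(summary))
```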

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Nakov, P. et al. (2021). Overview of the CLEF–2021 CheckThat! Lab on Detecting Check-Worthy Claims, Previously Fact-Checked Claims, and Fake News. In: Candan, K.S., et al. (eds.) Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2021. Lecture Notes in Computer Science, vol. 12880. Springer, Cham. https://doi.org/10.1007/978-3-030-85251-1_19

  • DOI: https://doi.org/10.1007/978-3-030-85251-1_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-85250-4

  • Online ISBN: 978-3-030-85251-1
