Abstract
We describe the fourth edition of the CheckThat! Lab, part of the 2021 Conference and Labs of the Evaluation Forum (CLEF). The lab evaluates technology supporting tasks related to factuality, and covers Arabic, Bulgarian, English, Spanish, and Turkish. Task 1 asks systems to predict which posts in a Twitter stream are worth fact-checking, focusing on COVID-19 and politics (in all five languages). Task 2 asks systems to determine whether a claim in a tweet can be verified using a set of previously fact-checked claims (in Arabic and English). Task 3 asks systems to predict the veracity of a news article and its topical domain (in English). The evaluation is based on mean average precision or precision at rank k for the ranking tasks, and on macro-F\(_1\) for the classification tasks. With 132 registered teams, this was the most popular CLEF-2021 lab; nearly one third of them participated: 15, 5, and 25 teams submitted official runs for tasks 1, 2, and 3, respectively.
B. Hamdan—Independent Researcher.
References
Abumansour, A., Zubiaga, A.: QMUL-SDS at CheckThat! 2021: enriching pre-trained language models for the estimation of check-worthiness of Arabic tweets. In: Faggioli et al. [33]
Agirre, E., et al.: SemEval-2016 task 1: semantic textual similarity, monolingual and cross-lingual evaluation. In: Proceedings of the 10th International Workshop on Semantic Evaluation, SemEval 2016, pp. 497–511 (2016)
Alam, F., et al.: Fighting the COVID-19 infodemic in social media: a holistic perspective and a call to arms. In: Proceedings of the International AAAI Conference on Web and Social Media. ICWSM 2021, vol. 15, pp. 913–922 (2021)
Alam, F., et al.: Fighting the COVID-19 infodemic: modeling the perspective of journalists, fact-checkers, social media platforms, policy makers, and the society. arXiv:2005.00033 (2020)
Ali, Z.S., Mansour, W., Elsayed, T., Al-Ali, A.: AraFacts: the first large Arabic dataset of naturally occurring claims. In: Proceedings of the Sixth Arabic Natural Language Processing Workshop, WANLP 2021, pp. 231–236 (2021)
Althabiti, S., Alsalka, M., Atwell, E.: An AraBERT model for check-worthiness of Arabic tweets. In: Faggioli et al. [33]
Ashik, S.S., Apu, A.R., Marjana, N.J., Hasan, M.A., Islam, M.S.: M82B at CheckThat! 2021: multiclass fake news detection using BiLSTM based RNN model. In: Faggioli et al. [33]
Ashraf, N., Butt, S., Sidorov, G., Gelbukh, A.: Fake news detection using machine learning and data augmentation - CLEF2021. In: Faggioli et al. [33]
Atanasova, P., et al.: Overview of the CLEF-2018 CheckThat! Lab on automatic identification and verification of political claims. Task 1: check-worthiness. In: Cappellato et al. [21]
Atanasova, P., Nakov, P., Karadzhov, G., Mohtarami, M., Da San Martino, G.: Overview of the CLEF-2019 CheckThat! Lab on automatic identification and verification of claims. Task 1: check-worthiness. In: Cappellato et al. [20]
Ba, M.L., Berti-Equille, L., Shah, K., Hammady, H.M.: VERA: a platform for veracity estimation over web data. In: Proceedings of the 25th International Conference on World Wide Web, WWW 2016, pp. 159–162 (2016)
Balouchzahi, F., Shashirekha, H., Sidorov, G.: MUCIC at CheckThat! 2021: FaDo-fake news detection and domain identification using transformers ensembling. In: Faggioli et al. [33]
Baly, R., et al.: What was written vs. who read it: news media profiling using text analysis and social media context. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, pp. 3364–3374 (2020)
Baris Schlicht, I., Magnossão de Paula, A., Rosso, P.: UPV at CheckThat! 2021: mitigating cultural differences for identifying multilingual check-worthy claims. In: Faggioli et al. [33]
Barrón-Cedeño, A., et al.: Overview of CheckThat! 2020: automatic identification and verification of claims in social media. In: Arampatzis, A., et al. (eds.) CLEF 2020. LNCS, vol. 12260, pp. 215–236. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58219-7_17
Barrón-Cedeño, A., et al.: Overview of CheckThat! 2020: Automatic Identification and Verification of Claims in Social Media. In: Arampatzis, A., et al. (eds.) Experimental IR Meets Multilinguality, Multimodality, and Interaction – 11th International Conference of the CLEF Association, CLEF 2020, Thessaloniki, Greece, 22–25 September 2020, Proceedings. LNCS, vol. 12260, pp. 215–236. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58219-7_17
Barrón-Cedeño, A., et al.: Overview of the CLEF-2018 CheckThat! Lab on automatic identification and verification of political claims. Task 2: factuality. In: Cappellato et al. [21]
Bouziane, M., Perrin, H., Cluzeau, A., Mardas, J., Sadeq, A.: Buster.AI at CheckThat! 2020: insights and recommendations to improve fact-checking. In: Cappellato et al. [19]
Cappellato, L., Eickhoff, C., Ferro, N., Névéol, A. (eds.): CLEF 2020 Working Notes. CEUR Workshop Proceedings. CEUR-WS.org (2020)
Cappellato, L., Ferro, N., Losada, D., Müller, H. (eds.): Working Notes of CLEF 2019 Conference and Labs of the Evaluation Forum. CEUR Workshop Proceedings. CEUR-WS.org (2019)
Cappellato, L., Ferro, N., Nie, J.Y., Soulier, L. (eds.): Working Notes of CLEF 2018-Conference and Labs of the Evaluation Forum. CEUR Workshop Proceedings. CEUR-WS.org (2018)
Carik, B., Yeniterzi, R.: SU-NLP at CheckThat! 2021: check-worthiness of Turkish tweets. In: Faggioli et al. [33]
Cazalens, S., Lamarre, P., Leblay, J., Manolescu, I., Tannier, X.: A content management perspective on fact-checking. In: Proceedings of the International Conference on World Wide Web, WWW 2018, pp. 565–574 (2018)
Cheema, G.S., Hakimov, S., Ewerth, R.: Check\_square at CheckThat! 2020: claim detection in social media via fusion of transformer and syntactic features. In: Cappellato et al. [19]
Chernyavskiy, A., Ilvovsky, D., Nakov, P.: Aschern at CLEF CheckThat! 2021: lambda-calculus of fact-checked claims. In: Faggioli et al. [33]
Cusmuliuc, C.G., Amarandei, M.A., Pelin, I., Cociorva, V.I., Iftene, A.: UAICS at CheckThat! 2021: fake news detection. In: Faggioli et al. [33]
Da San Martino, G., Barrón-Cedeno, A., Wachsmuth, H., Petrov, R., Nakov, P.: SemEval-2020 task 11: detection of propaganda techniques in news articles. In: Proceedings of the 14th Workshop on Semantic Evaluation, SemEval 2020, pp. 1377–1414 (2020)
Derczynski, L., Bontcheva, K., Liakata, M., Procter, R., Wong Sak Hoi, G., Zubiaga, A.: SemEval-2017 task 8: RumourEval: determining rumour veracity and support for rumours. In: Proceedings of the 11th International Workshop on Semantic Evaluation, SemEval 2017, pp. 69–76 (2017)
Dimitrov, D., et al.: SemEval-2021 task 6: detection of persuasion techniques in texts and images. In: Proceedings of the International Workshop on Semantic Evaluation, SemEval 2021 (2021)
Dumani, L., Neumann, P.J., Schenkel, R.: A framework for argument retrieval - ranking argument clusters by frequency and specificity. In: Jose, J.M., et al. (eds.) ECIR 2020. LNCS, vol. 12035, pp. 431–445. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-45439-5_29
Elsayed, T., et al.: CheckThat! at CLEF 2019: automatic identification and verification of claims. In: Advances in Information Retrieval, pp. 309–315 (2019)
Elsayed, T., et al.: Overview of the CLEF-2019 CheckThat! lab: automatic identification and verification of claims. In: Crestani, F., et al. (eds.) CLEF 2019. LNCS, vol. 11696, pp. 301–321. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-28577-7_25
Faggioli, G., Ferro, N., Joly, A., Maistro, M., Piroi, F. (eds.): CLEF 2021 Working Notes. Working Notes of CLEF 2021-Conference and Labs of the Evaluation Forum. CEUR-WS.org (2021)
Gencheva, P., Nakov, P., Màrquez, L., Barrón-Cedeño, A., Koychev, I.: A context-aware approach for detecting worth-checking claims in political debates. In: Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017, pp. 267–276 (2017)
Ghanem, B., Glavaš, G., Giachanou, A., Ponzetto, S., Rosso, P., Rangel, F.: UPV-UMA at CheckThat! lab: verifying Arabic claims using cross lingual approach. In: Cappellato et al. [20]
Gorrell, G., et al.: SemEval-2019 task 7: RumourEval, determining rumour veracity and support for rumours. In: Proceedings of the 13th International Workshop on Semantic Evaluation, SemEval 2019, pp. 845–854 (2019)
Gupta, A., Kumaraguru, P., Castillo, C., Meier, P.: TweetCred: real-time credibility assessment of content on Twitter. In: Aiello, L.M., McFarland, D. (eds.) SocInfo 2014. LNCS, vol. 8851, pp. 228–243. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-13734-6_16
Hanselowski, A., et al.: A retrospective analysis of the fake news challenge stance-detection task. In: Proceedings of the 27th International Conference on Computational Linguistics, COLING 2018, pp. 1859–1874 (2018)
Hansen, C., Hansen, C., Simonsen, J., Lioma, C.: The Copenhagen team participation in the check-worthiness task of the competition of automatic identification and verification of claims in political debates of the CLEF-2018 fact checking lab. In: Cappellato et al. [21]
Hansen, C., Hansen, C., Simonsen, J., Lioma, C.: Neural weakly supervised fact check-worthiness detection with contrastive sampling-based ranking loss. In: Cappellato et al. [20]
Hartl, P., Kruschwitz, U.: University of Regensburg at CheckThat! 2021: exploring text summarization for fake news detection. In: Faggioli et al. [33]
Hasanain, M., Elsayed, T.: bigIR at CheckThat! 2020: multilingual BERT for ranking Arabic tweets by check-worthiness. In: Cappellato et al. [19]
Hasanain, M., et al.: Overview of CheckThat! 2020 Arabic: automatic identification and verification of claims in social media. In: Cappellato et al. [19]
Hasanain, M., Suwaileh, R., Elsayed, T., Barrón-Cedeño, A., Nakov, P.: Overview of the CLEF-2019 CheckThat! Lab on automatic identification and verification of claims. Task 2: evidence and factuality. In: Cappellato et al. [20]
Hassan, N., Li, C., Tremayne, M.: Detecting check-worthy factual claims in presidential debates. In: Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, CIKM 2015, pp. 1835–1838 (2015)
Hassan, N., Tremayne, M., Arslan, F., Li, C.: Comparing automated factual claim detection against judgments of journalism organizations. In: Computation Journalism Symposium, pp. 1–5 (2016)
Hassan, N., et al.: ClaimBuster: the first-ever end-to-end fact-checking system. Proc. VLDB Endow. 10(12), 1945–1948 (2017)
Huertas-García, Á., Huertas-Tato, J., Martín, A., Camacho, D.: CIVIC-UPM at CheckThat! 2021: integration of transformers in misinformation detection and topic classification. In: Faggioli et al. [33]
Martinez-Rico, J.R., Martinez-Romo, J., Araujo, L.: NLP&IR@UNED at CheckThat! 2021: check-worthiness estimation and fake news detection using transformer models. In: Faggioli et al. [33]
Kannan, R., R, R.: DLRG@CLEF2021: an ensemble approach for fake news detection on news articles. In: Faggioli et al. [33]
Karadzhov, G., Nakov, P., Màrquez, L., Barrón-Cedeño, A., Koychev, I.: Fully automated fact checking using external sources. In: Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017, pp. 344–353 (2017)
Kartal, Y.S., Kutlu, M.: TOBB ETU at CheckThat! 2020: prioritizing English and Arabic claims based on check-worthiness. In: Cappellato et al. [19]
Kartal, Y.S., Kutlu, M.: TrClaim-19: the first collection for Turkish check-worthy claim detection with annotator rationales. In: Proceedings of the 24th Conference on Computational Natural Language Learning, pp. 386–395 (2020)
Kazemi, A., Garimella, K., Shahi, G.K., Gaffney, D., Hale, S.A.: Tiplines to combat misinformation on encrypted platforms: a case study of the 2019 Indian election on WhatsApp. arXiv:2106.04726 (2021)
Kovachevich, N.: BERT fine-tuning approach to CLEF CheckThat! Fake news detection. In: Faggioli et al. [33]
Kumari, S.: NoFake at CheckThat! 2021: fake news detection using BERT. arXiv:2108.05419 (2021)
L, H.R., M, A.: NITK\_NLP at CLEF CheckThat! 2021: ensemble transformer model for fake news classification. In: Faggioli et al. [33]
Ma, J., Gao, W., Mitra, P., Kwon, S., Jansen, B.J., Wong, K.F., Cha, M.: Detecting rumors from microblogs with recurrent neural networks. In: Proceedings of the International Joint Conference on Artificial Intelligence, IJCAI 2016, pp. 3818–3824 (2016)
Martinez-Rico, J., Araujo, L., Martinez-Romo, J.: NLP&IR@UNED at CheckThat! 2020: a preliminary approach for check-worthiness and claim retrieval tasks using neural networks and graphs. In: Cappellato et al. [19]
Mihaylova, S., Borisova, I., Chemishanov, D., Hadzhitsanev, P., Hardalov, M., Nakov, P.: DIPS at CheckThat! 2021: verified claim retrieval. In: Faggioli et al. [33]
Mihaylova, T., Karadzhov, G., Atanasova, P., Baly, R., Mohtarami, M., Nakov, P.: SemEval-2019 task 8: fact checking in community question answering forums. In: Proceedings of the 13th International Workshop on Semantic Evaluation, SemEval 2019, pp. 860–869 (2019)
Mitra, T., Gilbert, E.: CREDBANK: a large-scale social media corpus with associated credibility annotations. In: Proceedings of the Ninth International AAAI Conference on Web and Social Media, ICWSM 2015, pp. 258–267 (2015)
Mohammad, S., Kiritchenko, S., Sobhani, P., Zhu, X., Cherry, C.: SemEval-2016 task 6: detecting stance in tweets. In: Proceedings of the 10th International Workshop on Semantic Evaluation, SemEval 2016, pp. 31–41 (2016)
Mukherjee, S., Weikum, G.: Leveraging joint interactions for credibility analysis in news communities. In: Proceedings of the 24th ACM International Conference on Information and Knowledge Management, CIKM 2015, pp. 353–362 (2015)
Nakov, P., et al.: Overview of the CLEF-2018 lab on automatic identification and verification of claims in political debates. In: Working Notes of CLEF 2018 – Conference and Labs of the Evaluation Forum. CLEF 2018 (2018)
Nakov, P., et al.: Automated fact-checking for assisting human fact-checkers. In: Proceedings of the 30th International Joint Conference on Artificial Intelligence, IJCAI 2021 (2021)
Nakov, P., et al.: SemEval-2016 task 3: community question answering. In: Proceedings of the 10th International Workshop on Semantic Evaluation, SemEval 2016, pp. 525–545 (2016)
Nakov, P., et al.: The CLEF-2021 CheckThat! Lab on detecting check-worthy claims, previously fact-checked claims, and fake news. In: Hiemstra, D., Moens, M.-F., Mothe, J., Perego, R., Potthast, M., Sebastiani, F. (eds.) ECIR 2021, Part II. LNCS, vol. 12657, pp. 639–649. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-72240-1_75
Nguyen, V.H., Sugiyama, K., Nakov, P., Kan, M.Y.: FANG: Leveraging social context for fake news detection using graph representation. In: Proceedings of the 29th ACM International Conference on Information & Knowledge Management, CIKM 2020, pp. 1165–1174 (2020)
Nikolov, A., Da San Martino, G., Koychev, I., Nakov, P.: Team\_Alex at CheckThat! 2020: identifying check-worthy tweets with transformer models. In: Cappellato et al. [19]
Oshikawa, R., Qian, J., Wang, W.Y.: A survey on natural language processing for fake news detection. In: Proceedings of the 12th Language Resources and Evaluation Conference, LREC 2020, pp. 6086–6093 (2020)
Pogorelov, K., et al.: FakeNews: Corona virus and 5G conspiracy task at MediaEval 2020. In: Proceedings of the MediaEval workshop, MediaEval 2020 (2020)
Popat, K., Mukherjee, S., Strötgen, J., Weikum, G.: Credibility assessment of textual claims on the web. In: Proceedings of the 25th ACM International Conference on Information and Knowledge Management, CIKM 2016, pp. 2173–2178 (2016)
Pritzkau, A.: NLytics at CheckThat! 2021: check-worthiness estimation as a regression problem on transformers. In: Faggioli et al. [33]
Pritzkau, A.: NLytics at CheckThat! 2021: multi-class fake news detection of news articles and domain identification with RoBERTa - a baseline model. In: Faggioli et al. [33]
Sardar, A.A.M., Salma, S.A., Islam, M.S., Hasan, M.A., Bhuiyan, T.: Team Sigmoid at CheckThat! 2021: multiclass fake news detection with machine learning. In: Faggioli et al. [33]
Sepúlveda-Torres, R., Saquete, E.: GPLSI team at CLEF CheckThat! 2021: fine-tuning BETO and RoBERTa. In: Faggioli et al. [33]
Shaar, S., Alam, F., Martino, G.D.S., Nakov, P.: The role of context in detecting previously fact-checked claims. arXiv preprint arXiv:2104.07423 (2021)
Shaar, S., Babulkov, N., Da San Martino, G., Nakov, P.: That is a known lie: detecting previously fact-checked claims. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, pp. 3607–3618 (2020)
Shaar, S., et al.: Overview of the CLEF-2021 CheckThat! Lab task 2 on detecting previously fact-checked claims in tweets and political debates. In: Faggioli et al. [33]
Shaar, S., et al.: Overview of the CLEF-2021 CheckThat! Lab task 1 on check-worthiness estimation in tweets and political debates. In: Faggioli et al. [33]
Shaar, S., et al.: Overview of CheckThat! 2020 English: automatic identification and verification of claims in social media. In: Cappellato et al. [19]
Shahi, G.K.: AMUSED: an annotation framework of multi-modal social media data. arXiv:2010.00502 (2020)
Shahi, G.K., Dirkson, A., Majchrzak, T.A.: An exploratory study of COVID-19 misinformation on Twitter. Online Soc. Netw. Media 22, 100104 (2021). https://doi.org/10.1016/j.osnem.2020.100104. https://www.sciencedirect.com/science/article/pii/S2468696420300458
Shahi, G.K., Majchrzak, T.A.: Exploring the spread of COVID-19 misinformation on Twitter (2021)
Shahi, G.K.: A multilingual domain identification using fact-checked articles: a case study on COVID-19 misinformation. arXiv preprint (2021)
Shahi, G.K., Nandini, D.: FakeCovid – a multilingual cross-domain fact check news dataset for COVID-19. In: Workshop Proceedings of the 14th International AAAI Conference on Web and Social Media (2020)
Shahi, G.K., Struß, J.M., Mandl, T.: CT-FAN-21 corpus: a dataset for fake news detection, April 2021. https://doi.org/10.5281/zenodo.4714517
Shahi, G.K., Struß, J.M., Mandl, T.: Overview of the CLEF-2021 CheckThat! Lab: task 3 on fake news detection. In: Faggioli et al. [33]
Shu, K., Sliva, A., Wang, S., Tang, J., Liu, H.: Fake news detection on social media: a data mining perspective. SIGKDD Explor. Newsl. 19(1), 22–36 (2017)
Skuczyńska, B., Shaar, S., Spenader, J., Nakov, P.: BeaSku at CheckThat! 2021: fine-tuning sentence BERT with triplet loss and limited data. In: Faggioli et al. [33]
Sohan, S., Rajon, H.S., Khusbu, A., Islam, M.S., Hasan, M.A.: Black Ops at CheckThat! 2021: user profiles analyze of intelligent detection on fake tweets notebook in shared task. In: Faggioli et al. [33]
Tchechmedjiev, A., et al.: ClaimsKG: a knowledge graph of fact-checked claims. In: Ghidini, C., et al. (eds.) ISWC 2019. LNCS, vol. 11779, pp. 309–324. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-30796-7_20
Thorne, J., Vlachos, A., Christodoulopoulos, C., Mittal, A.: FEVER: a large-scale dataset for fact extraction and VERification. In: Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2018, pp. 809–819 (2018)
Touahri, I., Mazroui, A.: EvolutionTeam at CheckThat! 2020: integration of linguistic and sentimental features in a fake news detection approach. In: Cappellato et al. [19]
Tsoplefack, W.K.: Classifier for fake news detection and topical domain of news articles. In: Faggioli et al. [33]
Utsha, R.S., Keya, M., Hasan, M.A., Islam, M.S.: Qword at CheckThat! 2021: an extreme gradient boosting approach for multiclass fake news detection. In: Faggioli et al. [33]
Vasileva, S., Atanasova, P., Màrquez, L., Barrón-Cedeño, A., Nakov, P.: It takes nine to smell a rat: neural multi-task learning for check-worthiness prediction. In: Proceedings of the International Conference on Recent Advances in Natural Language Processing, RANLP 2019, pp. 1229–1239 (2019)
Williams, E., Rodrigues, P., Novak, V.: Accenture at CheckThat! 2020: if you say so: post-hoc fact-checking of claims using transformer-based models. In: Cappellato et al. [19]
Williams, E., Rodrigues, P., Tran, S.: Accenture at CheckThat! 2021: interesting claim identification and ranking with contextually sensitive lexical training data augmentation. In: Faggioli et al. [33]
Zengin, M.S., Kartal, Y.S., Kutlu, M.: TOBB ETU at CheckThat! 2021: data engineering for detecting check-worthy claims. In: Faggioli et al. [33]
Zhao, Z., Resnick, P., Mei, Q.: Enquiring minds: early detection of rumors in social media from enquiry posts. In: Proceedings of the 24th International Conference on World Wide Web, WWW 2015, pp. 1395–1405 (2015)
Zhou, X., Wu, B., Fung, P.: Fight for 4230 at CLEF CheckThat! 2021: domain-specific preprocessing and pretrained model for ranking claims by check-worthiness. In: Faggioli et al. [33]
Zubiaga, A., Liakata, M., Procter, R., Hoi, G.W.S., Tolmie, P.: Analysing how people orient to and spread rumours in social media by looking at conversational threads. PLoS One 11(3), e0150989 (2016)
Zuo, C., Karakas, A., Banerjee, R.: A hybrid recognition system for check-worthy claims using heuristics and supervised learning. In: Cappellato et al. [21]
Acknowledgments
The work of Tamer Elsayed and Maram Hasanain is made possible by NPRP grant #NPRP-11S-1204-170060 from the Qatar National Research Fund (a member of Qatar Foundation). The work of Fatima Haouari is supported by GSRA grant #GSRA6-1-0611-19074 from the Qatar National Research Fund. The statements made herein are solely the responsibility of the authors.
This research is also part of the Tanbih mega-project, developed at the Qatar Computing Research Institute, HBKU, which aims to limit the impact of “fake news”, propaganda, and media bias, thus promoting digital literacy and critical thinking.
Appendix
A Systems for Task 1
The positions in the task ranking appear after each team name. See Tables 5, 6 and 7 for further details.
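The lab's evaluation measures (precision at rank k and mean average precision for the ranking tasks, macro-F\(_1\) for the classification tasks) can be illustrated with a short sketch. This is illustrative code, not the official scorer; MAP is then simply the mean of the average precision over all topics.

```python
def precision_at_k(ranked_labels, k):
    """Fraction of relevant items among the top k of a ranked list (1 = relevant)."""
    return sum(ranked_labels[:k]) / k

def average_precision(ranked_labels):
    """Mean of P@k over the positions k that hold a relevant item."""
    hits, precisions = 0, []
    for i, rel in enumerate(ranked_labels, start=1):
        if rel:
            hits += 1
            precisions.append(hits / i)
    return sum(precisions) / len(precisions) if precisions else 0.0

def macro_f1(gold, pred, labels):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for c in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == c and p == c)
        fp = sum(1 for g, p in zip(gold, pred) if g != c and p == c)
        fn = sum(1 for g, p in zip(gold, pred) if g == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(scores) / len(scores)
```

Macro-averaging treats all classes equally, which matters for task 3, where the class distribution is skewed.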
Team Accenture [100] used BERT and RoBERTa with data augmentation. They further generated additional synthetic training data using lexical substitution. To find the most probable substitutions, they used BERT-based contextual embedding to create synthetic examples for the positive class. They further added a mean-pooling layer and a dropout layer on top of the model before the final classification layer.
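The lexical-substitution idea can be sketched in miniature. Here a hand-made synonym table stands in for Accenture's BERT-based contextual substitution (the `SUBSTITUTIONS` table and all words in it are toy assumptions, not from their system):

```python
# Toy stand-in for contextual lexical substitution: each content word with an
# entry in the (hypothetical) table yields one new synthetic positive example.
SUBSTITUTIONS = {"vaccine": ["jab", "shot"], "cases": ["infections"]}

def augment(tweet):
    """Generate synthetic variants of a tweet by single-word substitution."""
    out = []
    words = tweet.split()
    for i, w in enumerate(words):
        for sub in SUBSTITUTIONS.get(w.lower(), []):
            out.append(" ".join(words[:i] + [sub] + words[i + 1:]))
    return out
```

In the actual system, a masked language model proposes the most probable in-context replacements instead of a static table.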
Team Fight for 4230 [103] focused mostly on two fronts: a pre-processing module to properly normalize the tweets, and data augmentation by means of machine translation and WordNet-based substitutions. The pre-processing included link removal and punctuation cleaning, as well as expansion of quantities and contractions. All COVID-19-related hashtags were normalized into a single one, and the remaining hashtags were expanded. Their best approach was based on BERTweet with a dropout layer and the above-mentioned pre-processing.
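A tweet-normalization step of this kind can be sketched as follows (the regexes, the canonical hashtag, and the small hashtag list are illustrative assumptions, not the team's actual rules):

```python
import re

# Hypothetical set of COVID-19 hashtag spellings collapsed into one canonical tag.
COVID_TAGS = {"#covid19", "#covid_19", "#coronavirus", "#covid"}

def normalize_tweet(text):
    text = re.sub(r"https?://\S+", "", text)  # link removal
    tokens = []
    for tok in text.split():
        if tok.lower() in COVID_TAGS:
            tokens.append("#covid19")         # normalize COVID hashtags into one
        elif tok.startswith("#"):
            # naive hashtag expansion: split CamelCase into separate words
            words = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", tok[1:])
            tokens.extend(words or [tok[1:]])
        else:
            tokens.append(tok)
    text = re.sub(r"[^\w\s#@']", " ", " ".join(tokens))  # punctuation cleaning
    return re.sub(r"\s+", " ", text).strip()
```

For example, `normalize_tweet("Stay safe! #StayHome #CoronaVirus https://t.co/x")` yields `"Stay safe Stay Home #covid19"`.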
Team GPLSI [77] applied the RoBERTa and the BETO transformers together with different manually engineered features, such as the occurrence of dates and numbers or words from LIWC. A thorough exploration of parameters was made using weighting and bias techniques. They also tried to split the four-way classification into two binary classifications and one three-way classification. They further tried oversampling and undersampling.
Team iCompass used several pre-processing steps, including (i) English word removal, (ii) removal of URLs and mentions, and (iii) data normalization: removing tashkeel and the letter madda from the texts, removing duplicates, and replacing some characters to prevent mixing. They proposed a simple ensemble of two BERT-based models: AraBERT and Arabic-ALBERT.
Team NLP&IR@UNED [49] compared several transformer models, including BERT, ALBERT, RoBERTa, DistilBERT, and Funnel-Transformer. For English, they obtained their best results using a BERT model trained on tweets. For Spanish, they used Electra.
Team NLytics [74] used RoBERTa with a regression function in the final layer, approaching the problem as a ranking task.
Team QMUL-SDS [1] used the AraBERT preprocessing function to (i) replace URLs, email addresses, and user mentions with standard words, (ii) remove line breaks, HTML markup, repeated characters, and unwanted characters such as emotion icons, (iii) handle white space between words and digits (non-Arabic or English, or a combination of both) and before and after brackets, and (iv) remove unnecessary punctuation. They addressed the task as a ranking problem, and fine-tuned an Arabic transformer (AraBERTv0.2-base) on a combination of this year's data and the data from the CheckThat! lab 2020 (the CT20-AR dataset).
Team SCUoL [6] used typical pre-processing steps, including text cleaning, segmentation, and tokenization. Their experiments consisted of fine-tuning different AraBERT models, and their final results were obtained using AraBERTv2-base.
Team SU-NLP [22] also used several pre-processing steps, including (i) removing emojis and hashtags, and (ii) replacing all mentions with a special token (@USER) and all URLs with the respective website's domain. If a URL pointed to a tweet, they replaced it with TWITTER and the respective user account name. They reported that this URL expansion improved the performance. They then used an ensemble of BERTurk models fine-tuned with different seed values.
Team TOBB ETU [101] investigated different approaches to fine-tuning transformer models, including data augmentation using machine translation, weak supervision, and cross-lingual training. For their submission, they removed URLs and user mentions from the tweets, and fine-tuned a separate BERT-based model for each language. In particular, they fine-tuned BERTurk, AraBERT, BETO, and BERT-base for Turkish, Arabic, Spanish, and English, respectively. For Bulgarian, they fine-tuned a RoBERTa model pre-trained on Bulgarian documents.
Team UPV [14] used a multilingual sentence transformer representation (S-BERT) with knowledge distillation, originally intended for question answering. They further introduced an auxiliary language identification task alongside the downstream check-worthiness task.
B Systems for Task 2
Team Aschern [25] used TF.IDF retrieval, fine-tuned pre-trained S-BERT, and a LambdaMART re-ranker.
Team BeaSku [91] used triplet loss training to fine-tune S-BERT. Then, they used the scores predicted by the fine-tuned model along with BM25 scores as features to train a rankSVM re-ranker. They further discussed the impact of applying online mining of triplets. They also experimented with data augmentation.
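The triplet objective behind this fine-tuning can be written out in a few lines. The sketch below assumes Euclidean distance over sentence embeddings (the team's exact distance function and margin are not specified here):

```python
import math

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """max(0, d(a, p) - d(a, n) + margin): pushes the matching verified claim
    closer to the tweet embedding than a non-matching one, by at least the margin."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)
```

The loss is zero once the positive is closer than the negative by the margin, so training focuses on the hard triplets, which is why online mining of triplets (discussed by the team) matters.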
Team DIPS [60] calculated S-BERT embeddings for all claims, then computed a cosine similarity for each pair of an input claim and a verified claim. The prediction is made by passing a sorted list of cosine similarities to a neural network.
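The retrieval step of this pipeline amounts to cosine-similarity ranking, sketched below over plain lists standing in for S-BERT embeddings (the final neural network over the sorted similarity list is omitted):

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_verified_claims(tweet_emb, verified_embs):
    """Return indices of verified claims sorted by decreasing cosine similarity."""
    sims = [(cosine(tweet_emb, v), i) for i, v in enumerate(verified_embs)]
    return [i for s, i in sorted(sims, key=lambda x: -x[0])]
```

For instance, `rank_verified_claims([1, 0], [[0, 1], [1, 0], [1, 1]])` returns `[1, 2, 0]`: the identical-direction claim first, the orthogonal one last.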
Team NLytics approached the problem as a regression task, and used RoBERTa with a regression function in the final layer.
C Systems for Task 3
Team Black Ops [92] performed data pre-processing by removing stop-words and punctuation marks. Then, they experimented with decision trees, random forest, and gradient boosting classifiers for Task 3A, and found the latter to perform best.
Team CIC [8] experimented with logistic regression, multi-layer perceptron, support vector machines, and random forest. Their experiments consisted of using stratified 5-fold cross-validation on the training data. Their best results were obtained using logistic regression for task 3A, and a multi-layer perceptron for task 3B.
Team CIC experimented with decision tree, random forest, and gradient boosting algorithms, and found the latter to perform best.
Team CIVIC-UPM [48] participated in both subtasks of task 3. They performed pre-processing using a number of tools: (i) ftfy to repair Unicode and emoji errors, (ii) ekphrasis for lower-casing and for normalizing percentages, times, dates, emails, phone numbers, and numbers, (iii) contractions for abbreviation expansion, and (iv) NLTK for word tokenization, stop-word removal, punctuation removal, and word lemmatization. Then, they combined doc2vec with transformer representations (Electra base, T5 small and base, Longformer base, RoBERTa base, and DistilRoBERTa base). They further used additional data from Kaggle's AG News, KDD 2020, and Clickbait News Detection competitions. Finally, they experimented with a number of classifiers, such as Naïve Bayes, Random Forest, Logistic Regression with L1 and L2 regularization, Elastic Net, and SVMs. Their best system for subtask 3A used DistilRoBERTa-base on the text body with oversampling and a sliding window for dealing with long texts. Their best system for subtask 3B used RoBERTa-base on the title plus body text with oversampling, but no sliding window.
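The sliding-window trick for articles longer than a transformer's input limit can be sketched as follows (the window size and stride are illustrative defaults, not the team's settings):

```python
def sliding_windows(tokens, size=512, stride=256):
    """Split a long token sequence into overlapping windows so that each
    chunk fits a transformer's maximum input length."""
    if len(tokens) <= size:
        return [tokens]
    windows = []
    start = 0
    while start < len(tokens):
        windows.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break
        start += stride
    return windows
```

A document-level prediction is then typically obtained by aggregating the per-window scores, e.g., by averaging or max-pooling.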
Team DLRG experimented with a number of traditional approaches like Random Forest, Naïve Bayes and Logistic Regression as well as an online passive-aggressive classifier and different ensembles thereof. The best result was achieved by an ensemble of Naïve Bayes, Logistic Regression, and the Passive Aggressive classifier for task 3A. For task 3B, the Online Passive-Aggressive classifier outperformed all other approaches, including the considered ensembles.
Team GPLSI [77] applied the RoBERTa transformer together with different manually-engineered features, such as the occurrence of dates and numbers or words from LIWC. Both the title and the body were concatenated as a single sequence of words. Rather than going for a single multi-class setting, they used two binary models considering the most frequent classes: false vs. other, and true vs. other, followed by one three-class model.
Team MUCIC [12] used a majority-voting ensemble of three BERT variants: they fine-tuned the pre-trained BERT, DistilBERT, and RoBERTa models.
Team NITK_NLP [57] proposed an approach that included pre-processing and tokenization of the news articles, and then experimented with multiple transformer models. The final prediction was made by an ensemble.
Team NKovachevich [55] created lexical features: they extracted the 500 most frequent word stems in the dataset and computed their TF.IDF values, which they used in a multinomial Naïve Bayes classifier. Much better performance was achieved with an LSTM model using GloVe embeddings. BERT achieved a slightly lower F1, and RoBERTa performed worse than BERT.
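The TF.IDF features feeding such a lexical baseline can be sketched in pure Python (a minimal version over pre-tokenized documents; the team's exact stemming and weighting scheme may differ):

```python
import math
from collections import Counter

def tfidf_matrix(docs, vocab_size=500):
    """TF.IDF vectors over the vocab_size most frequent terms.
    docs: list of token lists (e.g., word stems)."""
    freq = Counter(t for doc in docs for t in doc)
    vocab = [t for t, _ in freq.most_common(vocab_size)]
    n = len(docs)
    # document frequency -> inverse document frequency per vocabulary term
    idf = {t: math.log(n / sum(1 for d in docs if t in d)) for t in vocab}
    rows = []
    for doc in docs:
        counts = Counter(doc)
        rows.append([counts[t] / len(doc) * idf[t] for t in vocab])
    return vocab, rows
```

The resulting rows can be fed directly to a multinomial Naïve Bayes or any other shallow classifier.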
Team NLP&IR@UNED [49] experimented with four transformer architectures and input sizes of 150 and 200 words. In preliminary tests, the best performance was achieved by ALBERT with 200 words. They also experimented with combining TF.IDF values from the text, all features provided by the LIWC tool, and TF.IDF values over the first 20 domain names returned by a query to a search engine. Unlike on the development set, in the official competition the best results were obtained with the approach based on TF.IDF, LIWC, and domain names.
Team NLytics fine-tuned RoBERTa on the dataset for each of the subtasks. Since the data is imbalanced, they used under-sampling. They also truncated the documents to 512 tokens to fit the RoBERTa input size.
Team NoFake [56] applied BERT without fine-tuning, but used an extensive amount of additional data for training, downloaded from various fact-checking websites.
Team Pathfinder [96] participated in both tasks using multinomial Naïve Bayes and random forest, with the former performing better for both. For task 3A, they merged the classes false and partially false into one class, which boosted the model performance by 41% (a non-official score mentioned in the paper).
Team Probity addressed the multiclass fake news detection subtask. They used a simple LSTM architecture with word2vec embeddings to represent the news articles.
Team Qword [97] applied pre-processing that included stop-word removal, punctuation removal, and stemming with the Porter stemmer. TF.IDF values were then calculated for the words, and four classification algorithms were applied to these features. The best result was obtained with Extreme Gradient Boosting.
Team SAUD used an SVM with TF.IDF. They tried Logistic Regression, Multinomial Naïve Bayes, and Random Forest, and found SVM to work best.
Team Sigmoid [76] experimented with different traditional machine learning approaches, with multinomial Naïve Bayes performing best, and one deep learning approach, namely an LSTM with the Adam optimizer. The latter outperformed the more traditional approaches.
Team Spider applied an LSTM after pre-processing consisting of stop-word removal and stemming.
Team UAICS [26] experimented with various models including BERT, LSTM, Bi-LSTM, and feature-based models. Their submitted model is a Gradient Boosting with a weighted combination of three feature groups: bi-grams, POS tags, and lexical categories of words.
Team University of Regensburg [41] used different fine-tuned variants of BERT with a linear layer on top, and explored several ways to deal with BERT's maximum sequence length: hierarchical transformer representations, extractive summarization (using DistilBERT), and abstractive summarization (using distil-BART-CNN-12-6). They performed oversampling to address the class imbalance, and then performed classification using fine-tuned BERT with a hierarchical transformer representation.
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this paper
Nakov, P., et al. (2021). Overview of the CLEF-2021 CheckThat! Lab on Detecting Check-Worthy Claims, Previously Fact-Checked Claims, and Fake News. In: Candan, K.S., et al. (eds.) Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2021. Lecture Notes in Computer Science, vol. 12880. Springer, Cham. https://doi.org/10.1007/978-3-030-85251-1_19
DOI: https://doi.org/10.1007/978-3-030-85251-1_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-85250-4
Online ISBN: 978-3-030-85251-1