Abstract
We describe the fourth edition of the CheckThat! Lab, part of the 2021 Conference and Labs of the Evaluation Forum (CLEF). The lab evaluates technology supporting tasks related to factuality, and covers Arabic, Bulgarian, English, Spanish, and Turkish. Task 1 asks systems to predict which posts in a Twitter stream are worth fact-checking, focusing on COVID-19 and politics (in all five languages). Task 2 asks systems to determine whether a claim in a tweet can be verified using a set of previously fact-checked claims (in Arabic and English). Task 3 asks systems to predict the veracity of a news article and its topical domain (in English). The evaluation is based on mean average precision or precision at rank k for the ranking tasks, and on macro-F\(_1\) for the classification tasks. With 132 registered teams, this was the most popular CLEF-2021 lab; nearly one third of them participated: 15, 5, and 25 teams submitted official runs for tasks 1, 2, and 3, respectively.
B. Hamdan—Independent Researcher.
References
Abumansour, A., Zubiaga, A.: QMUL-SDS at CheckThat! 2021: enriching pre-trained language models for the estimation of check-worthiness of Arabic tweets. In: Faggioli et al. [33]
Agirre, E., et al.: SemEval-2016 task 1: semantic textual similarity, monolingual and cross-lingual evaluation. In: Proceedings of the 10th International Workshop on Semantic Evaluation, SemEval 2016, pp. 497–511 (2016)
Alam, F., et al.: Fighting the COVID-19 infodemic in social media: a holistic perspective and a call to arms. In: Proceedings of the International AAAI Conference on Web and Social Media. ICWSM 2021, vol. 15, pp. 913–922 (2021)
Alam, F., et al.: Fighting the COVID-19 infodemic: modeling the perspective of journalists, fact-checkers, social media platforms, policy makers, and the society. arXiv:2005.00033 (2020)
Ali, Z.S., Mansour, W., Elsayed, T., Al-Ali, A.: AraFacts: the first large Arabic dataset of naturally occurring claims. In: Proceedings of the Sixth Arabic Natural Language Processing Workshop, WANLP 2021, pp. 231–236 (2021)
Althabiti, S., Alsalka, M., Atwell, E.: An AraBERT model for check-worthiness of Arabic tweets. In: Faggioli et al. [33]
Ashik, S.S., Apu, A.R., Marjana, N.J., Hasan, M.A., Islam, M.S.: M82B at CheckThat! 2021: multiclass fake news detection using BiLSTM based RNN model. In: Faggioli et al. [33]
Ashraf, N., Butt, S., Sidorov, G., Gelbukh, A.: Fake news detection using machine learning and data augmentation - CLEF2021. In: Faggioli et al. [33]
Atanasova, P., et al.: Overview of the CLEF-2018 CheckThat! Lab on automatic identification and verification of political claims. Task 1: check-worthiness. In: Cappellato et al. [21]
Atanasova, P., Nakov, P., Karadzhov, G., Mohtarami, M., Da San Martino, G.: Overview of the CLEF-2019 CheckThat! Lab on automatic identification and verification of claims. Task 1: check-worthiness. In: Cappellato et al. [20]
Ba, M.L., Berti-Equille, L., Shah, K., Hammady, H.M.: VERA: a platform for veracity estimation over web data. In: Proceedings of the 25th International Conference on World Wide Web, WWW 2016, pp. 159–162 (2016)
Balouchzahi, F., Shashirekha, H., Sidorov, G.: MUCIC at CheckThat! 2021: FaDo-fake news detection and domain identification using transformers ensembling. In: Faggioli et al. [33]
Baly, R., et al.: What was written vs. who read it: news media profiling using text analysis and social media context. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, pp. 3364–3374 (2020)
Baris Schlicht, I., Magnossão de Paula, A., Rosso, P.: UPV at CheckThat! 2021: mitigating cultural differences for identifying multilingual check-worthy claims. In: Faggioli et al. [33]
Barrón-Cedeño, A., et al.: Overview of CheckThat! 2020: automatic identification and verification of claims in social media. In: Arampatzis, A., et al. (eds.) CLEF 2020. LNCS, vol. 12260, pp. 215–236. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58219-7_17
Barrón-Cedeño, A., et al.: Overview of CheckThat! 2020: Automatic Identification and Verification of Claims in Social Media. In: Arampatzis, A., et al. (eds.) Experimental IR Meets Multilinguality, Multimodality, and Interaction – 11th International Conference of the CLEF Association, CLEF 2020, Thessaloniki, Greece, 22–25 September 2020, Proceedings. LNCS, vol. 12260, pp. 215–236. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58219-7_17
Barrón-Cedeño, A., et al.: Overview of the CLEF-2018 CheckThat! Lab on automatic identification and verification of political claims. Task 2: factuality. In: Cappellato et al. [21]
Bouziane, M., Perrin, H., Cluzeau, A., Mardas, J., Sadeq, A.: Buster.AI at CheckThat! 2020: insights and recommendations to improve fact-checking. In: Cappellato et al. [19]
Cappellato, L., Eickhoff, C., Ferro, N., Névéol, A. (eds.): CLEF 2020 Working Notes. CEUR Workshop Proceedings. CEUR-WS.org (2020)
Cappellato, L., Ferro, N., Losada, D., Müller, H. (eds.): Working Notes of CLEF 2019 Conference and Labs of the Evaluation Forum. CEUR Workshop Proceedings. CEUR-WS.org (2019)
Cappellato, L., Ferro, N., Nie, J.Y., Soulier, L. (eds.): Working Notes of CLEF 2018-Conference and Labs of the Evaluation Forum. CEUR Workshop Proceedings. CEUR-WS.org (2018)
Carik, B., Yeniterzi, R.: SU-NLP at CheckThat! 2021: check-worthiness of Turkish tweets. In: Faggioli et al. [33]
Cazalens, S., Lamarre, P., Leblay, J., Manolescu, I., Tannier, X.: A content management perspective on fact-checking. In: Proceedings of the International Conference on World Wide Web, WWW 2018, pp. 565–574 (2018)
Cheema, G.S., Hakimov, S., Ewerth, R.: Check\_square at CheckThat! 2020: claim detection in social media via fusion of transformer and syntactic features. In: Cappellato et al. [19]
Chernyavskiy, A., Ilvovsky, D., Nakov, P.: Aschern at CLEF CheckThat! 2021: lambda-calculus of fact-checked claims. In: Faggioli et al. [33]
Cusmuliuc, C.G., Amarandei, M.A., Pelin, I., Cociorva, V.I., Iftene, A.: UAICS at CheckThat! 2021: fake news detection. In: Faggioli et al. [33]
Da San Martino, G., Barrón-Cedeno, A., Wachsmuth, H., Petrov, R., Nakov, P.: SemEval-2020 task 11: detection of propaganda techniques in news articles. In: Proceedings of the 14th Workshop on Semantic Evaluation, SemEval 2020, pp. 1377–1414 (2020)
Derczynski, L., Bontcheva, K., Liakata, M., Procter, R., Wong Sak Hoi, G., Zubiaga, A.: SemEval-2017 task 8: RumourEval: determining rumour veracity and support for rumours. In: Proceedings of the 11th International Workshop on Semantic Evaluation, SemEval 2017, pp. 69–76 (2017)
Dimitrov, D., et al.: SemEval-2021 task 6: detection of persuasion techniques in texts and images. In: Proceedings of the International Workshop on Semantic Evaluation, SemEval 2021 (2021)
Dumani, L., Neumann, P.J., Schenkel, R.: A framework for argument retrieval - ranking argument clusters by frequency and specificity. In: Jose, J.M., et al. (eds.) ECIR 2020. LNCS, vol. 12035, pp. 431–445. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-45439-5_29
Elsayed, T., et al.: CheckThat! at CLEF 2019: automatic identification and verification of claims. In: Advances in Information Retrieval, pp. 309–315 (2019)
Elsayed, T., et al.: Overview of the CLEF-2019 CheckThat! lab: automatic identification and verification of claims. In: Crestani, F., et al. (eds.) CLEF 2019. LNCS, vol. 11696, pp. 301–321. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-28577-7_25
Faggioli, G., Ferro, N., Joly, A., Maistro, M., Piroi, F. (eds.): CLEF 2021 Working Notes. Working Notes of CLEF 2021-Conference and Labs of the Evaluation Forum. CEUR-WS.org (2021)
Gencheva, P., Nakov, P., Màrquez, L., Barrón-Cedeño, A., Koychev, I.: A context-aware approach for detecting worth-checking claims in political debates. In: Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017, pp. 267–276 (2017)
Ghanem, B., Glavaš, G., Giachanou, A., Ponzetto, S., Rosso, P., Rangel, F.: UPV-UMA at CheckThat! lab: verifying Arabic claims using cross lingual approach. In: Cappellato et al. [20]
Gorrell, G., et al.: SemEval-2019 task 7: RumourEval, determining rumour veracity and support for rumours. In: Proceedings of the 13th International Workshop on Semantic Evaluation, SemEval 2019, pp. 845–854 (2019)
Gupta, A., Kumaraguru, P., Castillo, C., Meier, P.: TweetCred: real-time credibility assessment of content on Twitter. In: Aiello, L.M., McFarland, D. (eds.) SocInfo 2014. LNCS, vol. 8851, pp. 228–243. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-13734-6_16
Hanselowski, A., et al.: A retrospective analysis of the fake news challenge stance-detection task. In: Proceedings of the 27th International Conference on Computational Linguistics, COLING 2018, pp. 1859–1874 (2018)
Hansen, C., Hansen, C., Simonsen, J., Lioma, C.: The Copenhagen team participation in the check-worthiness task of the competition of automatic identification and verification of claims in political debates of the CLEF-2018 fact checking lab. In: Cappellato et al. [21]
Hansen, C., Hansen, C., Simonsen, J., Lioma, C.: Neural weakly supervised fact check-worthiness detection with contrastive sampling-based ranking loss. In: Cappellato et al. [20]
Hartl, P., Kruschwitz, U.: University of Regensburg at CheckThat! 2021: exploring text summarization for fake news detection. In: Faggioli et al. [33]
Hasanain, M., Elsayed, T.: bigIR at CheckThat! 2020: multilingual BERT for ranking Arabic tweets by check-worthiness. In: Cappellato et al. [19]
Hasanain, M., et al.: Overview of CheckThat! 2020 Arabic: automatic identification and verification of claims in social media. In: Cappellato et al. [19]
Hasanain, M., Suwaileh, R., Elsayed, T., Barrón-Cedeño, A., Nakov, P.: Overview of the CLEF-2019 CheckThat! Lab on automatic identification and verification of claims. Task 2: evidence and factuality. In: Cappellato et al. [20]
Hassan, N., Li, C., Tremayne, M.: Detecting check-worthy factual claims in presidential debates. In: Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, CIKM 2015, pp. 1835–1838 (2015)
Hassan, N., Tremayne, M., Arslan, F., Li, C.: Comparing automated factual claim detection against judgments of journalism organizations. In: Computation Journalism Symposium, pp. 1–5 (2016)
Hassan, N., et al.: ClaimBuster: the first-ever end-to-end fact-checking system. Proc. VLDB Endow. 10(12), 1945–1948 (2017)
Huertas-García, Á., Huertas-Tato, J., Martín, A., Camacho, D.: CIVIC-UPM at CheckThat! 2021: integration of transformers in misinformation detection and topic classification. In: Faggioli et al. [33]
Martinez-Rico, J.R., Martinez-Romo, J., Araujo, L.: NLP&IR@UNED at CheckThat! 2021: check-worthiness estimation and fake news detection using transformer models. In: Faggioli et al. [33]
Kannan, R., R, R.: DLRG@CLEF2021: an ensemble approach for fake news detection on news articles. In: Faggioli et al. [33]
Karadzhov, G., Nakov, P., Màrquez, L., Barrón-Cedeño, A., Koychev, I.: Fully automated fact checking using external sources. In: Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017, pp. 344–353 (2017)
Kartal, Y.S., Kutlu, M.: TOBB ETU at CheckThat! 2020: prioritizing English and Arabic claims based on check-worthiness. In: Cappellato et al. [19]
Kartal, Y.S., Kutlu, M.: TrClaim-19: the first collection for Turkish check-worthy claim detection with annotator rationales. In: Proceedings of the 24th Conference on Computational Natural Language Learning, pp. 386–395 (2020)
Kazemi, A., Garimella, K., Shahi, G.K., Gaffney, D., Hale, S.A.: Tiplines to combat misinformation on encrypted platforms: a case study of the 2019 Indian election on WhatsApp. arXiv:2106.04726 (2021)
Kovachevich, N.: BERT fine-tuning approach to CLEF CheckThat! Fake news detection. In: Faggioli et al. [33]
Kumari, S.: NoFake at CheckThat! 2021: fake news detection using BERT. arXiv:2108.05419 (2021)
L, H.R., M, A.: NITK\_NLP at CLEF CheckThat! 2021: ensemble transformer model for fake news classification. In: Faggioli et al. [33]
Ma, J., Gao, W., Mitra, P., Kwon, S., Jansen, B.J., Wong, K.F., Cha, M.: Detecting rumors from microblogs with recurrent neural networks. In: Proceedings of the International Joint Conference on Artificial Intelligence, IJCAI 2016, pp. 3818–3824 (2016)
Martinez-Rico, J., Araujo, L., Martinez-Romo, J.: NLP&IR@UNED at CheckThat! 2020: a preliminary approach for check-worthiness and claim retrieval tasks using neural networks and graphs. In: Cappellato et al. [19]
Mihaylova, S., Borisova, I., Chemishanov, D., Hadzhitsanev, P., Hardalov, M., Nakov, P.: DIPS at CheckThat! 2021: verified claim retrieval. In: Faggioli et al. [33]
Mihaylova, T., Karadzhov, G., Atanasova, P., Baly, R., Mohtarami, M., Nakov, P.: SemEval-2019 task 8: fact checking in community question answering forums. In: Proceedings of the 13th International Workshop on Semantic Evaluation, SemEval 2019, pp. 860–869 (2019)
Mitra, T., Gilbert, E.: CREDBANK: a large-scale social media corpus with associated credibility annotations. In: Proceedings of the Ninth International AAAI Conference on Web and Social Media, ICWSM 2015, pp. 258–267 (2015)
Mohammad, S., Kiritchenko, S., Sobhani, P., Zhu, X., Cherry, C.: SemEval-2016 task 6: detecting stance in tweets. In: Proceedings of the 10th International Workshop on Semantic Evaluation, SemEval 2016, pp. 31–41 (2016)
Mukherjee, S., Weikum, G.: Leveraging joint interactions for credibility analysis in news communities. In: Proceedings of the 24th ACM International Conference on Information and Knowledge Management, CIKM 2015, pp. 353–362 (2015)
Nakov, P., et al.: Overview of the CLEF-2018 lab on automatic identification and verification of claims in political debates. In: Working Notes of CLEF 2018 – Conference and Labs of the Evaluation Forum. CLEF 2018 (2018)
Nakov, P., et al.: Automated fact-checking for assisting human fact-checkers. In: Proceedings of the 30th International Joint Conference on Artificial Intelligence, IJCAI 2021 (2021)
Nakov, P., et al.: SemEval-2016 task 3: community question answering. In: Proceedings of the 10th International Workshop on Semantic Evaluation, SemEval 2016, pp. 525–545 (2016)
Nakov, P., et al.: The CLEF-2021 CheckThat! Lab on detecting check-worthy claims, previously fact-checked claims, and fake news. In: Hiemstra, D., Moens, M.-F., Mothe, J., Perego, R., Potthast, M., Sebastiani, F. (eds.) ECIR 2021, Part II. LNCS, vol. 12657, pp. 639–649. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-72240-1_75
Nguyen, V.H., Sugiyama, K., Nakov, P., Kan, M.Y.: FANG: Leveraging social context for fake news detection using graph representation. In: Proceedings of the 29th ACM International Conference on Information & Knowledge Management, CIKM 2020, pp. 1165–1174 (2020)
Nikolov, A., Da San Martino, G., Koychev, I., Nakov, P.: Team\_Alex at CheckThat! 2020: identifying check-worthy tweets with transformer models. In: Cappellato et al. [19]
Oshikawa, R., Qian, J., Wang, W.Y.: A survey on natural language processing for fake news detection. In: Proceedings of the 12th Language Resources and Evaluation Conference, LREC 2020, pp. 6086–6093 (2020)
Pogorelov, K., et al.: FakeNews: Corona virus and 5G conspiracy task at MediaEval 2020. In: Proceedings of the MediaEval workshop, MediaEval 2020 (2020)
Popat, K., Mukherjee, S., Strötgen, J., Weikum, G.: Credibility assessment of textual claims on the web. In: Proceedings of the 25th ACM International Conference on Information and Knowledge Management, CIKM 2016, pp. 2173–2178 (2016)
Pritzkau, A.: NLytics at CheckThat! 2021: check-worthiness estimation as a regression problem on transformers. In: Faggioli et al. [33]
Pritzkau, A.: NLytics at CheckThat! 2021: multi-class fake news detection of news articles and domain identification with RoBERTa - a baseline model. In: Faggioli et al. [33]
Sardar, A.A.M., Salma, S.A., Islam, M.S., Hasan, M.A., Bhuiyan, T.: Team Sigmoid at CheckThat! 2021: multiclass fake news detection with machine learning. In: Faggioli et al. [33]
Sepúlveda-Torres, R., Saquete, E.: GPLSI team at CLEF CheckThat! 2021: fine-tuning BETO and RoBERTa. In: Faggioli et al. [33]
Shaar, S., Alam, F., Martino, G.D.S., Nakov, P.: The role of context in detecting previously fact-checked claims. arXiv preprint arXiv:2104.07423 (2021)
Shaar, S., Babulkov, N., Da San Martino, G., Nakov, P.: That is a known lie: detecting previously fact-checked claims. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, pp. 3607–3618 (2020)
Shaar, S., et al.: Overview of the CLEF-2021 CheckThat! Lab task 2 on detecting previously fact-checked claims in tweets and political debates. In: Faggioli et al. [33]
Shaar, S., et al.: Overview of the CLEF-2021 CheckThat! Lab task 1 on check-worthiness estimation in tweets and political debates. In: Faggioli et al. [33]
Shaar, S., et al.: Overview of CheckThat! 2020 English: automatic identification and verification of claims in social media. In: Cappellato et al. [19]
Shahi, G.K.: AMUSED: an annotation framework of multi-modal social media data. arXiv:2010.00502 (2020)
Shahi, G.K., Dirkson, A., Majchrzak, T.A.: An exploratory study of COVID-19 misinformation on Twitter. Online Soc. Netw. Media 22, 100104 (2021). https://doi.org/10.1016/j.osnem.2020.100104. https://www.sciencedirect.com/science/article/pii/S2468696420300458
Shahi, G.K., Majchrzak, T.A.: Exploring the spread of COVID-19 misinformation on Twitter (2021)
Shahi, G.K.: A multilingual domain identification using fact-checked articles: a case study on COVID-19 misinformation. arXiv preprint (2021)
Shahi, G.K., Nandini, D.: FakeCovid – a multilingual cross-domain fact check news dataset for COVID-19. In: Workshop Proceedings of the 14th International AAAI Conference on Web and Social Media (2020)
Shahi, G.K., Struß, J.M., Mandl, T.: CT-FAN-21 corpus: a dataset for fake news detection, April 2021. https://doi.org/10.5281/zenodo.4714517
Shahi, G.K., Struß, J.M., Mandl, T.: Overview of the CLEF-2021 CheckThat! Lab: task 3 on fake news detection. In: Faggioli et al. [33]
Shu, K., Sliva, A., Wang, S., Tang, J., Liu, H.: Fake news detection on social media: a data mining perspective. SIGKDD Explor. Newsl. 19(1), 22–36 (2017)
Skuczyńska, B., Shaar, S., Spenader, J., Nakov, P.: BeaSku at CheckThat! 2021: fine-tuning sentence BERT with triplet loss and limited data. In: Faggioli et al. [33]
Sohan, S., Rajon, H.S., Khusbu, A., Islam, M.S., Hasan, M.A.: Black Ops at CheckThat! 2021: user profiles analyze of intelligent detection on fake tweets notebook in shared task. In: Faggioli et al. [33]
Tchechmedjiev, A., et al.: ClaimsKG: a knowledge graph of fact-checked claims. In: Ghidini, C., et al. (eds.) ISWC 2019. LNCS, vol. 11779, pp. 309–324. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-30796-7_20
Thorne, J., Vlachos, A., Christodoulopoulos, C., Mittal, A.: FEVER: a large-scale dataset for fact extraction and VERification. In: Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2018, pp. 809–819 (2018)
Touahri, I., Mazroui, A.: EvolutionTeam at CheckThat! 2020: integration of linguistic and sentimental features in a fake news detection approach. In: Cappellato et al. [19]
Tsoplefack, W.K.: Classifier for fake news detection and topical domain of news articles. In: Faggioli et al. [33]
Utsha, R.S., Keya, M., Hasan, M.A., Islam, M.S.: Qword at CheckThat! 2021: an extreme gradient boosting approach for multiclass fake news detection. In: Faggioli et al. [33]
Vasileva, S., Atanasova, P., Màrquez, L., Barrón-Cedeño, A., Nakov, P.: It takes nine to smell a rat: neural multi-task learning for check-worthiness prediction. In: Proceedings of the International Conference on Recent Advances in Natural Language Processing, RANLP 2019, pp. 1229–1239 (2019)
Williams, E., Rodrigues, P., Novak, V.: Accenture at CheckThat! 2020: if you say so: post-hoc fact-checking of claims using transformer-based models. In: Cappellato et al. [19]
Williams, E., Rodrigues, P., Tran, S.: Accenture at CheckThat! 2021: interesting claim identification and ranking with contextually sensitive lexical training data augmentation. In: Faggioli et al. [33]
Zengin, M.S., Kartal, Y.S., Kutlu, M.: TOBB ETU at CheckThat! 2021: data engineering for detecting check-worthy claims. In: Faggioli et al. [33]
Zhao, Z., Resnick, P., Mei, Q.: Enquiring minds: early detection of rumors in social media from enquiry posts. In: Proceedings of the 24th International Conference on World Wide Web, WWW 2015, pp. 1395–1405 (2015)
Zhou, X., Wu, B., Fung, P.: Fight for 4230 at CLEF CheckThat! 2021: domain-specific preprocessing and pretrained model for ranking claims by check-worthiness. In: Faggioli et al. [33]
Zubiaga, A., Liakata, M., Procter, R., Hoi, G.W.S., Tolmie, P.: Analysing how people orient to and spread rumours in social media by looking at conversational threads. PLoS One 11(3), e0150989 (2016)
Zuo, C., Karakas, A., Banerjee, R.: A hybrid recognition system for check-worthy claims using heuristics and supervised learning. In: Cappellato et al. [21]
Acknowledgments
The work of Tamer Elsayed and Maram Hasanain is made possible by NPRP grant #NPRP-11S-1204-170060 from the Qatar National Research Fund (a member of Qatar Foundation). The work of Fatima Haouari is supported by GSRA grant #GSRA6-1-0611-19074 from the Qatar National Research Fund. The statements made herein are solely the responsibility of the authors.
This research is also part of the Tanbih mega-project, developed at the Qatar Computing Research Institute, HBKU, which aims to limit the impact of “fake news”, propaganda, and media bias, thus promoting digital literacy and critical thinking.
Appendix
A Systems for Task 1
The positions in the task ranking appear after each team name. See Tables 5, 6 and 7 for further details.
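The lab's evaluation measures (precision at rank k and mean average precision for the ranking tasks, macro-F\(_1\) for the classification tasks) can be illustrated with a short sketch. This is illustrative code, not the official scorer; MAP is then simply the mean of the average precision over all topics.

```python
def precision_at_k(ranked_labels, k):
    """Fraction of relevant items among the top k of a ranked list (1 = relevant)."""
    return sum(ranked_labels[:k]) / k

def average_precision(ranked_labels):
    """Mean of P@k over the positions k that hold a relevant item."""
    hits, precisions = 0, []
    for i, rel in enumerate(ranked_labels, start=1):
        if rel:
            hits += 1
            precisions.append(hits / i)
    return sum(precisions) / len(precisions) if precisions else 0.0

def macro_f1(gold, pred, labels):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for c in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == c and p == c)
        fp = sum(1 for g, p in zip(gold, pred) if g != c and p == c)
        fn = sum(1 for g, p in zip(gold, pred) if g == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(scores) / len(scores)
```

Macro-averaging treats all classes equally, which matters for task 3, where the class distribution is skewed.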
Team Accenture [100] used BERT and RoBERTa with data augmentation. They further generated additional synthetic training data using lexical substitution. To find the most probable substitutions, they used BERT-based contextual embedding to create synthetic examples for the positive class. They further added a mean-pooling layer and a dropout layer on top of the model before the final classification layer.
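The lexical-substitution idea can be sketched in miniature. Here a hand-made synonym table stands in for Accenture's BERT-based contextual substitution (the `SUBSTITUTIONS` table and all words in it are toy assumptions, not from their system):

```python
# Toy stand-in for contextual lexical substitution: each content word with an
# entry in the (hypothetical) table yields one new synthetic positive example.
SUBSTITUTIONS = {"vaccine": ["jab", "shot"], "cases": ["infections"]}

def augment(tweet):
    """Generate synthetic variants of a tweet by single-word substitution."""
    out = []
    words = tweet.split()
    for i, w in enumerate(words):
        for sub in SUBSTITUTIONS.get(w.lower(), []):
            out.append(" ".join(words[:i] + [sub] + words[i + 1:]))
    return out
```

In the actual system, a masked language model proposes the most probable in-context replacements instead of a static table.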
Team Fight for 4230 [103] focused mostly on two fronts: a pre-processing module to properly normalize the tweets, and data augmentation by means of machine translation and WordNet-based substitutions. The pre-processing included link removal and punctuation cleaning, as well as expansion of quantities and contractions. All COVID-19-related hashtags were normalized into a single one, and the remaining hashtags were expanded. Their best approach was based on BERTweet with a dropout layer and the above-mentioned pre-processing.
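A tweet-normalization step of this kind can be sketched as follows (the regexes, the canonical hashtag, and the small hashtag list are illustrative assumptions, not the team's actual rules):

```python
import re

# Hypothetical set of COVID-19 hashtag spellings collapsed into one canonical tag.
COVID_TAGS = {"#covid19", "#covid_19", "#coronavirus", "#covid"}

def normalize_tweet(text):
    text = re.sub(r"https?://\S+", "", text)  # link removal
    tokens = []
    for tok in text.split():
        if tok.lower() in COVID_TAGS:
            tokens.append("#covid19")         # normalize COVID hashtags into one
        elif tok.startswith("#"):
            # naive hashtag expansion: split CamelCase into separate words
            words = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", tok[1:])
            tokens.extend(words or [tok[1:]])
        else:
            tokens.append(tok)
    text = re.sub(r"[^\w\s#@']", " ", " ".join(tokens))  # punctuation cleaning
    return re.sub(r"\s+", " ", text).strip()
```

For example, `normalize_tweet("Stay safe! #StayHome #CoronaVirus https://t.co/x")` yields `"Stay safe Stay Home #covid19"`.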
Team GPLSI [77] applied the RoBERTa and the BETO transformers together with different manually engineered features, such as the occurrence of dates and numbers or words from LIWC. A thorough exploration of parameters was made using weighting and bias techniques. They also tried to split the four-way classification into two binary classifications and one three-way classification. They further tried oversampling and undersampling.
Team iCompass used several pre-processing steps, including (i) English word removal, (ii) removal of URLs and mentions, and (iii) data normalization: removing tashkeel and the letter madda from the texts, removing duplicates, and replacing some characters to prevent mixing. They proposed a simple ensemble of two BERT-based models: AraBERT and Arabic-ALBERT.
Team NLP&IR@UNED [49] compared several transformer models, including BERT, ALBERT, RoBERTa, DistilBERT, and Funnel-Transformer. For English, they obtained their best results using a BERT model trained on tweets. For Spanish, they used Electra.
Team NLytics [74] used RoBERTa with a regression function in the final layer, approaching the problem as a ranking task.
Team QMUL-SDS [1] used the AraBERT preprocessing function to (i) replace URLs, email addresses, and user mentions with standard words, (ii) remove line breaks, HTML markup, repeated characters, and unwanted characters such as emotion icons, (iii) handle white space between words and digits (non-Arabic or English, or a combination of both) and before and after brackets, and (iv) remove unnecessary punctuation. They addressed the task as a ranking problem, and fine-tuned an Arabic transformer (AraBERTv0.2-base) on a combination of this year's data and the data from the CheckThat! lab 2020 (the CT20-AR dataset).
Team SCUoL [6] used typical pre-processing steps, including text cleaning, segmentation, and tokenization. Their experiments consisted of fine-tuning different AraBERT models, and their final results were obtained using AraBERTv2-base.
Team SU-NLP [22] also used several pre-processing steps, including (i) removing emojis and hashtags, and (ii) replacing all mentions with a special token (@USER) and all URLs with the respective website's domain. If a URL pointed to a tweet, they replaced it with TWITTER and the respective user account name. They reported that this URL expansion improved the performance. They then used an ensemble of BERTurk models fine-tuned with different seed values.
Team TOBB ETU [101] investigated different approaches to fine-tuning transformer models, including data augmentation using machine translation, weak supervision, and cross-lingual training. For their submission, they removed URLs and user mentions from the tweets, and fine-tuned a separate BERT-based model for each language. In particular, they fine-tuned BERTurk, AraBERT, BETO, and BERT-base for Turkish, Arabic, Spanish, and English, respectively. For Bulgarian, they fine-tuned a RoBERTa model pre-trained on Bulgarian documents.
Team UPV [14] used a multilingual sentence transformer representation (S-BERT) with knowledge distillation, originally intended for question answering. They further introduced an auxiliary language identification task alongside the downstream check-worthiness task.
B Systems for Task 2
Team Aschern [25] used TF.IDF retrieval, fine-tuned pre-trained S-BERT, and a LambdaMART re-ranker.
Team BeaSku [91] used triplet loss training to fine-tune S-BERT. Then, they used the scores predicted by the fine-tuned model along with BM25 scores as features to train a rankSVM re-ranker. They further discussed the impact of applying online mining of triplets. They also experimented with data augmentation.
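The triplet objective behind this fine-tuning can be written out in a few lines. The sketch below assumes Euclidean distance over sentence embeddings (the team's exact distance function and margin are not specified here):

```python
import math

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """max(0, d(a, p) - d(a, n) + margin): pushes the matching verified claim
    closer to the tweet embedding than a non-matching one, by at least the margin."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)
```

The loss is zero once the positive is closer than the negative by the margin, so training focuses on the hard triplets, which is why online mining of triplets (discussed by the team) matters.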
Team DIPS [60] calculated S-BERT embeddings for all claims, then computed a cosine similarity for each pair of an input claim and a verified claim. The prediction is made by passing a sorted list of cosine similarities to a neural network.
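The retrieval step of this pipeline amounts to cosine-similarity ranking, sketched below over plain lists standing in for S-BERT embeddings (the final neural network over the sorted similarity list is omitted):

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_verified_claims(tweet_emb, verified_embs):
    """Return indices of verified claims sorted by decreasing cosine similarity."""
    sims = [(cosine(tweet_emb, v), i) for i, v in enumerate(verified_embs)]
    return [i for s, i in sorted(sims, key=lambda x: -x[0])]
```

For instance, `rank_verified_claims([1, 0], [[0, 1], [1, 0], [1, 1]])` returns `[1, 2, 0]`: the identical-direction claim first, the orthogonal one last.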
Team NLytics approached the problem as a regression task, and used RoBERTa with a regression function in the final layer.
C Systems for Task 3
Team Black Ops [92] performed data pre-processing by removing stop-words and punctuation marks. Then, they experimented with decision trees, random forest, and gradient boosting classifiers for Task 3A, and found the latter to perform best.
Team CIC [8] experimented with logistic regression, multi-layer perceptron, support vector machines, and random forest. Their experiments consisted of using stratified 5-fold cross-validation on the training data. Their best results were obtained using logistic regression for task 3A, and a multi-layer perceptron for task 3B.
Team CIC experimented with decision tree, random forest, and gradient boosting algorithms, and found the latter to perform best.
Team CIVIC-UPM [48] participated in both subtasks of task 3. They performed pre-processing using a number of tools: (i) ftfy to repair Unicode and emoji errors, (ii) ekphrasis for lower-casing and for normalizing percentages, times, dates, emails, phone numbers, and numbers, (iii) contractions for abbreviation expansion, and (iv) NLTK for word tokenization, stop-word removal, punctuation removal, and word lemmatization. Then, they combined doc2vec with transformer representations (Electra base, T5 small and base, Longformer base, RoBERTa base, and DistilRoBERTa base). They further used additional data from Kaggle's AG News, KDD 2020, and Clickbait News Detection competitions. Finally, they experimented with a number of classifiers, such as Naïve Bayes, Random Forest, Logistic Regression with L1 and L2 regularization, Elastic Net, and SVMs. Their best system for subtask 3A used DistilRoBERTa-base on the text body with oversampling and a sliding window for dealing with long texts. Their best system for subtask 3B used RoBERTa-base on the title plus body text with oversampling, but no sliding window.
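The sliding-window trick for articles longer than a transformer's input limit can be sketched as follows (the window size and stride are illustrative defaults, not the team's settings):

```python
def sliding_windows(tokens, size=512, stride=256):
    """Split a long token sequence into overlapping windows so that each
    chunk fits a transformer's maximum input length."""
    if len(tokens) <= size:
        return [tokens]
    windows = []
    start = 0
    while start < len(tokens):
        windows.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break
        start += stride
    return windows
```

A document-level prediction is then typically obtained by aggregating the per-window scores, e.g., by averaging or max-pooling.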
Team DLRG experimented with a number of traditional approaches like Random Forest, Naïve Bayes and Logistic Regression as well as an online passive-aggressive classifier and different ensembles thereof. The best result was achieved by an ensemble of Naïve Bayes, Logistic Regression, and the Passive Aggressive classifier for task 3A. For task 3B, the Online Passive-Aggressive classifier outperformed all other approaches, including the considered ensembles.
Team GPLSI [77] applied the RoBERTa transformer together with different manually-engineered features, such as the occurrence of dates and numbers or words from LIWC. Both the title and the body were concatenated as a single sequence of words. Rather than going for a single multi-class setting, they used two binary models considering the most frequent classes: false vs. other, and true vs. other, followed by one three-class model.
Team MUCIC [12] used a majority-voting ensemble of three BERT variants: they fine-tuned the pre-trained BERT, DistilBERT, and RoBERTa models.
Team NITK_NLP [57] proposed an approach that included pre-processing and tokenization of the news articles, and then experimented with multiple transformer models. The final prediction was made by an ensemble.
Team NKovachevich [55] created lexical features: they extracted the 500 most frequent word stems in the dataset and computed their TF.IDF values, which they used in a multinomial Naïve Bayes classifier. Much better performance was achieved with an LSTM model using GloVe embeddings. BERT achieved a slightly lower F1, and RoBERTa performed worse than BERT.
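The TF.IDF features feeding such a lexical baseline can be sketched in pure Python (a minimal version over pre-tokenized documents; the team's exact stemming and weighting scheme may differ):

```python
import math
from collections import Counter

def tfidf_matrix(docs, vocab_size=500):
    """TF.IDF vectors over the vocab_size most frequent terms.
    docs: list of token lists (e.g., word stems)."""
    freq = Counter(t for doc in docs for t in doc)
    vocab = [t for t, _ in freq.most_common(vocab_size)]
    n = len(docs)
    # document frequency -> inverse document frequency per vocabulary term
    idf = {t: math.log(n / sum(1 for d in docs if t in d)) for t in vocab}
    rows = []
    for doc in docs:
        counts = Counter(doc)
        rows.append([counts[t] / len(doc) * idf[t] for t in vocab])
    return vocab, rows
```

The resulting rows can be fed directly to a multinomial Naïve Bayes or any other shallow classifier.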
Team NLP&IR@UNED [49] experimented with four transformer architectures and input sizes of 150 and 200 words. In preliminary tests, the best performance was achieved by ALBERT with 200 words. They also experimented with combining TF.IDF values from the text, all features provided by the LIWC tool, and TF.IDF values over the first 20 domain names returned by a query to a search engine. Unlike on the development set, in the official competition the best results were obtained with the approach based on TF.IDF, LIWC, and domain names.
Team NLytics fine-tuned RoBERTa on the dataset for each of the subtasks. Since the data is imbalanced, they used under-sampling. They also truncated the documents to 512 tokens to fit the RoBERTa input size.
Team NoFake [56] applied BERT without fine-tuning, but used an extensive amount of additional data for training, downloaded from various fact-checking websites.
Team Pathfinder [96] participated in both tasks using multinomial Naïve Bayes and random forest, with the former performing better for both. For task 3A, they merged the classes false and partially false into one class, which boosted the model performance by 41% (a non-official score mentioned in the paper).
Team Probity addressed the multiclass fake news detection subtask. They used a simple LSTM architecture with word2vec embeddings to represent the news articles.
Team Qword [97] applied pre-processing that included stop-word removal, punctuation removal, and stemming with the Porter stemmer. TF.IDF values were then calculated for the words, and four classification algorithms were applied to these features. The best result was obtained with Extreme Gradient Boosting.
Team SAUD used an SVM with TF.IDF. They tried Logistic Regression, Multinomial Naïve Bayes, and Random Forest, and found SVM to work best.
Team Sigmoid [76] experimented with different traditional machine learning approaches, with multinomial Naïve Bayes performing best, and one deep learning approach, namely an LSTM with the Adam optimizer. The latter outperformed the more traditional approaches.
Team Spider applied an LSTM after pre-processing consisting of stop-word removal and stemming.
Team UAICS [26] experimented with various models including BERT, LSTM, Bi-LSTM, and feature-based models. Their submitted model is a Gradient Boosting with a weighted combination of three feature groups: bi-grams, POS tags, and lexical categories of words.
Team University of Regensburg [41] used different fine-tuned variants of BERT with a linear layer on top, and explored several ways to deal with BERT's maximum sequence length: hierarchical transformer representations, extractive summarization (using DistilBERT), and abstractive summarization (using distil-BART-CNN-12-6). They performed oversampling to address the class imbalance, and then performed classification using fine-tuned BERT with a hierarchical transformer representation.
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this paper
Nakov, P., et al. (2021). Overview of the CLEF-2021 CheckThat! Lab on Detecting Check-Worthy Claims, Previously Fact-Checked Claims, and Fake News. In: Candan, K.S., et al. (eds.) Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2021. Lecture Notes in Computer Science, vol. 12880. Springer, Cham. https://doi.org/10.1007/978-3-030-85251-1_19
DOI: https://doi.org/10.1007/978-3-030-85251-1_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-85250-4
Online ISBN: 978-3-030-85251-1