Propagation2Vec: Embedding partial propagation networks for explainable fake news early detection
Introduction
While the growing popularity of social media has greatly facilitated the exchange of information, it also provides an ideal platform for spreading fake news, especially intentional misinformation, which has already caused and will continue to cause significant damage. For example, it has been estimated that at least 800 people died and 5800 were admitted to hospital as a result of the false claim, circulated during the COVID-19 pandemic, that alcohol-based cleaning products are a cure for the virus.
Even though many independent fact-checking organisations have emerged globally over recent years, the sheer volume of fake news makes it infeasible to rely entirely on human investigation. The task is made even more challenging by the need to detect fake news at an early stage, before it becomes widespread, since it is difficult to correct people’s perception of an issue once it is formed, even if that impression is inaccurate (De keersmaecker & Roets, 2017). Therefore, this work focuses on early detection of fake news: verifying the validity of a news item within a certain time limit from when it is published online. Here we use the definition in Zhou and Zafarani (2018) that fake news is intentionally and verifiably false news published by a news outlet. Similar definitions have also been used in previous studies on fake news detection (Monti et al., 2019; Ruchansky et al., 2017; Shu, Cui et al., 2019; Shu et al., 2017).
It has been demonstrated that the propagation pattern of news on social media, e.g., tweets and retweets of news on Twitter, can facilitate the detection of fake news (Liu and Wu, 2018; Ma et al., 2017; Shu, Mahudeswaran et al., 2019; Wu and Liu, 2018; Zhou and Zafarani, 2019), since the propagation pattern of fake news exhibits distinctive characteristics. Therefore, this work studies how the propagation patterns of news records can be effectively used to identify the veracity of a news record. Specifically, the propagation pattern of a news record refers to the corresponding tweets and retweets (see Section 3 for more details): as shown in Fig. 1, each propagation pattern can be considered as a tree that consists of multiple cascades, where each cascade is a sequence of tweets/retweets with the corresponding user profiles. Fig. 1 also shows that some nodes and cascades have distinctive characteristics, e.g., nodes corresponding to verified users, or the length of a cascade, which could be useful for identifying fake news. To exploit this knowledge effectively, a model needs a way to jointly emphasise informative nodes and cascades when identifying fake news from propagation patterns.
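The tree-and-cascade view described above can be illustrated with a minimal sketch. It assumes that a cascade is a root-to-leaf path of the propagation tree; the `Node` class, its fields, and the toy tree are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One tweet or retweet in a propagation tree (illustrative fields only)."""
    node_id: str
    timestamp: float      # seconds since the news record was published
    user_verified: bool   # example of a user-profile attribute
    children: List["Node"] = field(default_factory=list)


def cascades(root: Node) -> List[List[Node]]:
    """Return every root-to-leaf path of the tree, treating each as a cascade."""
    if not root.children:
        return [[root]]
    paths = []
    for child in root.children:
        for tail in cascades(child):
            paths.append([root] + tail)
    return paths


# Toy tree: one source tweet, two retweets, and one retweet of a retweet.
tree = Node("tweet", 0.0, True, [
    Node("rt1", 120.0, False, [Node("rt3", 900.0, False)]),
    Node("rt2", 300.0, True),
])

for path in cascades(tree):
    print(" -> ".join(n.node_id for n in path))
```

On this toy tree the loop prints two cascades, `tweet -> rt1 -> rt3` and `tweet -> rt2`. Features such as cascade length or the presence of verified users can then be read directly from each path.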
Another challenge is that the propagation network of a news record could take days or even weeks to complete. Hence, relying on the entire propagation network of a news record to identify its veracity is not ideal for fake news early detection. To address this challenge, models for fake news early detection should be able to recover the news label using a partially available propagation network at an early detection deadline.
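A partially available propagation network at a detection deadline can be sketched as a simple filter over timestamped events. The `Event` record, the `partial_network` helper, and the one-hour deadline are assumptions made for illustration only; the paper's own preprocessing may differ.

```python
from typing import Dict, List, NamedTuple, Optional


class Event(NamedTuple):
    """A tweet/retweet event (hypothetical record layout)."""
    node_id: str
    parent_id: Optional[str]  # None for the root of the propagation network
    timestamp: float          # seconds since the news record was published


def partial_network(events: List[Event], deadline: float) -> List[Event]:
    """Keep events posted no later than `deadline` whose parent is also kept."""
    kept: Dict[str, Event] = {}
    for ev in sorted(events, key=lambda e: e.timestamp):
        if ev.timestamp > deadline:
            break  # events are sorted, so everything after this is too late
        if ev.parent_id is None or ev.parent_id in kept:
            kept[ev.node_id] = ev
    return list(kept.values())


events = [
    Event("tweet", None, 0.0),
    Event("rt1", "tweet", 120.0),
    Event("rt2", "tweet", 7200.0),  # arrives after the deadline
    Event("rt3", "rt1", 900.0),
]

early = partial_network(events, deadline=3600.0)
print([ev.node_id for ev in early])
```

With a deadline of 3600 seconds, the early network contains `tweet`, `rt1` and `rt3`, while `rt2` is only visible in the complete network. An early-detection model sees only the former at prediction time.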
Another limitation that hinders the applicability of existing fake news detection models is their lack of explainability. Most propagation network-based fake news detection models adopt deep neural architectures such as Graph Neural Networks (GNN) and Graph Recurrent Neural Networks (GRNN), which are known to be black-box models. Hence, how to explain the predictions made by a fake news detection model is another important problem that has not been well addressed in previous work.
Contributions. The contributions of our work are as follows.
- We first conduct an extensive empirical study to highlight the importance of the aforementioned research gaps and to motivate our design decisions.
- We propose Propagation2Vec, a novel propagation network-based fake news detection technique that is capable of:
  - assigning varying attention to the nodes and cascades in propagation patterns using a hierarchical attention mechanism to emphasise informative nodes and cascades for fake news detection;
  - reconstructing useful knowledge available in complete propagation networks using their early propagation networks, which enables early detection of fake news; and
  - explaining the underlying logic of the model, which provides useful insights for future research on propagation network-based fake news detection.
- We evaluate our approach using two publicly available datasets. Our experimental results show that the proposed framework outperforms state-of-the-art fake news detection models by as much as 5.55% in F1-score, while revealing the fake news labels at an early detection deadline. In addition, we construct general explanations for the underlying logic of our model based on the attention weights assigned to the nodes and cascades in the propagation patterns.
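The idea of hierarchical attention over nodes and cascades, mentioned in the contributions above, can be illustrated with a generic two-level attention pooling sketch. This is not the Propagation2Vec architecture itself: the dot-product scoring, the fixed query vectors (which would be learned parameters in a trained model), and the two-dimensional toy features are all assumptions made for illustration.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(items: List[Vector], query: Vector) -> Tuple[Vector, List[float]]:
    """Attention pooling: weight each item by softmax(item . query), then sum."""
    scores = [sum(x * q for x, q in zip(item, query)) for item in items]
    weights = softmax(scores)
    dim = len(items[0])
    pooled = [sum(w * item[d] for w, item in zip(weights, items)) for d in range(dim)]
    return pooled, weights


def embed_network(cascade_nodes: List[List[Vector]],
                  node_query: Vector,
                  cascade_query: Vector):
    """Pool nodes into cascade vectors, then cascades into one network vector."""
    cascade_vecs, node_weights = [], []
    for nodes in cascade_nodes:
        vec, weights = attend(nodes, node_query)      # node-level attention
        cascade_vecs.append(vec)
        node_weights.append(weights)
    network_vec, cascade_weights = attend(cascade_vecs, cascade_query)  # cascade level
    return network_vec, node_weights, cascade_weights


# Two toy cascades with 2-dimensional node features (hypothetical values).
toy_cascades = [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5]]]
network_vec, node_w, cascade_w = embed_network(
    toy_cascades, node_query=[1.0, 0.0], cascade_query=[0.0, 1.0]
)
print(network_vec, node_w, cascade_w)
```

The returned weights at each level sum to one, so they can be inspected directly to see which nodes and cascades contributed most to the final embedding. This is the kind of signal an attention-based explanation relies on.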
Paper outline. The rest of the paper is structured as follows. In Section 2, we discuss previous work related to Propagation2Vec. Section 3 defines the problem statement. We conduct an empirical study in Section 4 to highlight the importance of our contribution. Section 5 provides the technical details of Propagation2Vec. We evaluate Propagation2Vec in Section 6 and conclude the manuscript in Section 7.
Related work
Detecting fake news on social media has been a popular research problem over recent years (Parikh and Atrey, 2018; Sharma et al., 2019; Shu et al., 2017). In this section, we review the prior work on this topic. Specifically, similar to Pierri and Ceri (2019) and Shu et al. (2017), we classify existing work into three categories: content-based approaches, context-based approaches and mixed approaches, the first two of which, as suggested by their names, mainly rely on news content and social
Problem statement
We define the problem of fake news early detection as follows: we are given a set of labelled news records. Each record is represented as a tuple of four elements: (1) the timestamp at which the record is published online; (2) the text content of the record; (3) the propagation network of the record at a given timestamp (further explained below); and (4) a binary label that indicates whether the record is false.
Each propagation network is an attributed directed graph, where:
- •
is the
Quantitative analysis of propagation network features
This section first describes the datasets used in our experiments. Then, we analyse a wide range of node-level features of propagation networks, including temporal-based, text-based and user-based features, to identify their contribution to detecting fake news. Note that these features are not well studied in similar previous work (Shu, Mahudeswaran et al., 2019). Moreover, we analyse the importance of the features extracted from complete propagation networks and early
Propagation2Vec
This section provides the technical details of the proposed model Propagation2Vec. Motivated by the findings in Section 4.4, Propagation2Vec embeds propagation networks of news records as low-dimensional vectors such that these embeddings have two main properties that are useful to identify the veracity of news records. First, it is capable of assigning varying importance for the nodes and the information cascades of a propagation network. As found by the empirical study in Section 4.4, the
Experimental verification
In this section, we present our experimental results to demonstrate the performance of the proposed approach for fake news early detection.
Conclusion
In summary, this work proposed Propagation2Vec, a novel propagation-based fake news early detection technique. Propagation2Vec is designed to address two empirically verified research gaps in existing propagation-based detection methods. First, most existing techniques are unable to emphasise the informative nodes and cascades in propagation networks. To address this, we propose a hierarchical attention mechanism to encode propagation networks, which can assign varying levels of importance for
CRediT authorship contribution statement
Amila Silva: Conceptualization, Methodology, Software, Investigation, Validation, Data curation, Writing - original draft. Yi Han: Conceptualization, Data curation, Writing - original draft, Supervision. Ling Luo: Conceptualization, Writing - review & editing, Supervision. Shanika Karunasekera: Conceptualization, Writing - review & editing, Supervision. Christopher Leckie: Conceptualization, Writing - review & editing, Supervision.
Acknowledgements
This research was financially supported by the Melbourne Graduate Research Scholarship, Australia and the Rowden White Scholarship, Australia.
References
- et al. Rumor detection on social media with bi-directional graph convolutional networks (2020)
- Bordes, A., Usunier, N., Garcia-Duran, A., Weston, J., & Yakhnenko, O. (2013). Translating embeddings for modeling...
- et al. Learning phrase representations using RNN encoder-decoder for statistical machine translation (2014)
- et al. Computational fact checking from knowledge networks. PLOS ONE (2015)
- et al. Coaid: Covid-19 healthcare misinformation dataset (2020)
- Cui, L., Seo, H., Tabar, M., Ma, F., Wang, S., & Lee, D. (2020). DETERRENT: Knowledge guided graph attention network...
- Guo, H., Cao, J., Zhang, Y., Guo, J., & Li, J. (2018). Rumor detection with hierarchical social attention network. In...
- et al. Graph neural networks with continual learning for fake news detection from social media (2020)
- et al. This just in: Fake news packs a lot in title, uses simpler, repetitive content in text body, more similar to satire than real news (2017)
- et al. Robust fake news detection over time and attack. ACM Transactions on Intelligent Systems and Technology (TIST) (2019)
- Novel visual and statistical image features for microblogs news verification. IEEE Transactions on Multimedia
- ‘Fake news’: Incorrect, but hard to correct. The role of cognitive ability on the impact of false information on social impressions. Intelligence
- GCAN: Graph-aware co-attention networks for explainable fake news detection on social media
- Fake news detection on social media using geometric deep learning
- A review of relational machine learning for knowledge graphs. IEEE
- False news on social media: A data-driven survey. SIGMOD Record
- Resource description framework (RDF) model and syntax specification