ABSTRACT
Public digital media often mix factual information with fake scientific news, which is typically difficult to identify, especially for non-professionals. Such scientific news articles create illusions and misconceptions and ultimately influence public opinion, with serious consequences at a broader social scale. Yet existing solutions for automatically verifying the credibility of news articles remain unsatisfactory. We propose to verify scientific news by retrieving and analyzing its most relevant source papers from an academic digital library (DL), e.g., arXiv. Instead of querying keywords or regular named entities extracted from news articles, we query domain knowledge entities (DKEs) extracted from the text. Querying each DKE retrieves a list of candidate scholarly papers. We then design a function to rank the candidates and select the most relevant scholarly paper. Our experiments with various representations indicate that the term frequency-inverse document frequency (TF-IDF) representation with cosine similarity outperforms baseline models based on word embeddings. This result demonstrates the efficacy of using DKEs to retrieve scientific papers that are relevant to a specific news article. It also indicates that word embeddings may not be the best document representation for domain-specific document retrieval tasks. Our method is fully automated and can be applied to facilitate the detection of fake and misinformed news across many scientific domains.
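The ranking step described in the abstract (TF-IDF vectors compared by cosine similarity) can be illustrated with a minimal sketch. This is not the authors' implementation: the tokenizer, the smoothed IDF formula, the function names, and the toy news text and candidate abstracts are all assumptions made for illustration, and the paper's actual ranking function may combine further signals.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF (an assumed variant), so no term gets a zero weight.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}
    return [
        {t: count * idf[t] for t, count in Counter(tokens).items()}
        for tokens in tokenized
    ]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidates(news_text, candidates):
    """Rank (paper_id, abstract) pairs by similarity to the news text."""
    texts = [news_text] + [abstract for _, abstract in candidates]
    vectors = tfidf_vectors(texts)
    news_vec, paper_vecs = vectors[0], vectors[1:]
    scored = [
        (paper_id, cosine(news_vec, vec))
        for (paper_id, _), vec in zip(candidates, paper_vecs)
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical toy data standing in for a news article and the candidate
# papers retrieved by querying its DKEs.
news = "Scientists detect gravitational waves from a black hole merger"
candidates = [
    ("paper-B", "Deep learning for protein structure prediction"),
    ("paper-A", "Observation of gravitational waves from a binary black hole merger"),
]

for paper_id, score in rank_candidates(news, candidates):
    print(paper_id, round(score, 3))
```

In this sketch the candidate list is given directly; in the pipeline the abstract describes, it would come from querying the digital library with each extracted DKE.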
Index Terms
- Searching for Evidence of Scientific News in Scholarly Big Data