Searching for Evidence of Scientific News in Scholarly Big Data

Authors:
Md Reshad Ul Hoque, Old Dominion University, Norfolk, VA, USA
Dash Bradley, Old Dominion University, Norfolk, VA, USA
Chiman Kwan, Applied Research LLC, Rockville, MD, USA
Agnese Chiatti, The Open University, Milton Keynes, United Kingdom
Jiang Li, Old Dominion University, Norfolk, VA, USA
Jian Wu, Old Dominion University, Norfolk, VA, USA

K-CAP '19: Proceedings of the 10th International Conference on Knowledge Capture, September 2019, Pages 251–254. https://doi.org/10.1145/3360901.3364438

Published: 23 September 2019

ABSTRACT

Public digital media often mix factual information with fake scientific news, which is typically difficult to pinpoint, especially for non-professionals. Such scientific news articles create illusions and misconceptions and thus ultimately influence public opinion, with serious consequences at a broader social scale. Yet existing solutions for automatically verifying the credibility of news articles remain unsatisfactory. We propose to verify scientific news by retrieving and analyzing its most relevant source papers from an academic digital library (DL), e.g., arXiv. Instead of querying keywords or regular named entities extracted from news articles, we query domain knowledge entities (DKEs) extracted from the text. By querying each DKE, we retrieve a list of candidate scholarly papers. We then design a function to rank them and select the most relevant scholarly paper. After exploring various representations, we find experimentally that the term frequency-inverse document frequency (TF-IDF) representation with cosine similarity outperforms baseline models based on word embeddings. This result demonstrates the efficacy of using DKEs to retrieve scientific papers that are relevant to a specific news article. It also indicates that word embeddings may not be the best document representation for domain-specific document retrieval tasks. Our method is fully automated and can be effectively applied to facilitate the detection of fake and misinformed news across many scientific domains.
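
The ranking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the DKE extraction and digital-library querying steps are omitted, the function names (`tokenize`, `tfidf_vectors`, `cosine`, `rank_candidates`) are hypothetical, and the smoothed IDF formula and the choice to compare the full news text against candidate abstracts are assumptions made only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep alphanumeric runs (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(documents):
    """Return one sparse TF-IDF vector (term -> weight dict) per document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF (an assumption) keeps every weight strictly positive.
    idf = {
        term: math.log((1 + n_docs) / (1 + df)) + 1
        for term, df in doc_freq.items()
    }
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({term: n * idf[term] for term, n in counts.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidates(news_text, candidate_abstracts):
    """Return (candidate index, score) pairs, most similar first."""
    vectors = tfidf_vectors([news_text] + candidate_abstracts)
    news_vec, paper_vecs = vectors[0], vectors[1:]
    scores = [(i, cosine(news_vec, vec)) for i, vec in enumerate(paper_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

Calling `rank_candidates(news_text, abstracts)` with the text of a news article and the abstracts of the retrieved candidate papers yields the candidates ordered by cosine similarity; the first pair identifies the paper this sketch would treat as the most relevant.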

Index Terms

Searching for Evidence of Scientific News in Scholarly Big Data
1. Computing methodologies
  1. Machine learning
2. Information systems
  1. Information retrieval
  2. World Wide Web

Published in
K-CAP '19: Proceedings of the 10th International Conference on Knowledge Capture
September 2019
281 pages
ISBN:9781450370080
DOI:10.1145/3360901
General Chairs: Mayank Kejriwal (University of Southern California Information Sciences Institute, USA), Pedro Szekely (University of Southern California Information Sciences Institute, USA)
Program Chair: Raphaël Troncy (EURECOM, France)
Copyright © 2019 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
domain knowledge entity
embedding
fake news
web api
Qualifiers
- short-paper
Acceptance Rates
Overall Acceptance Rate: 55 of 198 submissions, 28%