DOI: 10.1145/1367497.1367561

Finding the right facts in the crowd: factoid question answering over social media

Published: 21 April 2008

ABSTRACT

Community Question Answering has emerged as a popular and effective paradigm for a wide range of information needs. For example, to find out an obscure piece of trivia, it is now possible and even very effective to post a question on a popular community QA site such as Yahoo! Answers, and to rely on other users to provide answers, often within minutes. The importance of such community QA sites is magnified as they create archives of millions of questions and hundreds of millions of answers, many of which are invaluable for the information needs of other searchers. However, to make this immense body of knowledge accessible, effective answer retrieval is required. In particular, as any user can contribute an answer to a question, the majority of the content reflects personal, often unsubstantiated opinions. A ranking that combines both relevance and quality is required to make such archives usable for factual information retrieval. This task is challenging, as the structure and the contents of community QA archives differ significantly from the web setting. To address this problem we present a general ranking framework for factual information retrieval from social media. Results of a large scale evaluation demonstrate that our method is highly effective at retrieving well-formed, factual answers to questions, as evaluated on a standard factoid QA benchmark. We also show that our learning framework can be tuned with the minimum of manual labeling. Finally, we provide result analysis to gain deeper understanding of which features are significant for social media search and retrieval. Our system can be used as a crucial building block for combining results from a variety of social media content with general web search results, and to better integrate social media content for effective information access.
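
To make the ranking idea above concrete, here is a minimal, hypothetical sketch of a pairwise learning-to-rank setup that combines textual relevance features with community quality signals. The feature names, the toy labels, and the use of scikit-learn's GradientBoostingClassifier over feature differences are illustrative assumptions for this sketch, not the authors' actual system or features.

from dataclasses import dataclass
from itertools import combinations
from typing import List

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier


@dataclass
class Answer:
    text_overlap: float         # query/answer term-overlap score (hypothetical relevance feature)
    answer_length: float        # normalized answer length
    thumbs_up: float            # community votes received by the answer
    answerer_reputation: float  # e.g. fraction of the answerer's past answers chosen as best
    is_good: bool = False       # editorial label, used only for training


def features(a: Answer) -> np.ndarray:
    return np.array([a.text_overlap, a.answer_length, a.thumbs_up, a.answerer_reputation])


def pairwise_examples(answers: List[Answer]):
    """Turn labeled answers into pairwise preference examples (good beats bad)."""
    X, y = [], []
    for a, b in combinations(answers, 2):
        if a.is_good == b.is_good:
            continue
        diff = features(a) - features(b)
        X.append(diff)
        y.append(1 if a.is_good else 0)
        X.append(-diff)
        y.append(1 - y[-1])
    return np.array(X), np.array(y)


# Toy labeled data standing in for a QA archive with relevance/quality judgments.
labeled = [
    Answer(0.9, 0.6, 12, 0.8, is_good=True),
    Answer(0.7, 0.2, 0, 0.1),
    Answer(0.4, 0.9, 3, 0.5),
    Answer(0.8, 0.5, 7, 0.7, is_good=True),
]

X, y = pairwise_examples(labeled)
model = GradientBoostingClassifier(n_estimators=50).fit(X, y)


def score(a: Answer) -> float:
    # The model was trained on feature differences, so scoring a single answer's
    # feature vector amounts to comparing it against an all-zero baseline.
    return float(model.decision_function(features(a).reshape(1, -1))[0])


candidates = [Answer(0.3, 0.1, 0, 0.0), Answer(0.85, 0.4, 5, 0.6)]
for ans in sorted(candidates, key=score, reverse=True):
    print(round(score(ans), 3), ans)

The pairwise formulation is one common way to learn a ranking from relative preferences (such as "best answer" selections or click data) rather than exhaustive absolute labels, which is in the spirit of the abstract's point about tuning with minimal manual labeling; the specific learner and features here are placeholders.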


Published in

            WWW '08: Proceedings of the 17th international conference on World Wide Web
            April 2008
            1326 pages
ISBN: 9781605580852
DOI: 10.1145/1367497

            Copyright © 2008 ACM


            Publisher

            Association for Computing Machinery

            New York, NY, United States



            Qualifiers

            • research-article

            Acceptance Rates

Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%

