DOI: 10.1145/1367497.1367561

Finding the right facts in the crowd: factoid question answering over social media

Published: 21 April 2008

ABSTRACT

Community Question Answering has emerged as a popular and effective paradigm for a wide range of information needs. For example, to find out an obscure piece of trivia, it is now possible and even very effective to post a question on a popular community QA site such as Yahoo! Answers, and to rely on other users to provide answers, often within minutes. The importance of such community QA sites is magnified as they create archives of millions of questions and hundreds of millions of answers, many of which are invaluable for the information needs of other searchers. However, to make this immense body of knowledge accessible, effective answer retrieval is required. In particular, as any user can contribute an answer to a question, the majority of the content reflects personal, often unsubstantiated opinions. A ranking that combines both relevance and quality is required to make such archives usable for factual information retrieval. This task is challenging, as the structure and the contents of community QA archives differ significantly from the web setting. To address this problem we present a general ranking framework for factual information retrieval from social media. Results of a large scale evaluation demonstrate that our method is highly effective at retrieving well-formed, factual answers to questions, as evaluated on a standard factoid QA benchmark. We also show that our learning framework can be tuned with the minimum of manual labeling. Finally, we provide result analysis to gain deeper understanding of which features are significant for social media search and retrieval. Our system can be used as a crucial building block for combining results from a variety of social media content with general web search results, and to better integrate social media content for effective information access.
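
To make the ranking idea above concrete, here is a minimal, hypothetical sketch of a pairwise learning-to-rank setup that combines textual relevance features with community quality signals. The feature names, the toy labels, and the use of scikit-learn's GradientBoostingClassifier over feature differences are illustrative assumptions for this sketch, not the authors' actual system or features.

from dataclasses import dataclass
from itertools import combinations
from typing import List

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier


@dataclass
class Answer:
    text_overlap: float         # query/answer term-overlap score (hypothetical relevance feature)
    answer_length: float        # normalized answer length
    thumbs_up: float            # community votes received by the answer
    answerer_reputation: float  # e.g. fraction of the answerer's past answers chosen as best
    is_good: bool = False       # editorial label, used only for training


def features(a: Answer) -> np.ndarray:
    return np.array([a.text_overlap, a.answer_length, a.thumbs_up, a.answerer_reputation])


def pairwise_examples(answers: List[Answer]):
    """Turn labeled answers into pairwise preference examples (good beats bad)."""
    X, y = [], []
    for a, b in combinations(answers, 2):
        if a.is_good == b.is_good:
            continue
        diff = features(a) - features(b)
        X.append(diff)
        y.append(1 if a.is_good else 0)
        X.append(-diff)
        y.append(1 - y[-1])
    return np.array(X), np.array(y)


# Toy labeled data standing in for a QA archive with relevance/quality judgments.
labeled = [
    Answer(0.9, 0.6, 12, 0.8, is_good=True),
    Answer(0.7, 0.2, 0, 0.1),
    Answer(0.4, 0.9, 3, 0.5),
    Answer(0.8, 0.5, 7, 0.7, is_good=True),
]

X, y = pairwise_examples(labeled)
model = GradientBoostingClassifier(n_estimators=50).fit(X, y)


def score(a: Answer) -> float:
    # The model was trained on feature differences, so scoring a single answer's
    # feature vector amounts to comparing it against an all-zero baseline.
    return float(model.decision_function(features(a).reshape(1, -1))[0])


candidates = [Answer(0.3, 0.1, 0, 0.0), Answer(0.85, 0.4, 5, 0.6)]
for ans in sorted(candidates, key=score, reverse=True):
    print(round(score(ans), 3), ans)

The pairwise formulation is one common way to learn a ranking from relative preferences (such as "best answer" selections or click data) rather than exhaustive absolute labels, which is in the spirit of the abstract's point about tuning with minimal manual labeling; the specific learner and features here are placeholders.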


Published in

            WWW '08: Proceedings of the 17th international conference on World Wide Web
            April 2008
            1326 pages
ISBN: 9781605580852
DOI: 10.1145/1367497

            Copyright © 2008 ACM


            Publisher

            Association for Computing Machinery

            New York, NY, United States



            Qualifiers

            • research-article

            Acceptance Rates

Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%

