Approaches to Exploring Category Information for Question Retrieval in Community Question-Answer Archives

Published: 01 May 2012

Abstract

Community Question Answering (CQA) is a popular type of service where users ask questions and where answers are obtained from other users or from historical question-answer pairs. CQA archives contain large volumes of questions organized into a hierarchy of categories. As an essential function of CQA services, question retrieval in a CQA archive aims to retrieve historical question-answer pairs that are relevant to a query question. This article presents several new approaches to exploiting the category information of questions for improving the performance of question retrieval, and it applies these approaches to existing question retrieval models, including a state-of-the-art question retrieval model. Experiments conducted on real CQA data demonstrate that the proposed techniques are effective and efficient and are capable of outperforming a variety of baseline methods significantly.

Published in

ACM Transactions on Information Systems, Volume 30, Issue 2
May 2012
245 pages
ISSN: 1046-8188
EISSN: 1558-2868
DOI: 10.1145/2180868

              Copyright © 2012 ACM

              Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

              Publisher

              Association for Computing Machinery

              New York, NY, United States

              Publication History

              • Published: 1 May 2012
              • Accepted: 1 February 2012
              • Revised: 1 October 2011
              • Received: 1 March 2011

              Qualifiers

              • research-article
              • Research
              • Refereed
