Abstract
Community Question Answering (CQA) is a popular type of service in which users ask questions and obtain answers from other users or from historical question-answer pairs. CQA archives contain large volumes of questions organized into a hierarchy of categories. Question retrieval, an essential function of CQA services, aims to retrieve historical question-answer pairs that are relevant to a query question. This article presents several new approaches that exploit the category information of questions to improve question retrieval performance, and it applies these approaches to existing question retrieval models, including a state-of-the-art model. Experiments conducted on real CQA data demonstrate that the proposed techniques are effective and efficient and that they significantly outperform a variety of baseline methods.