Approaches to Exploring Category Information for Question Retrieval in Community Question-Answer Archives

Authors:
Xin Cao (Nanyang Technological University),
Gao Cong (Nanyang Technological University),
Bin Cui (Peking University),
Christian S. Jensen (Aarhus University),
Quan Yuan (Nanyang Technological University)

ACM Transactions on Information Systems, Volume 30, Issue 2, Article No. 7, pp. 1–38. https://doi.org/10.1145/2180868.2180869

Published: 01 May 2012

Abstract

Community Question Answering (CQA) is a popular type of service where users ask questions and where answers are obtained from other users or from historical question-answer pairs. CQA archives contain large volumes of questions organized into a hierarchy of categories. As an essential function of CQA services, question retrieval in a CQA archive aims to retrieve historical question-answer pairs that are relevant to a query question. This article presents several new approaches to exploiting the category information of questions for improving the performance of question retrieval, and it applies these approaches to existing question retrieval models, including a state-of-the-art question retrieval model. Experiments conducted on real CQA data demonstrate that the proposed techniques are effective and efficient and are capable of outperforming a variety of baseline methods significantly.

Index Terms

1. Information systems

Published in

ACM Transactions on Information Systems Volume 30, Issue 2
May 2012
245 pages
ISSN:1046-8188
EISSN:1558-2868
DOI:10.1145/2180868

Copyright © 2012 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 1 May 2012
- Accepted: 1 February 2012
- Revised: 1 October 2011
- Received: 1 March 2011
Author Tags
Question-answering services
categorization
cluster-based retrieval
question search
Qualifiers
- research-article
- Research
- Refereed