Abstract
In this article we describe our approach to Chinese information retrieval, where a query is a short natural language description. First, we use automatically extracted short terms from document sets to build indexes and use the short terms in both the query and documents to do initial retrieval. Next, we use long terms extracted from the document collection to reorder the top N retrieved documents to improve precision. Finally, we acquire the relevant terms of the short terms from the Internet and the top retrieved documents and use them to do query expansion. Experiments on the NTCIR-4 CLIR Chinese SLIR sub-collection show that document reranking can both improve the retrieval performance on its own and make a significant contribution to query expansion. The experiments also show that the extended query expansion proposed in this article is more effective than the standard Rocchio query expansion.
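The abstract names standard Rocchio query expansion as the baseline but does not give its details. As a rough illustration only, the following is a minimal sketch of the textbook positive-feedback form of Rocchio, assuming documents are already segmented into terms and represented by raw term frequencies. The function name `rocchio_expand`, the weights `alpha` and `beta`, and the cutoff `top_k` are hypothetical choices made for this sketch, not values from the article, and this is not the extended expansion method the authors propose.

```python
from collections import Counter


def rocchio_expand(query_terms, feedback_docs, alpha=1.0, beta=0.75, top_k=5):
    """Expand a query with terms from pseudo-relevant documents.

    query_terms   -- list of terms in the original query
    feedback_docs -- list of documents, each a list of terms; assumed to be
                     the top-ranked documents from an initial retrieval
    alpha, beta   -- illustrative weights for the query and the centroid
    top_k         -- number of new terms to add (an assumption of this sketch)
    """
    # Original query vector: raw term frequency within the query.
    query_vec = Counter(query_terms)

    # Centroid of the feedback documents: mean term frequency per term.
    centroid = Counter()
    for doc in feedback_docs:
        for term, tf in Counter(doc).items():
            centroid[term] += tf / len(feedback_docs)

    # Candidate expansion terms: strongest centroid terms not in the query.
    # Ties are broken alphabetically so the result is deterministic.
    candidates = sorted(
        (t for t in centroid if t not in query_vec),
        key=lambda t: (-centroid[t], t),
    )[:top_k]

    # Reweight the original terms, then append the new ones.
    expanded = {t: alpha * w + beta * centroid.get(t, 0.0)
                for t, w in query_vec.items()}
    for t in candidates:
        expanded[t] = beta * centroid[t]
    return expanded


if __name__ == "__main__":
    query = ["chinese", "retrieval"]
    top_docs = [
        ["chinese", "retrieval", "index"],
        ["retrieval", "index", "expansion"],
    ]
    print(rocchio_expand(query, top_docs, top_k=1))
```

In this form the quality of the expansion depends entirely on the quality of the top-ranked documents. That is consistent with the abstract's observation that reranking the top N documents before expansion contributes to the expansion result.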
Index Terms
- Chinese information retrieval based on terms and relevant terms