Chinese information retrieval based on terms and relevant terms

Published: 01 September 2005
Abstract

In this article, we describe our approach to Chinese information retrieval, where a query is a short natural language description. First, we build indexes from short terms automatically extracted from the document sets and match the short terms in the query against those in the documents for initial retrieval. Next, we use long terms extracted from the document collection to reorder the top N retrieved documents and improve precision. Finally, we acquire the relevant terms of the short terms from the Internet and from the top retrieved documents and use them for query expansion. Experiments on the NTCIR-4 CLIR Chinese SLIR sub-collection show that document reranking both improves retrieval performance on its own and contributes significantly to query expansion. The experiments also show that the extended query expansion proposed in this article is more effective than standard Rocchio query expansion.
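For orientation, the baseline named above, standard Rocchio query expansion with pseudo-relevance feedback, can be sketched roughly as follows. This is a minimal illustration of the textbook technique, not the extended expansion proposed in the article. The function name `rocchio_expand`, the parameters `alpha`, `beta`, and `top_k`, and the representation of queries and documents as term-to-weight dictionaries are all assumptions made for the example.

```python
from collections import Counter


def rocchio_expand(query_vec, feedback_docs, alpha=1.0, beta=0.75, top_k=5):
    """Textbook Rocchio expansion using positive feedback only.

    query_vec:     dict mapping term -> weight for the original query.
    feedback_docs: list of dicts (term -> weight), one per top-ranked
                   document; these are assumed relevant (pseudo-feedback).
    Returns a dict holding the reweighted original terms plus the top_k
    highest-weighted terms that were not in the original query.
    """
    # Centroid (mean vector) of the feedback documents.
    centroid = Counter()
    for doc in feedback_docs:
        for term, weight in doc.items():
            centroid[term] += weight / len(feedback_docs)

    # Reweight original query terms: alpha * q + beta * centroid.
    expanded = {
        term: alpha * weight + beta * centroid.get(term, 0.0)
        for term, weight in query_vec.items()
    }

    # Rank candidate expansion terms by their Rocchio weight
    # (ties broken alphabetically so the result is deterministic).
    candidates = sorted(
        ((term, beta * weight) for term, weight in centroid.items()
         if term not in query_vec),
        key=lambda pair: (-pair[1], pair[0]),
    )
    for term, weight in candidates[:top_k]:
        expanded[term] = weight
    return expanded
```

In a pipeline like the one described in the abstract, `feedback_docs` would be the top N documents after reranking, so a better ordering of those documents would feed cleaner terms into the expansion step. The term weights themselves (for example tf-idf) are left to the caller here.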

