Chinese information retrieval based on terms and relevant terms

Authors:
Yang Lingpeng, Institute for Infocomm Research, Singapore
Ji Donghong, Institute for Infocomm Research, Singapore
Tang Li, Institute for Infocomm Research, Singapore
Niu Zhengyu, Institute for Infocomm Research, Singapore

ACM Transactions on Asian Language Information Processing, Volume 4, Issue 3, pp. 357–374. https://doi.org/10.1145/1111667.1111675

Published: 01 September 2005

Abstract

In this article, we describe our approach to Chinese information retrieval, where each query is a short natural-language description. First, we build indexes from short terms automatically extracted from the document set, and use the short terms in both the query and the documents to perform the initial retrieval. Next, we use long terms extracted from the document collection to rerank the top N retrieved documents and improve precision. Finally, we acquire terms relevant to the short terms from the Internet and from the top retrieved documents, and use them for query expansion. Experiments on the NTCIR-4 CLIR Chinese SLIR sub-collection show that document reranking both improves retrieval performance on its own and contributes significantly to query expansion. The experiments also show that the extended query expansion proposed in this article is more effective than standard Rocchio query expansion.
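The following minimal Python sketch shows the general shape of the three-stage pipeline described above on a toy corpus. It is not the authors' implementation. The short and long terms are assumed to be already extracted (the article's term extractor and its acquisition of relevant terms from the Internet are not reproduced), the TF-IDF scoring and the reranking criterion (counting long terms shared with the query) are assumptions, and all function and parameter names (`build_index`, `initial_retrieval`, `rerank_top_n`, `rocchio_expand`, `alpha`, `beta`, `max_new`) are hypothetical. Stage 3 implements only the standard Rocchio baseline with pseudo-relevance feedback, not the extended query expansion proposed in the article.

```python
import math
from collections import Counter

# Hypothetical sketch of a three-stage retrieval pipeline (not the authors' method).
# Terms are assumed to be pre-extracted; scoring and reranking rules are illustrative only.


def build_index(docs):
    """Return a TF-IDF vector (term -> weight) per document and the idf table."""
    n = len(docs)
    df = Counter()
    for terms in docs:
        df.update(set(terms))
    idf = {t: math.log(n / d) + 1.0 for t, d in df.items()}
    vecs = [{t: c * idf[t] for t, c in Counter(terms).items()} for terms in docs]
    return vecs, idf


def dot(q, d):
    """Inner product of two sparse term-weight vectors."""
    return sum(w * d.get(t, 0.0) for t, w in q.items())


def initial_retrieval(query_terms, doc_vecs, idf):
    """Stage 1: rank every document by its inner product with the short-term query."""
    qvec = {t: c * idf.get(t, 0.0) for t, c in Counter(query_terms).items()}
    ranked = sorted(range(len(doc_vecs)), key=lambda i: dot(qvec, doc_vecs[i]), reverse=True)
    return qvec, ranked


def rerank_top_n(ranked, doc_long_terms, query_long_terms, n):
    """Stage 2 (assumed criterion): reorder only the top n documents by how many
    of the query's long terms they contain. sorted() is stable, so ties keep
    their initial order; documents below rank n are left untouched."""
    top = sorted(ranked[:n], key=lambda i: len(query_long_terms & doc_long_terms[i]), reverse=True)
    return top + ranked[n:]


def rocchio_expand(qvec, doc_vecs, feedback_ids, alpha=1.0, beta=0.75, max_new=5):
    """Stage 3 baseline: Rocchio with pseudo-relevance feedback and no negative
    component, q' = alpha * q + beta * centroid(feedback docs), keeping at most
    max_new terms that were not already in the query."""
    centroid = Counter()
    for i in feedback_ids:
        for t, w in doc_vecs[i].items():
            centroid[t] += w / len(feedback_ids)
    new_terms = [t for t, _ in centroid.most_common() if t not in qvec][:max_new]
    expanded = {t: alpha * w + beta * centroid[t] for t, w in qvec.items()}
    for t in new_terms:
        expanded[t] = beta * centroid[t]
    return expanded


# Toy usage: three documents with hypothetical long terms per document.
docs = [
    ["chinese", "chinese", "retrieval", "index"],
    ["chinese", "retrieval", "query", "expansion"],
    ["weather", "singapore"],
]
doc_long_terms = [set(), {"chinese retrieval"}, set()]
vecs, idf = build_index(docs)
qvec, ranked = initial_retrieval(["chinese", "retrieval"], vecs, idf)
print(ranked)  # [0, 1, 2]
reranked = rerank_top_n(ranked, doc_long_terms, {"chinese retrieval"}, n=2)
print(reranked)  # [1, 0, 2]
expanded = rocchio_expand(qvec, vecs, reranked[:1])
print(sorted(expanded))  # ['chinese', 'expansion', 'query', 'retrieval']
```

On this toy data, expanding from the top document after reranking adds `query` and `expansion`, whereas expanding from the top document of the initial ranking would add `index` instead. This illustrates, in miniature, how the quality of the reranked list can feed into query expansion.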

Index Terms

- Information systems → Information retrieval → Retrieval tasks and goals → Clustering and classification
- Information systems → Information systems applications → Data mining → Clustering

Published in
ACM Transactions on Asian Language Information Processing, Volume 4, Issue 3 (September 2005), 138 pages
ISSN: 1530-0226
EISSN: 1558-3430
DOI: 10.1145/1111667
Copyright © 2005 ACM
Publisher: Association for Computing Machinery, New York, NY, United States

Author Tags
Term extraction
document re-ranking
information retrieval
query expansion
relevant term
term clustering