Abstract
At the NTCIR-4 workshop, Justsystem Corporation (JSC) and Clairvoyance Corporation (CC) collaborated in the cross-language information retrieval (CLIR) task. Our goal was to evaluate the performance and robustness of our recently developed commercial-grade CLIR systems for English and Asian languages. The main contribution of this article is an investigation of different retrieval strategies, their interactions in both monolingual and bilingual retrieval tasks, and their respective contributions to operational retrieval systems in the context of NTCIR-4. We report results for Japanese and English monolingual retrieval and for Japanese-to-English bilingual retrieval. In the monolingual retrieval analysis, we examine two special properties of the NTCIR experimental design (two levels of relevance and identical queries in multiple languages) and explore how they interact with the strategies of our retrieval system, including pseudo-relevance feedback, multi-word term down-weighting, and term weight merging. Our analysis shows that the choice of language (English or Japanese) does not have a significant impact on retrieval performance. Query expansion is slightly more effective with relaxed judgments than with rigid judgments. For better retrieval performance, the weights of multi-word terms should be lowered. In the bilingual retrieval analysis, we aim to identify robust strategies that are effective both when used alone and when used in combination with other strategies. We examine strategies specific to cross-lingual retrieval, such as translation disambiguation and translation structuring, as well as general strategies such as pseudo-relevance feedback and multi-word term down-weighting. For the shorter title topics, pseudo-relevance feedback is a major performance enhancer, but translation structuring affects retrieval performance negatively, whether used alone or in combination with other strategies. All of the strategies tested improve retrieval performance for the longer description topics, with pseudo-relevance feedback and translation structuring as the major contributors.