ABSTRACT
Users' cross-lingual queries to a digital library system might be short and might not be included in a common translation dictionary (unknown terms). In this paper, we investigate the feasibility of exploiting the Web as a corpus source to translate unknown query terms for cross-language information retrieval (CLIR) in digital libraries. We propose a Web-based term translation approach that determines effective translations for unknown query terms by mining bilingual search-result pages obtained from a real Web search engine. This approach can enhance the construction of a domain-specific bilingual lexicon and benefit CLIR services in a digital library that has only monolingual document collections. Very promising results have been obtained in generating effective translation equivalents for many unknown terms, including proper nouns, technical terms, and Web query terms.
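The abstract does not spell out how translations are extracted from the search-result pages, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes an English source term, Chinese as the target language, and a simple ranking by how many snippets a candidate shares with the term. The function names, the n-gram length limits, and the sample snippets are all hypothetical.

```python
import re
from collections import Counter

# Assumption: contiguous runs of CJK ideographs are target-language text.
CJK_RUN = re.compile(r"[\u4e00-\u9fff]+")


def candidate_ngrams(snippet, min_len=2, max_len=4):
    """Return the set of Chinese character n-grams found in one snippet."""
    grams = set()
    for run in CJK_RUN.findall(snippet):
        for n in range(min_len, max_len + 1):
            for i in range(len(run) - n + 1):
                grams.add(run[i:i + n])
    return grams


def rank_translations(term, snippets, top_k=3):
    """Rank candidates by the number of snippets they share with `term`."""
    counts = Counter()
    for snippet in snippets:
        # Only snippets that mention the source term contribute candidates.
        if term.lower() in snippet.lower():
            counts.update(candidate_ngrams(snippet))
    # Highest count first; ties broken by length, then by string order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], -len(kv[0]), kv[0]))
    return ranked[:top_k]


# Made-up snippets standing in for bilingual search-result text.
SAMPLE_SNIPPETS = [
    "雅虎 Yahoo 奇摩",
    "Yahoo 雅虎 新聞",
    "歡迎 Yahoo 雅虎",
    "Google 搜尋",
]

best = rank_translations("Yahoo", SAMPLE_SNIPPETS)[0]
```

With the sample data, the candidate that appears in all three snippets mentioning the term ranks first. A practical system would need better term segmentation and an association measure that discounts frequent but unrelated strings; raw co-occurrence counts are used here only to keep the sketch short.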