Abstract
Cross-language information retrieval (CLIR) is an active sub-domain of information retrieval (IR). Like IR, CLIR is centered on the search for documents and for information contained within those documents. Unlike IR, CLIR must reconcile queries and documents that are written in different languages. The usual solution to this mismatch involves translating the query and/or the documents before performing the search. Translation is therefore a pivotal activity for CLIR engines. Over the last 15 years, the CLIR community has developed a wide range of techniques and models supporting free-text translation. This article presents an overview of those techniques, with a special emphasis on recent developments.
- AbdulJaleel, N. and Larkey, L. S. 2003. Statistical transliteration for English-Arabic cross language information retrieval. In Proceedings of the 12th International Conference on Information and Knowledge Management. ACM Press, New York, 139--146. Google ScholarDigital Library
- Adriani, M. 2000. Using statistical term similarity for sense disambiguationin cross-language information retrieval. Inf. Retr. 2, 1, 71--82. Google ScholarDigital Library
- Adriani, M. and Wahyu, I. 2005. The performance of a machine translation-based English-Indonesian CLIR system. In (CLEF 2005): Workshop on Cross-Language Information Retrieval and Evaluation. Google ScholarDigital Library
- Agirre, E., Di Nunzio, G. M., Ferro, N., Mandl, T., and Peters, C. 2009. CLEF 2008: Ad hoc track overview. In Proceedings of the 9th Cross-language Evaluation Forum Conference on Evaluating Systems for Multilingual and Multimodal Information Access (CLEF'08). Springer, 15--37. Google ScholarDigital Library
- Alfonseca, E., Bilac, S., and Pharies, S. 2008. Decompounding query keywords from compounding languages. In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies Short Papers. Association for Computational Linguistics, 253--256. Google ScholarDigital Library
- Aljlayl, M. and Frieder, O. 2001. Effective Arabic-English cross-language information retrieval via machine-readable dictionaries and machine translation. In Proceedings of the 10th International Conference on Information and Knowledge Management. ACM, New York, 295--302. Google ScholarDigital Library
- Amati, G. and Van Rijsbergen, C. J. 2002. Probabilistic models of information retrieval based on measuring the divergence from randomness. ACM Trans. Inf. Syst. 20, 357--389. Google ScholarDigital Library
- Anderka, M., Lipka, N., and Stein, B. 2009. Evaluating cross-language explicit semantic analysis and cross querying. In Proceedings of the 10th Cross-language Evaluation Forum Conference on Multilingual Information Access Evaluation: Text Retrieval Experiments (CLEF'09). Springer, 50--57. Google ScholarDigital Library
- Anderka, M. and Stein, B. 2009. The ESA retrieval model revisited. In Proceedings of the 32nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, New York, 670--671. Google ScholarDigital Library
- Bacchin, M., Ferro, N., and Melucci, M. 2005. A probabilistic model for stemmer generation. Info. Process. Manag. 41, 1, 121--137. Google ScholarDigital Library
- Baeza-Yates, R. and Ribeiro-Neto, B. 2008. Modern Information Retrieval, 2nd ed. Addison-Wesley Publishing Company. Google ScholarDigital Library
- Ballesteros, L. and Croft, W. B. 1997. Phrasal translation and query expansion techniques for cross-language information retrieval. In Proceedings of the 20th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, New York, 84--91. Google ScholarDigital Library
- Ballesteros, L. and Croft, W. B. 1998. Resolving ambiguity for cross-language retrieval. In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, New York, 64--71. Google ScholarDigital Library
- Ballesteros, L. and Sanderson, M. 2003. Addressing the lack of direct translation resources for cross-language retrieval. In Proceedings of the 12th International Conference on Information and Knowledge Management. ACM Press, New York, 147--152. Google ScholarDigital Library
- Benczur, A., Csalogany, K., Fogaras, D., Friedman, E., Sarlas, T., Uher, M., and Windhager, E. 2003. Searching a small national domain A preliminary report. In Proceedings of the 12th International World Wide Web Conference (WWW).Google Scholar
- Berger, A. and Lafferty, J. 1999. Information retrieval as statistical translation. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, New York, 222--229. Google ScholarDigital Library
- Blei, D. M., Ng, A. Y., and Jordan, M. I. 2003. Latent dirichlet allocation. J. Mach. Learn. Res. 3, 993--1022. 944937. Google ScholarCross Ref
- Boughanem, M., Chrisment, C., and Nassr, N. 2002. Investigation on disambiguation in CLIR: Aligned corpus and bi-directional translation-based strategies. In Proceedings of the 2nd Workshop of the Cross-Language Evaluation Forum on Evaluation of Cross-Language Information Retrieval Systems (Revised Papers). Springer, 158--168. Google ScholarDigital Library
- Braschler, M. and Ripplinger, B. 2004. How effective is stemming and decompounding for German text retrieval? Inf. Retr. 7, 3-4, 291--316. Google ScholarDigital Library
- Broglio, J., Callan, J. P., and Croft, W. B. 1993. INQUERY system overview. In Proceedings of the Annual Meeting of the ACL. 47--67. Google ScholarDigital Library
- Brown, P. F., Pietra, V. J. D., Pietra, S. A. D., and Mercer, R. L. 1993. The mathematics of statistical machine translation: Parameter estimation. Comput. Linguist. 19, 2, 263--311. Google ScholarDigital Library
- Buckley, C., Mitra, M., Walz, J., and Cardie, C. 2000. Using clustering and superconcepts within SMART: TREC 6. Inf. Process. Manage. 36, 1, 109--131. Google ScholarDigital Library
- Callan, J. P., Croft, W. B., and Harding, S. M. 1992. The INQUERY retrieval system. In Proceedings of the 3rd International Conference on Database and Expert Systems Applications. Springer, 78--83.Google Scholar
- Cao, G., Gao, J., and Nie, J.-y. 2007a. A system to mine large-scale bilingual dictionaries from monolingual Web. In Proceedings of the 11th Machine Translation Summit (MT Summit XI). 57--64.Google Scholar
- Cao, G., Gao, J., Nie, J.-Y., and Bai, J. 2007b. Extending query translation to cross-language query expansion with markov chain models. In Proceedings of the 16th ACM Conference on Information and Knowledge Management. ACM, New York, 351--360. Google ScholarDigital Library
- Carbonell, J. G., Yang, Y., Frederking, R. E., Brown, R., Geng, Y., and Lee, D. 1997. Translingual information retrieval: A comparative evaluation. In Proceedings of the 15th International Joint Conference on Artificial Intelligence. 708--714.Google Scholar
- Chen, A. 2002. Cross-Language retrieval experiments at CLEF-2002. In Proceedings of Evaluation of Cross-Language Information Retrieval Systems: 3rd Workshop of the Cross-Language Evaluation Forum. Springer, 28--48.Google Scholar
- Chen, J., Chau, R., and Yeh, C.-H. 2004. Discovering parallel text from the World Wide Web. In Proceedings of the 2nd Workshop on Australasian Information Security, Data Mining and Web Intelligence, and Software Internationalisation. Vol. 32, 157--161. Google ScholarDigital Library
- Chen, J. and Nie, J.-Y. 2000. Parallel web text mining for cross-language IR. In Proceedings of RIAO-2000: Content-Based Multimedia Information Access. 188--192.Google Scholar
- Cheng, P.-J., Teng, J.-W., Chen, R.-C., Wang, J.-H., Lu, W.-H., and Chien, L.-F. 2004. Translating unknown queries with web corpora for cross-language information retrieval. In Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, New York, 146--153. Google ScholarDigital Library
- Chew, P. A., Verzi, S. J., Bauer, T. L., and McClain, J. T. 2006. Evaluation of the bible as a resource for cross-language information retrieval. In Proceedings of the Workshop on Multilingual Language Resources and Interoperability. 68--74. Google ScholarDigital Library
- Cimiano, P., Schultz, A., Sizov, S., Sorg, P., and Staab, S. 2009. Explicit versus latent concept models for cross-language information retrieval. In Proceedings of the 21st International Jont Conference on Artifical Intelligence. Morgan Kaufmann Publishers Inc., 1513--1518. Google ScholarDigital Library
- Cleverdon, C. W. 1991. The significance of the Cranfield tests on index languages. In Proceedings of the 14th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, New York, 3--12. Google ScholarDigital Library
- Darwish, K. and Oard, D. W. 2003. Probabilistic structured query methods. In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, New York, 338--344. Google ScholarDigital Library
- Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K., and Harshman, R. 1990. Indexing by latent semantic analysis. J. Amer. Soc. Info. Sci. 41, 6, 391--407.Google ScholarCross Ref
- Demner-Fushman, D. and Oard, D. W. 2003. The effect of bilingual term list size on dictionary-based cross-language information retrieval. In Proceedings of the 36th Annual Hawaii International Conference on System Sciences (HICSS'03) IEEE Computer Society. Google ScholarDigital Library
- Dolamic, L. and Savoy, J. 2010. Retrieval effectiveness of machine translated queries. J. Amer. Soc. Inf. Sci. Technol. 61, 2266--2273. Google ScholarDigital Library
- Dumais, S. T. 1993. Latent semantic indexing (LSI) and TREC-2. In Proceedings of TREC. 105--115.Google Scholar
- Dumais, S. T. 1995. Latent semantic indexing (LSI): TREC-3 report. In Proceedings of TREC. 219--230.Google Scholar
- Dumais, S. T., Letsche, T. A., Littman, M. L., and Landauer, T. K. 1997. Automatic cross-language retrieval using latent semantic indexing. In Proceedings of the AAAI Spring Symposium Series: Cross-Language Text and Speech Retrieval. 18--24.Google Scholar
- Fautsch, C. and Savoy, J. 2009. Algorithmic stemmers or morphological analysis? An evaluation. J. Amer. Soc. Inf. Sci. Technol. 60, 1616--1624. Google ScholarDigital Library
- Federico, M. and Bertoldi, N. 2002. Statistical cross-language information retrieval using n-best query translations. In Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, New York, 167--174. Google ScholarDigital Library
- Ferro, N. and Peters, C. 2009. CLEF 2009 Ad hoc track overview: TEL and Persian tasks. In Proceedings of the 10th Cross-language Evaluation Forum Conference on Multilingual Information Access Evaluation: Text Retrieval Experiments (CLEF'09). Springer, 13--35. Google ScholarDigital Library
- Fox, C. 1989. A stop list for general text. SIGIR Forum 24, 1-2, 19--21. Google ScholarDigital Library
- Franz, M., McCarley, J., and Roukos, S. 1999. Ad hoc and multilingual information retrieval at IBM. In Proceedings of TREC-7. 157--168.Google Scholar
- Fujii, A. and Ishikawa, T. 2001. Japanese/english cross-language information retrieval: exploration of query translation and transliteration. Comput. Humanit. 35, 4, 389--420.Google ScholarCross Ref
- Gao, J. and Nie, J.-Y. 2006. A study of statistical models for query translation: finding a good unit of translation. In Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, New York, 194--201. Google ScholarDigital Library
- Gao, J., Nie, J.-Y., Wu, G., and Cao, G. 2004. Dependence language model for information retrieval. In Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, New York, 170--177. Google ScholarDigital Library
- Gao, J., Nie, J.-Y., Xun, E., Zhang, J., Zhou, M., and Huang, C. 2001. Improving query translation for cross-language information retrieval using statistical models. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, New York, 96--104. Google ScholarDigital Library
- Gao, J., Zhou, M., Nie, J.-Y., He, H., and Chen, W. 2002. Resolving query translation ambiguity using a decaying co-occurrence model and syntactic dependence relations. In Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, New York, 183--190. Google ScholarDigital Library
- Gao, W., Wong, K.-F., and Lam, W. 2005. Phoneme-Based transliteration of foreign names for OOV problem. In Proceedings of the 1st International Joint Conference in Natural Language Processing (IJCNLP 04.) Vol. 3248/2005. Springer, 110--119. Google ScholarDigital Library
- Gey, F. 2007. Search between Chinese and Japanese text collections. In Proceedings of the 6th NTCIR Workshop Meeting on Evaluation of Information Access Technologies: Information Retrieval, Question Answering, and Cross-Lingual Information Access. 73--76.Google Scholar
- Gey, F. and Chen, A. 1998. TREC-9 cross-language information retrieval (English-Chinese) overview. In Proceedings of the 9th Text Retrieval Conference (TREC-9). 15--23.Google Scholar
- Gollins, T. and Sanderson, M. 2001. Improving cross language retrieval with triangulated translation. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, New York, 90--95. Google ScholarDigital Library
- Goto, I., Kato, N., Ehara, T., and Tanaka, H. 2004. Back transliteration from Japanese to English using target English context. In Proceedings of the 20th International Conference on Computational Linguistics. (COLING '04). Association for Computational Linguistics. Google ScholarDigital Library
- Goutte, C., Cancedda, N., Dymetman, M., and Foster, G., Eds. 2009. Learning Machine Translation. The MIT Press, Cambridge, MA. Google ScholarDigital Library
- He, D. and Wu, D. 2008. Translation enhancement: A new relevance feedback method for cross-language information retrieval. In Proceeding of the 17th ACM Conference on Information and Knowledge Management. ACM, New York, 729--738. Google ScholarDigital Library
- Hedlund, T. 2002. Compounds in dictionary-based cross-language information retrieval. Info. Res. 7, 2.Google Scholar
- Hollink, V., Kamps, J., Monz, C., and Rijke, M. D. 2004. Monolingual document retrieval for european languages. Inf. Retr. 7, 1-2, 33--52. Google ScholarDigital Library
- Huang, F., Zhang, Y., and Vogel, S. 2005. Mining key phrase translations from Web corpora. In Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 483--490. Google ScholarDigital Library
- Hull, D. 1993. Using statistical testing in the evaluation of retrieval experiments. In Proceedings of the 16th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, New York, 329--338. Google ScholarDigital Library
- Hull, D. A. 1996. Stemming algorithms: A case study for detailed evaluation. J. Amer. Soc. Info. Sci. 47, 1, 70--84. Google ScholarDigital Library
- Hull, D. A. 1997. Using structured queries for disambiguation in cross-language information retrieval. In AAAI Symposium on Cross-Language Text and Speech Retrieval. American Association for Artificial Intelligence, 84--98.Google Scholar
- Hull, D. A. and Grefenstette, G. 1996. Querying across languages: A dictionary-based approach to multilingual information retrieval. In Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, New York, 49--57. Google ScholarDigital Library
- Jang, M.-G., Myaeng, S. H., and Park, S. Y. 1999. Using mutual information to resolve query translation ambiguities and query term weighting. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics on Computational Linguistics. Association for Computational Linguistics, 223--229. Google ScholarDigital Library
- Jeong, K., Myaeng, S., Lee, J., and Choi, K.-S. 1999. Automatic identification and back-transliteration of foreign words for information retrieval. Info. Process. Manag. 35, 523--540.Google ScholarCross Ref
- Jin, R., Hauptmann, A. G., and Zhai, C. X. 2002. Title language model for information retrieval. In Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, New York, 42--48. Google ScholarDigital Library
- Jones, G. J. F., Fantino, F., Newman, E., and Zhang, Y. 2008. Domain-Specific query translation for multilingual information access using machine translation augmented with dictionaries mined from wikipedia. In Proceedings of the 2nd International Workshop on Cross Lingual Information Access - Addressing the Information Need of Multilingual Societies (CLIA 08). 34--41.Google Scholar
- Kang, B.-J. and Choi, K.-S. 2000. Two approaches for the resolution of word mismatch problem caused by English words and foreign words in Korean information retrieval. In Proceedings of the 5th International Workshop on Information Retrieval with Asian Languages. (IRAL '00). ACM, New York, 133--140. Google ScholarDigital Library
- Kang, I.-H. and Kim, G. 2000. English-to-Korean transliteration using multiple unbounded overlapping phoneme chunks. In Proceedings of the 18th Conference on Computational Linguistics - Volume 1. Association for Computational Linguistics, 418--424. Google ScholarDigital Library
- Kang, I.-S., Na, S.-H., and Lee, J.-H. 2004. POSTECH at NTCIR-4: CJKE monolingual and Korean-related cross-language retrieval experiments. In Proceedings of the 4th NTCIR Workshop. National Institute of Informatics.Google Scholar
- Kashioka, H., Maruyama, T., and Tanaka, H. 2003. Building a parallel corpus for monologues with clause alignment. In Proceedings of the Machine Translation Summit (MT Summit IX). 216--223.Google Scholar
- Keskustalo, H., Pirkola, A., Visala, K., Leppanen, E., and Jarvelin, K. 2003. Non-Adjacent digrams improve matching of cross-lingual spelling variants. In Proceedings of String Processing and Information Retrieval: 10th International Symposium (SPIRE 03). 252--265.Google Scholar
- Kishida, K. 2008. Prediction of performance of cross-language information retrieval using automatic evaluation of translation. Libr. Info. Sci. Res. 30, 2, 138--144.Google ScholarCross Ref
- Kishida, K. and Kando, N. 2005. Hybrid approach of query and document translation with pivot language for cross-language information retrieval. In Proceedings of the Workshop on Cross-Language Information Retrieval and Evaluation. Google ScholarDigital Library
- Knight, K. and Graehl, J. 1998. Machine transliteration. Comput. Linguist. 24, 4, 599--612. Google ScholarDigital Library
- Korn, M., Schulz, S., Medelyan, O., and Hahn, U. 2005. Bootstrapping dictionaries for cross-language information retrieval. In Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, New York, 528--535. Google ScholarDigital Library
- Kraaij, W. 2003. Exploring transitive translation methods. In Proceedings of 4th Dutch-Belgian Information Retrieval Workshop.Google Scholar
- Kraaij, W., Nie, J.-Y., and Simard, M. 2003. Embedding web-based statistical translation models in cross-language information retrieval. Comput. Linguist. 29, 3, 381--419. Google ScholarDigital Library
- Kuriyama, K., Kando, N., Nozue, T., and Eguchi, K. 2002. Pooling for a large-scale test collection: An analysis of the search results from the first NTCIR workshop. Inf. Retr. 5, 41--59. Google ScholarDigital Library
- Kwok, K. L. 1999. English-Chinese cross-language retrieval based on a translation package. In Proceedings of the Machine Translation Summit VII Workshop of Machine Translation for Cross Language Information Retrieval. 8--13.Google Scholar
- Kwok, K. L. 2000. Exploiting a Chinese-English bilingual wordlist for English-Chinese cross language information retrieval. In Proceedings of the 5th International Workshop on Information Retrieval with Asian Languages. ACM Press, New York, 173--179. Google ScholarDigital Library
- Kwok, K. L. and Grunfeld, L. 1996. TREC-5 English and Chinese retrieval experiments using PIRCS. In Proceedings of TREC-5. 133--142.Google Scholar
- Lancaster, F. and Fayen, E. 1973. Information Retrieval On-Line. Melville Publishing Co., Los Angeles, CA.Google Scholar
- Landauer, T. K., Foltz, P. W., and Laham, D. 1998. An introduction to latent semantic analysis. Discourse Process. 25, 259--284.Google ScholarCross Ref
- Lavrenko, V., Choquette, M., and Croft, W. B. 2002. Cross-lingual relevance models. In Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, New York, 175--182. Google ScholarDigital Library
- Lavrenko, V. and Croft, W. B. 2001. Relevance based language models. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, New York, 120--127. Google ScholarDigital Library
- Lee, C.-J., Chen, C.-H., Kao, S.-H., and Cheng, P.-J. 2010. To translate or not to translate? In Proceeding of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, New York, 651--658. Google ScholarDigital Library
- Leek, T., Jin, H., Sista, S., and Schwartz, R. 2000. The BBN cross-lingual topic detection and tracking system. In Working Notes of the 3rd Topic Detection and Tracking Workshop. National Institutes of Standards and Technology.Google Scholar
- Lehtokangas, R., Airio, E., J, K., and rvelin. 2004. Transitive dictionary translation challenges direct dictionary translation in CLIR. Inf. Process. Manage. 40, 6, 973--988. Google ScholarDigital Library
- Lehtokangas, R., Keskustalo, H., and Järvelin, K. 2008. Experiments with transitive dictionary translation and pseudo-relevance feedback using graded relevance assessments. J. Amer. Soc. Inf. Sci. Technol. 59, 476--488. Google ScholarDigital Library
- Leveling, J., Zhou, D., Jones, G. J. F., and Wade, V. 2009. Document expansion, query translation and language modeling for ad-hoc IR. In Proceedings of the 10th Cross-language Evaluation Forum Conference on Multilingual Information Access Evaluation: Text Retrieval Experiments (CLEF'09). Springer, 58--61. Google ScholarDigital Library
- Levow, G.-A. and Oard, D. W. 2000. Translingual topic tracking with PRISE. In Working Notes of the 3rd Topic Detection and Tracking Workshop. National Institutes of Standards and Technology.Google Scholar
- Levow, G.-A., Oard, D. W., and Resnik, P. 2005. Dictionary-Based techniques for cross-language information retrieval. Inf. Process. Manage. 41, 3, 523--547. Google ScholarDigital Library
- Lin, M.-C., Li, M.-X., Hsu, C.-C., and Wu, S.-H. 2010. Query expansion from Wikipedia and topic Web crawler on CLIR. In Proceedings of the 8th NTCIR Workshop Meeting on Evaluation of Information Access Technologies: Information Retrieval, Question Answering, and Cross-Lingual Information Access. 101--106.Google Scholar
- Liu, X. and Croft, W. B. 2005. Statistical language modeling for information retrieval. Ann. Rev. Info. Sci. Technol. 39, 1, 1--31.Google ScholarCross Ref
- Liu, Y., Jin, R., and Chai, J. Y. 2005. A maximum coherence model for dictionary-based cross-language information retrieval. In Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, New York, 536--543. Google ScholarDigital Library
- Lopez, A. 2008. Statistical machine translation. ACM Comput. Surv. 40, 3, 1--49. Google ScholarDigital Library
- Loponen, A. and Järvelin, K. 2010. A dictionary- and corpus-independent statistical lemmatizer for information retrieval in low resource languages. In Proceedings of the International Conference on Multilingual and Multimodal Information Access Evaluation: Cross-language Evaluation Forum (CLEF'10). Springer, 3--14. Google ScholarDigital Library
- Lu, W.-H., Chien, L.-F., and Lee, H.-J. 2002. Translation of web queries using anchor text mining. ACM Trans. Asian Lang. Info. Process. 1, 2, 159--172. Google ScholarDigital Library
- Lu, W.-H., Chien, L.-F., and Lee, H.-J. 2004. Anchor text mining for translation of web queries: A transitive translation approach. ACM Trans. Inf. Syst. 22, 2, 242--269. Google ScholarDigital Library
- Maeda, A., Sadat, F., Yoshikawa, M., and Uemura, S. 2000. Query term disambiguation for web cross-language information retrieval using a search engine. In Proceedings of the 5th International Workshop on Information Retrieval with Asian Languages. ACM Press, 25--32. Google ScholarDigital Library
- Majumder, P., Mitra, M., Parui, S. K., Kole, G., Mitra, P., and Datta, K. 2007. YASS: Yet another suffix stripper. ACM Trans. Info. Syst. 25, 4, 18:1--3:20. Google ScholarDigital Library
- Manning, C. D., Raghavan, P., and Schtze, H. 2008. Introduction to Information Retrieval. Cambridge University Press. Google ScholarDigital Library
- Mayfield, J. and McNamee, P. 2004. Triangulation without translation. In Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, New York, 490--491. Google ScholarDigital Library
- McCarley, J. S. 1999. Should we translate the documents or the queries in cross-language information retrieval? In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics on Computational Linguistics. Association for Computational Linguistics, 208--214. Google ScholarDigital Library
- McEwan, C. J. A., Ounis, I., and Ruthven, I. 2002. Building bilingual dictionaries from parallel web documents. In Proceedings of the 24th BCS-IRSG European Colloquium on IR Research: Advances in Information Retrieval. Springer, 303--323. Google ScholarDigital Library
- McNamee, P. and Mayfield, J. 2002. Comparing cross-language query expansion techniques by degrading translation resources. In Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, New York, 159--166. Google ScholarDigital Library
- McNamee, P. and Mayfield, J. 2004a. Character n-gram tokenization for European language text retrieval. Inf. Retr. 7, 1-2, 73--97. Google ScholarDigital Library
- McNamee, P. and Mayfield, J. 2004b. Cross-Language retrieval using HAIRCUT at CLEF 2004. In Proceedings of the Workshop on Cross-Language Information Retrieval and Evaluation. Google ScholarDigital Library
- McNamee, P., Mayfield, J., and Piatko, C. 2002. HAIRCUT: A system for multilingual text retrieval in java. J. Comput. Small Coll. 17, 8--22. Google ScholarDigital Library
- Melamed, I. D. 2000. Models of translational equivalence among words. Comput. Linguist. 26, 221--249. Google ScholarDigital Library
- Melucci, M. and Orio, N. 2003. A novel method for stemmer generation based on hidden markov models. In Proceedings of the 12th International Conference on Information and Knowledge Management. ACM Press, New York, 131--138. Google ScholarDigital Library
- Meng, H., Chen, B., Khudanpur, S., Levow, G.-A., Lo, W.-K., Oard, D., Schone, P., Tang, K., Wang, H.-M., and Wang, J. 2001. Mandarin-English information (MEI): Investigating translingual speech retrieval. In Proceedings of the 1st International Conference on Human Language Technology Research. Association for Computational Linguistics, 1--7. Google ScholarDigital Library
- Meng, H., Khudanpur, S., Levow, G., Oard, D. W., and Wang, H.-M. 2000. Mandarin-English information (MEI): Investigating translingual speech retrieval. In Proceedings of the NAACL-ANLP Workshop on Embedded Machine Translation Systems. Vol 5, Association for Computational Linguistics, 23--30. Google ScholarDigital Library
- Miller, D. R. H., Leek, T., and Schwartz, R. M. 1999. A hidden markov model information retrieval system. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, New York, 214--221. Google ScholarDigital Library
- Monz, C. and Dorr, B. J. 2005. Iterative translation disambiguation for cross-language information retrieval. In Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, New York, 520--527. Google ScholarDigital Library
- Moreau, F., Claveau, V., and Sebillot, P. 2007. Automatic morphological query expansion using analogy-based machine learning. In Proceedings of the 29th European Conference on IR Research. Springer, 222--233. Google ScholarDigital Library
- Mori, T., Kokubu, T., and Tanaka, T. 2001. Cross-Lingual information retrieval based on LSI with multiple word spaces. In Proceedings of the 2nd NTCIR Workshop Meeting on Evaluation of Information Access Technologies: Information Retrieval, Question Answering, and Cross-Lingual Information Access.Google Scholar
- Nie, J.-y. 1998. Using a probabilistic translation model for cross-language information retrieval. In Proceedings of the 6th Workshop on Very Large Corpora. Morgan Kaufmann Publishers.Google Scholar
- Nie, J.-y. 2010. Cross-Language Information Retrieval. Synthesis Lectures on Human Language Technologies. Morgan & Claypool Publishers. Google ScholarDigital Library
- Nie, J.-Y. and Ren, F. 1999. Chinese information retrieval: Using characters or words? Info. Process. Manag. 35, 4, 443--462.Google ScholarCross Ref
- Nie, J.-Y., Simard, M., Isabelle, P., and Durand, R. 1999. Cross-Language information retrieval based on parallel texts and automatic mining of parallel texts from the web. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, New York, 74--81. Google ScholarDigital Library
- Oard, D. W. 1998. A comparative study of query and document translation for cross-language information retrieval. In Proceedings of the 3rd Conference of the Association for Machine Translation in the Americas on Machine Translation and the Information Soup. Springer, 472--483. Google ScholarDigital Library
- Oard, D. W. 1999. Topic tracking with the PRISE information retrieval system. In Proceedings of the DARPA Broadcast News Workshop. 209--211.Google Scholar
- Oard, D. W. and Dorr, B. J. 1996. A survey of multilingual text retrieval. Tech. rep., University. of Maryland Institute for Advanced Computer Studies report no. UMIACS-TR-96-19, University of Maryland at College Park, MD. Google ScholarDigital Library
- Oard, D. W. and Ertunc, F. 2002. Translation-Based indexing for cross-language retrieval. In Proceedings of the 24th BCS-IRSG European Colloquium on IR Research: Advances in Information Retrieval. Springer, 324--333. Google ScholarDigital Library
- Oard, D. W. and Hackett, P. 1997. Document translation for cross-langauge text retrieval at the university of Maryland. In Proceedings of the 6th Text Retrieval Conference (TREC-6). NIST, 687--696.Google Scholar
- Oard, D. W., Levow, G.-A., and Cabezas, C. I. 2000. CLEF experiments at Maryland: Statistical stemming and backoff translation. In Proceedings of Evaluation of Cross-Language Information Retrieval Systems: Third Workshop of the Cross-Language Evaluation Forum. Google ScholarDigital Library
- Oard, D. W. and Wang, J. 2001. NTCIR-2 ECIR experiments at Maryland: Comparing pirkola's structured queries and balanced translation. In Proceedings of the 2nd NTCIR Workshop on Research in Chinese & Japanese, Text Retrieval and Text Summarization. National Institute of Informatics.Google Scholar
- Och, F. J. and Ney, H. 2003. A systematic comparison of various statistical alignment models. Comput. Linguist. 29, 1, 19--51. Google ScholarDigital Library
- Parton, K., McKeown, K. R., Allan, J., and Henestroza, E. 2008. Simultaneous multilingual search for translingual information retrieval. In Proceedings of the 17th ACM Conference on Information and Knowledge Management. ACM, New York, 719--728. Google ScholarDigital Library
- Peters, C. and Picchi, E. 1996. A system for cross-language information retrieval. ERCIM News 27.Google Scholar
- Peters, C. and Sheridan, P. 2001. Lectures on Information Retrieval. Springer, Chapter Multilingual information Access, 51--80. Google ScholarDigital Library
- Pirkola, A. 1998. The effects of query structure and dictionary setups in dictionary-based cross-language information retrieval. In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, New York, 55--63. Google ScholarDigital Library
- Pirkola, A., Keskustalo, H., Leppanen, E., Kansala, A.-P., and Jarvelin, K. 2002. Targeted s-gram matching: A novel n-gram matching technique for cross- and monolingual word form variants. Info. Res. 7, 2.Google Scholar
- Pirkola, A., Puolamäki, D., and Järvelin, K. 2003a. Applying query structuring in cross-language retrieval. Inf. Process. Manage. 39, 391--402. Google ScholarDigital Library
- Pirkola, A., Toivonen, J., Keskustalo, H., Visala, K., J, K., and rvelin. 2003b. Fuzzy translation of cross-lingual spelling variants. In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, 345--352. Google ScholarDigital Library
- Ponte, J. M. and Croft, W. B. 1998. A language modeling approach to information retrieval. In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, New York, 275--281. Google ScholarDigital Library
- Porter, M. F. 1980. An algorithm for suffix stripping. Program 14, 130--137.Google ScholarCross Ref
- Potthast, M., Stein, B., and Anderka, M. 2008. A Wikipedia-based multilingual retrieval model. In Proceedings of 30th European Conference on Information Retrieval. Springer, 522--530. Google ScholarDigital Library
- Qu, Y., Grefenstette, G., and Evans, D. A. 2003. Automatic transliteration for Japanese-to-English text retrieval. In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, New York, 353--360. Google ScholarDigital Library
- Resnik, P. 1998. Parallel strands: A preliminary investigation into mining the web for bilingual text. In Proceedings of the 3rd Conference of the Association for Machine Translation in the Americas on Machine Translation and the Information Soup. Springer, 72--82. Google ScholarDigital Library
- Resnik, P. 1999. Mining the web for bilingual text. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics on Computational Linguistics. Association for Computational Linguistics, 527--534. Google ScholarDigital Library
- Resnik, P., Oard, D., and Levow, G. 2001. Improved cross-language retrieval using backoff translation. In Proceedings of the 1st International Conference on Human Language Technology Research (HLT '01). Association for Computational Linguistics, 1--3. Google ScholarDigital Library
- Resnik, P. and Smith, N. A. 2003. The Web as a parallel corpus. Comput. Linguist. 29, 3, 349--380. Google ScholarDigital Library
- Robertson, A. and Willett, P. 1998. Applications of n-grams in textual information systems. J. Document. 54, 1, 48--69.Google ScholarCross Ref
- Rocchio, J. 1971. Relevance feedback in information retrieval. In The SMART Retrieval System: Experiments in Automatic Document Processing, 313--323.Google Scholar
- Ruiz, M., Diekema, A., and Sheridan, P. 1999. CINDOR conceptual interlingua document retrieval: TREC-8 evaluation. In Proceedings of the 8th Text Retrieval Conference (TREC-8).Google Scholar
- Sakai, T., Kando, N., Lin, C.-J., Mitamura, T., Shima, H., Ji, D., Chen, K.-H., and Nyberg, E. 2008. Overview of the NTCIR-7 ACLIA IR4QA task. In Proceedings of the 7th NTCIR Workshop Meeting on Evaluation of Information Access Technologies: Information Retrieval, Question Answering, and Cross-Lingual Information Access. 77--114.Google Scholar
- Sakai, T., Shima, H., Kando, N., Song, R., Lin, C.-J., Mitamura, T., Sugimito, M., and Lee, C.-W. 2010. Overview of NTCIR-8 ACLIA IR4QA. In Proceedings of the 8th NTCIR Workshop Meeting on Evaluation of Information Access Technologies: Information Retrieval, Question Answering, and Cross-Lingual Information Access. 63--93.Google Scholar
- Salton, G. 1971. The SMART Retrieval System & Experiments in Automatic Document Processing. Prentice-Hall, Inc., Upper Saddle River, NJ. Google ScholarDigital Library
- Salton, G., Fox, E. A., and Wu, H. 1983. Extended Boolean information retrieval. Comm. ACM 26, 1022--1036. Google ScholarDigital Library
- Salton, G., Wong, A., and Yang, C. S. 1975. A vector space model for automatic indexing. Comm. ACM 18, 613--620. Google ScholarDigital Library
- Savoy, J. 2004. Combining multiple strategies for effective monolingual and cross-language retrieval. Inf. Retr. 7, 121--148. Google ScholarDigital Library
- Savoy, J. 2005. Comparative study of monolingual and multilingual search models for use with asian languages. ACM Trans. Asian Lang. Inf. Process. 4, 2, 163--189. Google ScholarDigital Library
- Savoy, J. 2007. Why do successful search systems fail for some topics. In Proceedings of the ACM Symposium on Applied Computing (SAC '07). ACM, New York, 872--877. Google ScholarDigital Library
- Savoy, J. and Dolamic, L. 2009. How effective is google's translation service in search? Comm. ACM 52, 139--143. Google ScholarDigital Library
- Schäuble, P. 1993. SPIDER: A multiuser information retrieval system for semistructured and dynamic data. In Proceedings of the 16th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, New York, 318--327. Google ScholarDigital Library
- Schönhofen, P., Benczúr, A., Bíró, I., and Csalogány, K. 2008. Cross-Language retrieval with Wikipedia. In Proceedings of the 8th Workshop of the Cross-Language Evaluation Forum (CLEF 07) (Revised Selected Papers). Springer, 72--79.Google Scholar
- Shannon, C. E. and Weaver, W. 1963. A Mathematical Theory of Communication. University of Illinois Press. Google ScholarDigital Library
- Shaw, J. A. and Fox, E. A. 1994. Combination of multiple searches. In the Proceedings of the 2nd Text REtrieval Conference (TREC-2). 243--252.Google Scholar
- Sheridan, P. and Ballerini, J. P. 1996. Experiments in multilingual information retrieval using the SPIDER system. In Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, New York, 58--65. Google ScholarDigital Library
- Shi, L. 2010. Mining OOV translations from mixed-language Web pages for cross language information retrieval. In Proceedings of the 32nd European Conference on Information Retrieval (ECIR 10). 471--482. Google ScholarDigital Library
- Shi, L., Nie, J.-Y., and Bai, J. 2007. Comparing different units for query translation in Chinese cross-language information retrieval. In Proceedings of the 2nd International Conference on Scalable Information Systems. 1--9. Google ScholarDigital Library
- Singhal, A. and Pereira, F. 1999. Document expansion for speech retrieval. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, New York, 34--41. Google ScholarDigital Library
- Snajder, J., Basic, B. D., and Tadic, M. 2008. Automatic acquisition of inflectional lexica for morphological normalisation. Inf. Process. Manage. 44, 5, 1720--1731. Google ScholarDigital Library
- Song, F. and Croft, W. B. 1999. A general language model for information retrieval. In Proceedings of the 8th International Conference on Information and Knowledge Management. ACM Press, New York, 316--321. Google ScholarDigital Library
- Sorg, P. and Cimiano, P. 2008. Cross-language information retrieval with explicit semantic analysis. In the Working Notes of the CLEF Workshop.Google Scholar
- Sparck Jones, K. 1988. A Statistical Interpretation of Term Specificity and its Application in Retrieval. Taylor Graham Publishing, London, UK, 132--142. Google ScholarDigital Library
- Su, C.-Y., Lin, T.-C., and Wu, S.-H. 2007. Using wikipedia to translate OOV terms on MLIR. In Proceedings of the 6th NTCIR Workshop Meeting on Evaluation of Information Access Technologies: Information Retrieval, Question Answering, and Cross-Lingual Information Access. 109--115.Google Scholar
- Sun, L., Xue, S., Qu, W., Wang, X., and Sun, Y. 2002. Constructing of a large-scale Chinese-English parallel corpus. In Proceedings of the 3rd Workshop on Asian Language Resources and International Standardization - Volume 12. Association for Computational Linguistics, 1--8. Google ScholarDigital Library
- Virga, P. and Khudanpur, S. 2003. Transliteration of proper names in cross-lingual information retrieval. In Proceedings of the ACL Workshop on Multilingual and Mixed-Language Named Entity Recognition - Volume 15. Association for Computational Linguistics, 57--64. Google ScholarDigital Library
- Voorhees, E. M. and Harman, D. 2000. Overview of the ninth text retrieval conference (trec-9). In Proceedings of the 9th Text REtrieval Conference (TREC-9). 1--14.Google ScholarCross Ref
- Wang, J. and Oard, D. W. 2006. Combining bidirectional translation and synonymy for cross-language information retrieval. In Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, New York, 202--209. Google ScholarDigital Library
- Wong, S. K. M., Ziarko, W., and Wong, P. C. N. 1985. Generalized vector spaces model in information retrieval. In Proceedings of the 8th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, New York, 18--25. Google ScholarDigital Library
- Xu, J. and Weischedel, R. 2000. Cross-Lingual information retrieval using hidden markov models. In Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora Held in conjunction with The 38th Annual Meeting of the Association for Computational Linguistics. Vol. 13, Association for Computational Linguistics, 95--103. Google ScholarDigital Library
- Xu, J. and Weischedel, R. 2005. Empirical studies on the impact of lexical resources on CLIR performance. Inf. Process. Manage. 41, 3, 475--487. Google ScholarDigital Library
- Xu, J., Weischedel, R., and Nguyen, C. 2001. Evaluating a probabilistic model for cross-lingual information retrieval. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, New York, 105--110. Google ScholarDigital Library
- Yang, C. C. and Li, K. W. 2002. Mining English/Chinese parallel documents from the World Wide Web. In Proceedings of the 11th International World Wide Web Conference. ACM Press, New York, 188--192.Google Scholar
- Zhai, C. 2009. Statistical Language Models for Information Retrieval. Synthesis Lectures on Human Language Technologies. Morgan & Claypool Publishers. Google ScholarDigital Library
- Zhai, C. and Lafferty, J. 2001. A study of smoothing methods for language models applied to ad hoc information retrieval. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, New York, 334--342. Google ScholarDigital Library
- Zhang, Y., Uchimoto, K., Ma, Q., and Isahara, H. 2005a. Building an annotated Japanese-Chinese parallel corpus - A part of NICT multilingual corpora. In Proceedings of the 10th Machine Translation Summit MT Summit X. 71--78.Google Scholar
- Zhang, Y. and Vines, P. 2004. Using the web for automated translation extraction in cross-language information retrieval. In Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, New York, 162--169. Google ScholarDigital Library
- Zhang, Y., Vines, P., and Zobel, J. 2005b. Chinese OOV translation and post-translation query expansion in Chinese-English cross-lingual information retrieval. ACM Trans. Asian Lang. Info. Process. 4, 2, 57--77. Google ScholarDigital Library
- Zhou, D., Truran, M., Brailsford, T., and Ashman, H. 2007. NTCIR-6 experiments using pattern matched translation extraction. In Proceedings of the 6th NTCIR Workshop Meeting on Evaluation of Information Access Technologies: Information Retrieval, Question Answering, and Cross-Lingual Information Access. 145--151.Google Scholar
- Zhou, D., Truran, M., Brailsford, T., and Ashman, H. 2008a. A hybrid technique for English-Chinese cross language information retrieval. ACM Trans. Asian Lang. Info. Process. 7, 5:1--5:35. Google ScholarDigital Library
- Zhou, D., Truran, M., Brailsford, T., Ashman, H., and Goulding, J. 2008b. Gcon: A graph-based technique for resolving ambiguity in query translation candidates. In Proceedings of the 23rd Annual ACM Symposium on Applied Computing. 1566--1573. Google ScholarDigital Library
- Zhu, J. and Wang, H. 2006. The effect of translation quality in MT-based cross-language information retrieval. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics. 593--600. Google ScholarDigital Library
- Zobel, J. and Dart, P. 1995. Finding approximate matches in large lexicons. Softw. Practi. Exper. 25, 3, 331--345. Google ScholarDigital Library