Abstract
An Example-Based Machine Translation (EBMT) system whose translation example unit is a sentence can produce accurate and natural translations, provided that translation examples sufficiently similar to the input sentence are retrieved. Such a system, however, suffers from narrow coverage. Mitigating this problem requires a large-scale parallel corpus, which in turn requires an efficient method for retrieving translation examples from that corpus. The authors propose an efficient retrieval method for sentence-wise EBMT based on edit distance. The method retrieves the sentences most similar to the input under the edit-distance measure, without omissions. It employs search-space division, word graphs, and an A* search algorithm. The performance of the EBMT system was evaluated through Japanese-to-English translation experiments using a bilingual corpus of hundreds of thousands of sentences from the travel conversation domain. The EBMT system achieved high translation quality by using a large corpus, and efficient processing by using the proposed retrieval method.
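To make the retrieval task concrete, the following is a minimal sketch of edit-distance retrieval over a toy example base. It is not the authors' method: the abstract names word graphs and A* search, whereas this sketch uses a plain dynamic-programming word-level edit distance with unit costs and a simple length-based pruning step, chosen here only as a rough stand-in for dividing the search space. The function names, the unit costs, and the romanized toy sentences are all assumptions made for illustration.

```python
from typing import List, Optional, Sequence, Tuple

# A translation example: (tokenized source sentence, target sentence).
Example = Tuple[List[str], str]


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Word-level edit distance; insertion, deletion and substitution
    each cost 1 (an assumed cost scheme, not taken from the paper)."""
    prev = list(range(len(b) + 1))
    for i, word_a in enumerate(a, start=1):
        curr = [i]
        for j, word_b in enumerate(b, start=1):
            substitution = prev[j - 1] + (0 if word_a == word_b else 1)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, substitution))
        prev = curr
    return prev[-1]


def retrieve_most_similar(
    query: Sequence[str], examples: Sequence[Example]
) -> Tuple[Optional[int], List[Example]]:
    """Return the minimum distance and every example at that distance.

    The length difference between two sentences is a lower bound on
    their edit distance, so examples are visited in order of increasing
    length difference and the scan stops once that bound exceeds the
    best distance found. No closest example is skipped.
    """
    best: Optional[int] = None
    hits: List[Example] = []
    ordered = sorted(examples, key=lambda ex: abs(len(ex[0]) - len(query)))
    for source, target in ordered:
        lower_bound = abs(len(source) - len(query))
        if best is not None and lower_bound > best:
            break  # every remaining example is at least this far away
        distance = edit_distance(query, source)
        if best is None or distance < best:
            best, hits = distance, [(source, target)]
        elif distance == best:
            hits.append((source, target))
    return best, hits


# Hypothetical toy example base (romanized Japanese source, English target).
EXAMPLES: List[Example] = [
    (["eki", "wa", "doko", "desu", "ka"], "Where is the station?"),
    (["hoteru", "wa", "doko", "desu", "ka"], "Where is the hotel?"),
    (["kore", "wa", "ikura", "desu", "ka"], "How much is this?"),
    (["arigatou"], "Thank you."),
]

if __name__ == "__main__":
    query = ["toire", "wa", "doko", "desu", "ka"]
    distance, matches = retrieve_most_similar(query, EXAMPLES)
    print(distance, [target for _, target in matches])
```

The scan returns every example tied at the minimum distance, which is one reading of retrieval "without omissions". A linear scan of this kind still computes many full distances on a corpus of hundreds of thousands of sentences, which is the cost an efficient retrieval method would aim to reduce.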