ABSTRACT
Human evaluations of machine translation are extensive but expensive. They can take months to finish and involve human labor that cannot be reused. We propose a method of automatic machine translation evaluation that is quick, inexpensive, and language-independent, that correlates highly with human evaluation, and that has little marginal cost per run. We present this method as an automated understudy to skilled human judges, which substitutes for them when there is a need for quick or frequent evaluations.
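The abstract does not spell out the scoring procedure itself. For orientation, the sketch below illustrates the standard BLEU formulation (modified n-gram precision up to 4-grams combined with a brevity penalty). The function names, the absence of smoothing, and the sentence-level usage are simplifications chosen here for brevity, not details stated in the abstract.

```python
# Minimal sketch of a BLEU-style score, assuming the standard formulation:
# clipped (modified) n-gram precision for n = 1..4, geometric mean, brevity penalty.
from collections import Counter
import math

def ngrams(tokens, n):
    """Return a Counter of all n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, references, max_n=4):
    """Score one tokenized candidate sentence against tokenized reference sentences."""
    weights = [1.0 / max_n] * max_n
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        # Clip each candidate n-gram count by its maximum count in any single reference.
        max_ref_counts = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref_counts[gram] = max(max_ref_counts[gram], count)
        clipped = sum(min(count, max_ref_counts[gram])
                      for gram, count in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        if clipped == 0:
            return 0.0  # no smoothing in this sketch
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: compare against the reference length closest to the candidate length.
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1.0 - r / c)
    return bp * math.exp(sum(w * lp for w, lp in zip(weights, log_precisions)))

# Example: a candidate that exactly matches one of two references scores 1.0.
refs = [["the", "cat", "is", "on", "the", "mat"],
        ["there", "is", "a", "cat", "on", "the", "mat"]]
print(round(bleu(["the", "cat", "is", "on", "the", "mat"], refs), 3))  # 1.0
```

In practice, corpus-level BLEU aggregates clipped counts and lengths over all sentences before taking the geometric mean, which is what makes the score stable for system comparison.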