ABSTRACT
Often, the training procedure for statistical machine translation models is based on maximum likelihood or related criteria. A general problem of this approach is that there is only a loose relation to the final translation quality on unseen text. In this paper, we analyze various training criteria which directly optimize translation quality. These training criteria make use of recently proposed automatic evaluation metrics. We describe a new algorithm for efficient training of an unsmoothed error count. We show that significantly better results can often be obtained if the final evaluation criterion is taken directly into account as part of the training procedure.
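The key observation behind efficient training of an unsmoothed error count is that, along any line in weight space, each candidate's model score is linear in the step size, so the corpus-level error count is a piecewise constant function that can be minimized exactly. The sketch below illustrates this one-dimensional line search under stated assumptions: the names (`line_search`, `nbests`, `lam0`, `d`) are hypothetical, the per-candidate error counts are assumed to be precomputed against the references, and it enumerates all pairwise score-line intersections rather than tracing the upper envelope as the paper's algorithm does, trading efficiency for brevity.

```python
# A minimal sketch of the exact line search at the heart of minimum error
# rate training (MERT). Assumed (hypothetical) inputs: for each source
# sentence, an n-best list of candidates, each with a feature vector h and
# a precomputed error count err. Along the line lambda = lam0 + gamma * d,
# each candidate's score is linear in gamma, so the corpus error count is
# piecewise constant and can be minimized exactly in gamma.

def line_search(nbests, lam0, d):
    """Return the gamma minimizing corpus error along direction d.

    nbests: list over sentences; each entry is a list of (h, err) pairs,
            where h is a feature vector (list of floats) and err is the
            candidate's error count against the reference.
    lam0:   current weight vector (list of floats).
    d:      search direction (list of floats).
    """
    dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))

    # Each candidate's score line is a + gamma * b. The argmax per sentence
    # can only change at intersections of two such lines, so collect all
    # pairwise intersection points as candidate thresholds.
    thresholds = []
    lines = []  # per sentence: list of (a, b, err)
    for nbest in nbests:
        sent = [(dot(lam0, h), dot(d, h), err) for h, err in nbest]
        lines.append(sent)
        for i in range(len(sent)):
            for j in range(i + 1, len(sent)):
                a1, b1, _ = sent[i]
                a2, b2, _ = sent[j]
                if b1 != b2:
                    thresholds.append((a2 - a1) / (b1 - b2))

    thresholds.sort()
    # Probe one gamma inside each interval where the argmax is constant,
    # plus one point beyond each end of the threshold range.
    probes = [thresholds[0] - 1.0] if thresholds else [0.0]
    probes += [(g1 + g2) / 2 for g1, g2 in zip(thresholds, thresholds[1:])]
    if thresholds:
        probes.append(thresholds[-1] + 1.0)

    def corpus_error(gamma):
        # Sum, over sentences, the error of the highest-scoring candidate.
        return sum(max(sent, key=lambda t: t[0] + gamma * t[1])[2]
                   for sent in lines)

    return min(probes, key=corpus_error)
```

In a full training loop this line search would be repeated over a set of directions (e.g. coordinate or random directions), updating the weights via lam0 + gamma * d after each step and re-decoding periodically to refresh the n-best lists.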