ABSTRACT
We describe Joshua, an open source toolkit for statistical machine translation. Joshua implements all of the algorithms required for translation with synchronous context-free grammars (SCFGs): chart parsing, n-gram language model integration, beam and cube pruning, and k-best extraction. The toolkit also implements suffix-array grammar extraction and minimum error rate training. It uses parallel and distributed computing techniques for scalability. We demonstrate that the toolkit achieves state-of-the-art translation performance on the WMT09 French-English translation task.
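To make the suffix-array component mentioned in the abstract more concrete, the following is a minimal sketch of the underlying lookup idea: index every suffix of a tokenized corpus in sorted order, then binary-search for all occurrences of a source phrase. It is an illustration under simplifying assumptions, not Joshua's implementation. The function names `build_suffix_array` and `find_occurrences` are hypothetical, the construction is the naive one, and a real grammar extractor would also consult word alignments and the target side of the corpus.

```python
from typing import List, Sequence


def build_suffix_array(tokens: Sequence[str]) -> List[int]:
    """Return the start positions of all suffixes, sorted by token sequence.

    Naive construction (sort the suffixes directly); adequate for a toy
    corpus, not for the corpus sizes a real toolkit has to handle.
    """
    tokens = list(tokens)
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])


def find_occurrences(
    tokens: Sequence[str], suffix_array: List[int], phrase: Sequence[str]
) -> List[int]:
    """Return every corpus position where `phrase` starts, in ascending order."""
    tokens = list(tokens)
    phrase = list(phrase)
    m = len(phrase)

    def prefix(k: int) -> List[str]:
        # First m tokens of the k-th suffix in sorted order.
        start = suffix_array[k]
        return tokens[start:start + m]

    # Lower bound: first suffix whose m-token prefix is >= phrase.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < phrase:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first suffix whose m-token prefix is > phrase.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= phrase:
            lo = mid + 1
        else:
            hi = mid

    # All suffixes in [first, lo) begin with the phrase.
    return sorted(suffix_array[first:lo])
```

Because matching suffixes are contiguous in the sorted order, each lookup needs only two binary searches, which is what makes it practical to find phrase occurrences on demand instead of precomputing a full phrase table.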
Index Terms
- Joshua: an open source toolkit for parsing-based machine translation