DOI: 10.5555/1626431.1626459

Joshua: an open source toolkit for parsing-based machine translation

Published: 30 March 2009

ABSTRACT

We describe Joshua, an open source toolkit for statistical machine translation. Joshua implements all of the algorithms required for synchronous context-free grammars (SCFGs): chart parsing, n-gram language model integration, beam and cube pruning, and k-best extraction. The toolkit also implements suffix-array grammar extraction and minimum error rate training. It uses parallel and distributed computing techniques for scalability. We demonstrate that the toolkit achieves state-of-the-art translation performance on the WMT09 French-English translation task.
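
To make the suffix-array idea mentioned above concrete, the following is a minimal sketch of how a suffix array over a tokenized corpus can locate every occurrence of a contiguous source phrase by binary search. It is an illustration only, not Joshua's implementation: the function names `build_suffix_array` and `find_occurrences` and the toy corpus are hypothetical, the construction is deliberately naive, and it covers only contiguous phrase lookup, not the extraction of hierarchical rules with gaps.

```python
from typing import List


def build_suffix_array(tokens: List[str]) -> List[int]:
    """Return the start positions of all suffixes of `tokens`, in lexicographic order."""
    # Naive construction that compares whole suffixes; adequate for a toy corpus only.
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])


def find_occurrences(tokens: List[str], sa: List[int], phrase: List[str]) -> List[int]:
    """Return the sorted corpus positions at which `phrase` occurs."""
    m = len(phrase)
    if m == 0:
        return []

    def prefix(rank: int) -> List[str]:
        # First m tokens of the suffix stored at this rank of the suffix array.
        return tokens[sa[rank]:sa[rank] + m]

    # Lower bound: first rank whose m-token prefix is >= phrase.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < phrase:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first rank whose m-token prefix is > phrase.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= phrase:
            lo = mid + 1
        else:
            hi = mid

    # All matching suffixes are contiguous in the suffix array.
    return sorted(sa[start:lo])


corpus = "the cat sat on the mat the cat ran".split()
suffix_array = build_suffix_array(corpus)
matches = find_occurrences(corpus, suffix_array, ["the", "cat"])
```

Because all suffixes that begin with the same phrase sit next to each other in the sorted array, two binary searches are enough to recover every match, so phrase occurrences can be looked up at translation time rather than enumerated into a phrase table in advance.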


Published in

StatMT '09: Proceedings of the Fourth Workshop on Statistical Machine Translation
March 2009, 286 pages

Publisher: Association for Computational Linguistics, United States

Qualifiers: research-article

Acceptance Rates

StatMT '09 paper acceptance rate: 12 of 21 submissions, 57%
Overall acceptance rate: 24 of 59 submissions, 41%
