DOI: 10.5555/1609067.1609115

Lightly supervised transliteration for machine translation

Published: 30 March 2009

ABSTRACT

We present a Hebrew-to-English transliteration method in the context of a machine translation system. Our method uses machine learning to determine which terms should be transliterated rather than translated. The training corpus for this classifier includes only positive examples, acquired semi-automatically. The classifier eliminates more than 38% of the errors made by a baseline method. The identified terms are then transliterated by an SMT-based transliteration model, trained on a parallel corpus extracted from Wikipedia using a fairly simple method that requires minimal knowledge. The correct result is produced in more than 76% of cases, and in 92% of cases it is among the top five results. We also demonstrate a small improvement in the performance of a Hebrew-to-English MT system that uses our transliteration module.
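The figures quoted in the abstract (error reduction over a baseline, and how often the correct result appears among the top-ranked candidates) can be computed along the following lines. This is a minimal sketch, not the authors' evaluation code: the function names, the exact-match criterion, and the toy data are all assumptions made for illustration, and reading the 76% figure as top-1 accuracy is an interpretation of the abstract.

```python
from typing import Sequence


def top_k_accuracy(candidates: Sequence[Sequence[str]],
                   references: Sequence[str], k: int) -> float:
    """Fraction of terms whose reference is among the first k ranked candidates.

    Assumption: a candidate counts as correct only on exact string match.
    """
    if len(candidates) != len(references):
        raise ValueError("candidates and references must have the same length")
    if not references:
        raise ValueError("need at least one term to evaluate")
    hits = sum(ref in ranked[:k] for ranked, ref in zip(candidates, references))
    return hits / len(references)


def relative_error_reduction(baseline_errors: int, system_errors: int) -> float:
    """Share of the baseline's errors that the system eliminates."""
    if baseline_errors <= 0:
        raise ValueError("baseline must make at least one error")
    return (baseline_errors - system_errors) / baseline_errors


# Invented toy data (NOT taken from the paper): ranked English candidates
# for four source terms, best candidate first.
ranked_outputs = [
    ["london", "lundon", "landon"],
    ["shikago", "chicago", "shicago"],
    ["pariz", "paris", "peris"],
    ["berlin", "barlin", "birlin"],
]
gold = ["london", "chicago", "paris", "berlin"]

top1 = top_k_accuracy(ranked_outputs, gold, k=1)
top5 = top_k_accuracy(ranked_outputs, gold, k=5)
reduction = relative_error_reduction(baseline_errors=100, system_errors=62)
```

With these definitions, a system that makes 62 errors where the baseline makes 100 has a relative error reduction of 38%, which is the kind of comparison the abstract reports for the classifier.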


Published in: EACL '09: Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics, March 2009, 905 pages

Publisher: Association for Computational Linguistics, United States


Acceptance rates: EACL '09 paper acceptance rate, 100 of 360 submissions (28%); overall acceptance rate, 100 of 360 submissions (28%).