Lightly supervised transliteration for machine translation

Authors:
Amit Kirschenbaum, University of Haifa, Haifa, Israel
Shuly Wintner, University of Haifa, Haifa, Israel
EACL '09: Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics, March 2009, pages 433–441

Published: 30 March 2009

ABSTRACT

We present a Hebrew-to-English transliteration method in the context of a machine translation system. Our method uses machine learning to determine which terms should be transliterated rather than translated. The training corpus for this classifier contains only positive examples, acquired semi-automatically. The classifier eliminates more than 38% of the errors made by a baseline method. The identified terms are then transliterated by a model based on statistical machine translation (SMT), trained on a parallel corpus extracted from Wikipedia using a fairly simple method that requires minimal knowledge. The correct transliteration is the top-ranked result in more than 76% of the cases and is among the top five results in 92% of them. We also demonstrate a small improvement in the performance of a Hebrew-to-English MT system that uses our transliteration module.
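
The abstract reports three kinds of figures: a relative reduction in classifier errors, accuracy of the top-ranked transliteration, and accuracy within the top five candidates. As a minimal sketch of how such figures are commonly computed, and not the authors' evaluation code, the Python below defines both measures. The function names, the toy candidate lists, and the example error rates are all hypothetical and are not taken from the paper.

```python
from typing import Sequence


def top_k_accuracy(ranked: Sequence[Sequence[str]], gold: Sequence[str], k: int) -> float:
    """Fraction of items whose reference appears among the first k ranked candidates."""
    if len(ranked) != len(gold):
        raise ValueError("ranked and gold must have the same length")
    if not gold:
        raise ValueError("need at least one test item")
    hits = sum(1 for candidates, ref in zip(ranked, gold) if ref in candidates[:k])
    return hits / len(gold)


def relative_error_reduction(baseline_error: float, system_error: float) -> float:
    """Share of the baseline's errors that the system removes."""
    if baseline_error <= 0:
        raise ValueError("baseline error must be positive")
    return (baseline_error - system_error) / baseline_error


# Invented toy data for illustration only (not from the paper):
# each inner list is a ranked list of candidate transliterations.
ranked_candidates = [
    ["london", "lundon", "londn"],
    ["pariz", "paris", "pares"],
    ["berlin", "barlin", "birlin"],
    ["roma", "ruma", "romah"],
]
gold_references = ["london", "paris", "berlin", "rome"]

top1 = top_k_accuracy(ranked_candidates, gold_references, k=1)
top3 = top_k_accuracy(ranked_candidates, gold_references, k=3)

# Hypothetical error rates: a baseline at 20% and a system at 12%.
reduction = relative_error_reduction(0.20, 0.12)

print(f"top-1 accuracy: {top1:.2f}")
print(f"top-3 accuracy: {top3:.2f}")
print(f"relative error reduction: {reduction:.0%}")
```

Under this reading, "the correct result is produced" corresponds to `k=1` and "one of the top-5 results" to `k=5`. The error-reduction figure is relative to the baseline's error rate rather than an absolute difference in accuracy.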

Index Terms

Computing methodologies

Published in

EACL '09: Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics
March 2009, 905 pages
General Chair: Alex Lascarides, University of Edinburgh (UK)
Program Chairs: Claire Gardent, CNRS/LORIA Nancy (France); Joakim Nivre, Uppsala University and Växjö University (Sweden)
Publisher: Association for Computational Linguistics, United States

Acceptance Rates

EACL '09 paper acceptance rate: 100 of 360 submissions, 28%
Overall acceptance rate: 100 of 360 submissions, 28%