Abstract
Persian is an Indo-European language written in Arabic script and is an official language of Iran, Afghanistan, and Tajikistan. Transliteration of Persian to English, that is, the character-by-character mapping of a Persian word that is not readily available in a bilingual dictionary, is an unstudied problem. In this paper we make three novel contributions. First, we compare the performance of existing grapheme-based transliteration methods on English to Persian. Second, we discuss the difficulties in establishing a corpus for studying transliteration. Finally, we introduce a new model of Persian that takes into account the habit of shortening, or even omitting, runs of English vowels. This trait makes transliteration of Persian particularly difficult for phonetic-based methods. The new model outperforms the existing grapheme-based methods on Persian, exhibiting a 24% relative increase in transliteration accuracy measured using the top-5 criterion.
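The reported figure combines two quantities that are easy to conflate: top-5 accuracy and a relative (not absolute) increase. The sketch below illustrates both. It assumes that top-5 accuracy counts a source word as correct when its reference transliteration appears among the five highest-ranked candidates; the abstract does not spell out the definition. The function names, word identifiers, and candidate lists are hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def top_k_accuracy(ranked: Dict[str, List[str]], gold: Dict[str, str], k: int = 5) -> float:
    """Fraction of source words whose reference is among the first k candidates."""
    if not gold:
        raise ValueError("gold must contain at least one word")
    hits = sum(1 for word, ref in gold.items() if ref in ranked.get(word, [])[:k])
    return hits / len(gold)


def relative_increase(baseline: float, improved: float) -> float:
    """Relative change of `improved` over `baseline` (0.24 means 24%)."""
    if baseline == 0:
        raise ValueError("baseline accuracy must be non-zero")
    return (improved - baseline) / baseline


# Hypothetical toy data: four source words with placeholder transliterations.
gold = {"w1": "a", "w2": "b", "w3": "c", "w4": "d"}
baseline_output = {
    "w1": ["a", "x"],
    "w2": ["x", "y", "z", "p", "q", "b"],  # reference at rank 6: a miss for k=5
    "w3": ["x"],
    "w4": ["d"],
}
new_output = {
    "w1": ["a"],
    "w2": ["x", "b"],
    "w3": ["x"],
    "w4": ["y", "d"],
}

base_acc = top_k_accuracy(baseline_output, gold)  # 2 of 4 words correct
new_acc = top_k_accuracy(new_output, gold)        # 3 of 4 words correct
print(base_acc, new_acc, relative_increase(base_acc, new_acc))
```

On this toy data the absolute gain is 25 percentage points while the relative increase is 50%, which is why a relative figure such as the 24% above should be read alongside the baseline accuracy.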
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Karimi, S., Turpin, A., Scholer, F. (2006). English to Persian Transliteration. In: Crestani, F., Ferragina, P., Sanderson, M. (eds) String Processing and Information Retrieval. SPIRE 2006. Lecture Notes in Computer Science, vol 4209. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11880561_21
DOI: https://doi.org/10.1007/11880561_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45774-9
Online ISBN: 978-3-540-45775-6
eBook Packages: Computer Science, Computer Science (R0)