English to Persian Transliteration

Karimi, Sarvnaz; Turpin, Andrew; Scholer, Falk

doi:10.1007/11880561_21

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4209)

Included in the following conference series:

International Symposium on String Processing and Information Retrieval

Abstract

Persian is an Indo-European language written using Arabic script, and is an official language of Iran, Afghanistan, and Tajikistan. Transliteration of Persian to English, that is, the character-by-character mapping of a Persian word that is not readily available in a bilingual dictionary, is an unstudied problem. In this paper we make three novel contributions. First, we present performance comparisons of existing grapheme-based transliteration methods on English to Persian. Second, we discuss the difficulties in establishing a corpus for studying transliteration. Finally, we introduce a new model of Persian that takes into account the habit of shortening, or even omitting, runs of English vowels. This trait makes transliteration of Persian particularly difficult for phonetic-based methods. The new model outperforms the existing grapheme-based methods on Persian, exhibiting a 24% relative increase in transliteration accuracy measured using the top-5 criterion.
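The abstract reports accuracy under a top-5 criterion and a relative increase, but does not spell out either computation. The following is a minimal sketch of one common reading, assuming that a source word counts as correctly transliterated when any accepted target spelling appears among the system's first k ranked candidates. The function names, the toy word lists, and the 0.50 and 0.62 scores are hypothetical illustrations, not data or code from the paper.

```python
from typing import Dict, List, Set


def top_k_accuracy(candidates: Dict[str, List[str]],
                   references: Dict[str, Set[str]],
                   k: int = 5) -> float:
    """Fraction of source words with an accepted target in the first k candidates."""
    if not references:
        raise ValueError("references must not be empty")
    hits = 0
    for word, accepted in references.items():
        # Words with no system output contribute a miss.
        ranked = candidates.get(word, [])[:k]
        if any(c in accepted for c in ranked):
            hits += 1
    return hits / len(references)


def relative_increase(baseline: float, improved: float) -> float:
    """Relative change of `improved` over `baseline`, e.g. 0.24 for +24%."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline


# Hypothetical toy data: ranked system outputs and accepted spellings.
candidates = {
    "tom": ["توم", "تام"],
    "sara": ["سارا"],
    "john": ["جوهن"],
}
references = {
    "tom": {"تام"},
    "sara": {"سارا"},
    "john": {"جان"},
}

if __name__ == "__main__":
    print(top_k_accuracy(candidates, references, k=1))
    print(top_k_accuracy(candidates, references, k=5))
    # Illustrative scores only; they are not taken from the paper.
    print(relative_increase(0.50, 0.62))
```

Under this reading, a relative increase of 24% means the new score is 1.24 times the baseline score, which is distinct from a gain of 24 percentage points.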

Author information

Authors and Affiliations

School of Computer Science and Information Technology, RMIT University, GPO Box 2476V, Melbourne, 3001, Australia
Sarvnaz Karimi, Andrew Turpin & Falk Scholer

Editor information

Editors and Affiliations

Department of Computer and Information Science, University of Strathclyde, Scotland
Fabio Crestani
Dipartimento di Informatica, University of Pisa, Largo B. Pontecorvo 3, 56127, Pisa, Italy
Paolo Ferragina
Department of Information Studies, University of Sheffield, Sheffield, UK
Mark Sanderson

About this paper

Cite this paper

Karimi, S., Turpin, A., Scholer, F. (2006). English to Persian Transliteration. In: Crestani, F., Ferragina, P., Sanderson, M. (eds) String Processing and Information Retrieval. SPIRE 2006. Lecture Notes in Computer Science, vol 4209. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11880561_21

DOI: https://doi.org/10.1007/11880561_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45774-9
Online ISBN: 978-3-540-45775-6
eBook Packages: Computer Science, Computer Science (R0)
