DOI: 10.1145/2701336.2701649
research-article

ISM@FIRE-2013 Shared Task on Transliterated Search

Published: 04 December 2013

ABSTRACT

This paper describes the approach we adopted for our official submission to the FIRE-2013 Shared Task on Transliterated Search, along with a few other approaches that we experimented with after submission. The techniques address the problem of language labeling by identifying each query word in a mixed-language query as an English or Hindi (E or H) term. Both manual and machine learning algorithms are used. For the transliteration of H-labeled words, we use manual (dictionary-based) and generative (grapheme-based) methods, and a combination of both, in different algorithms. We observe that learning-based classification improves labeling accuracy. Extraction-based transliteration gives better results than generation-based transliteration when the terms are available in the bilingual dictionary. However, it may lead to incorrect transliteration if terms are wrongly aligned, and the approach fails for out-of-dictionary words. In this case, transliteration by generation is the only alternative, but generation alone does not perform well because of spelling variation in transliterated terms. During evaluation we also observe that transliteration systems are generally corpus-biased. Although our performance in the official submission was moderate, we obtained better results in our post-submission experiments.
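
The pipeline outlined in the abstract (label each query word as E or H, look H words up in a bilingual dictionary, and fall back to grapheme-based generation for out-of-dictionary words) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the word list, dictionary entries, grapheme table, and all function names are hypothetical, and the learning-based classifier mentioned in the abstract is replaced here by a plain word-list lookup.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy resources; the abstract does not specify the real ones.
ENGLISH_WORDS: Set[str] = {"lyrics", "song", "of", "the"}
BILINGUAL_DICT: Dict[str, str] = {"dil": "दिल", "pyar": "प्यार"}

# Deliberately tiny Roman-to-Devanagari grapheme table (assumption).
# A short "a" after a consonant is the inherent vowel, so it maps to "".
GRAPHEME_MAP: Dict[str, str] = {
    "aa": "\u093e", "ee": "\u0940", "a": "", "i": "\u093f",
    "k": "\u0915", "l": "\u0932", "m": "\u092e", "n": "\u0928", "r": "\u0930",
}


def label_token(token: str) -> str:
    """Label a query word E if it is in the English word list, else H."""
    return "E" if token.lower() in ENGLISH_WORDS else "H"


def generate(token: str) -> str:
    """Greedy longest-match grapheme mapping; unknown letters pass through."""
    word = token.lower()
    out: List[str] = []
    i = 0
    while i < len(word):
        for size in (2, 1):
            chunk = word[i:i + size]
            if chunk in GRAPHEME_MAP:
                out.append(GRAPHEME_MAP[chunk])
                i += len(chunk)
                break
        else:
            out.append(word[i])  # no mapping known for this letter
            i += 1
    return "".join(out)


def transliterate(token: str) -> Tuple[str, str]:
    """Prefer dictionary extraction; fall back to generation."""
    key = token.lower()
    if key in BILINGUAL_DICT:
        return BILINGUAL_DICT[key], "dictionary"
    return generate(key), "generated"


def process_query(query: str) -> List[Tuple[str, str, str]]:
    """Return (token, label, output) for each word of a mixed query."""
    results = []
    for token in query.split():
        label = label_token(token)
        output = token if label == "E" else transliterate(token)[0]
        results.append((token, label, output))
    return results


if __name__ == "__main__":
    for row in process_query("dil song naam"):
        print(row)
```

In this sketch, "dil" is resolved by dictionary lookup, "song" is left as English, and "naam" is out of the dictionary and therefore generated. A variant spelling such as "nam" would generate a different string, which is one way to see why generation alone suffers from spelling variation.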


            Published in

            FIRE '12 & '13: Proceedings of the 4th and 5th Annual Meetings of the Forum for Information Retrieval Evaluation
            December 2013
            105 pages
            ISBN: 9781450328302
            DOI: 10.1145/2701336
            Editors: Prasenjit Majumder, Mandar Mitra, Madhulika Agrawal, Parth Mehta

            Copyright © 2013 ACM


            Publisher

            Association for Computing Machinery

            New York, NY, United States


            Qualifiers

            • research-article
            • Research
            • Refereed limited

            Acceptance Rates

            Overall Acceptance Rate: 19 of 64 submissions, 30%
