research-article

ISM@FIRE-2013 Shared Task on Transliterated Search

Authors:
Dinesh Kumar Prabhakar, Indian School of Mines, Dhanbad, Jharkhand, India 826004
Sukomal Pal, Indian School of Mines, Dhanbad, Jharkhand, India 826004

FIRE '12 & '13: Proceedings of the 4th and 5th Annual Meetings of the Forum for Information Retrieval Evaluation
December 2013, Article No.: 17, Pages 1–6
https://doi.org/10.1145/2701336.2701649

Published: 04 December 2013

ABSTRACT

This paper describes the approach we adopted for our official submission to the FIRE-2013 Shared Task on Transliterated Search, along with a few other approaches that we experimented with after submission. The techniques address the problem of language labeling by identifying each query word as an English or Hindi (E or H) term in mixed-language sentence queries. Both manual and machine learning methods are used. To transliterate H-labeled words, we use manual (dictionary-based) and generative (grapheme-based) methods, as well as a combination of both, in different algorithms. We observe that learning-based classification improves labeling accuracy. Extraction-based transliteration gives better results than generation-based transliteration when the terms are available in the bilingual dictionary. However, it may produce incorrect transliterations if terms are wrongly aligned, and it fails for out-of-dictionary words, in which case transliteration by generation is the only alternative. Generation alone does not perform well, however, because of spelling variation in transliterated terms. During evaluation we also observe that transliteration systems are generally corpus-biased. Although our performance in the official submission was moderate, we obtained better results in our post-submission experiments.
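
The labeling and transliteration steps described in the abstract can be pictured with a minimal Python sketch. It is not the authors' implementation: the English word list, the bilingual dictionary entries, the grapheme table, and all function names are hypothetical, and a plain lexicon lookup stands in for the learning-based classifier. The sketch labels each query word as E or H, transliterates H words by dictionary lookup (extraction), and falls back to a greedy grapheme mapping (generation) for out-of-dictionary words.

```python
# Toy illustration only. Every resource and name below is an assumption
# made for this sketch, not data or code from the paper.

# Hypothetical English lexicon used for manual (rule-based) labeling.
ENGLISH_WORDS = {"song", "lyrics", "movie", "the", "of"}

# Hypothetical bilingual dictionary (Roman -> Devanagari) for extraction.
TRANSLIT_DICT = {"dil": "दिल", "pyar": "प्यार"}

# Hypothetical grapheme table for generation. Far too small for real use.
GRAPHEME_TABLE = {
    "ka": "क", "ma": "म", "la": "ल", "na": "न", "ra": "र",
    "k": "क", "m": "म", "l": "ल", "n": "न", "r": "र",
}
MAX_UNIT = max(len(key) for key in GRAPHEME_TABLE)


def label_word(word):
    """Label a word 'E' if it is in the English lexicon, otherwise 'H'."""
    return "E" if word.lower() in ENGLISH_WORDS else "H"


def generate(word):
    """Greedy longest-match grapheme mapping (generation-based)."""
    out = []
    i = 0
    while i < len(word):
        for size in range(MAX_UNIT, 0, -1):
            unit = word[i:i + size]
            if unit in GRAPHEME_TABLE:
                out.append(GRAPHEME_TABLE[unit])
                i += len(unit)
                break
        else:
            # No table entry matched: copy the character unchanged.
            out.append(word[i])
            i += 1
    return "".join(out)


def transliterate(word):
    """Try the dictionary first (extraction), then fall back to generation."""
    key = word.lower()
    return TRANSLIT_DICT.get(key) or generate(key)


def process_query(query):
    """Return (word, label, output) triples for a whitespace-split query."""
    result = []
    for word in query.split():
        label = label_word(word)
        output = transliterate(word) if label == "H" else word
        result.append((word, label, output))
    return result


if __name__ == "__main__":
    for triple in process_query("song dil kamal"):
        print(triple)
```

With these toy resources, "dil" is resolved by the dictionary, while "kamal" is out of dictionary and is generated unit by unit. A real grapheme model would also have to handle vowel signs, conjuncts, and the spelling variation that the abstract identifies as the weakness of generation.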

Published in
FIRE '12 & '13: Proceedings of the 4th and 5th Annual Meetings of the Forum for Information Retrieval Evaluation
December 2013
105 pages
ISBN: 9781450328302
DOI: 10.1145/2701336
Editors:
Prasenjit Majumder, Dhirubhai Ambani Institute of Information and Communication Technology, Gujarat, India
Mandar Mitra, Indian Statistical Institute, Kolkata, India
Madhulika Agrawal, Dhirubhai Ambani Institute of Information and Communication Technology, Gujarat, India
Parth Mehta, Dhirubhai Ambani Institute of Information and Communication Technology, Gujarat, India
Copyright © 2013 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Text classification
Transliteration
Word labeling
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 19 of 64 submissions, 30%