ABSTRACT
This paper presents an adaptive learning framework for Phonetic Similarity Modeling (PSM) that supports the automatic construction of transliteration lexicons. The learning algorithm starts with minimal prior knowledge about machine transliteration and acquires further knowledge iteratively from the Web. We study active learning and unsupervised learning strategies that minimize the human supervision needed for data labeling. The learning process refines the PSM and constructs a transliteration lexicon at the same time. We evaluate the proposed PSM and its learning algorithm through a series of systematic experiments, which show that the framework is reliably effective on two independent databases.
- Learning transliteration lexicons from the web