DOI: 10.3115/1220175.1220317

Learning transliteration lexicons from the web

Published: 17 July 2006

ABSTRACT

This paper presents an adaptive learning framework for Phonetic Similarity Modeling (PSM) that supports the automatic construction of transliteration lexicons. The learning algorithm starts with minimal prior knowledge about machine transliteration and acquires knowledge iteratively from the Web. We study active learning and unsupervised learning strategies that minimize the human supervision needed for data labeling. The learning process refines the PSM and constructs a transliteration lexicon at the same time. We evaluate the proposed PSM and its learning algorithm through a series of systematic experiments, which show that the proposed framework is reliably effective on two independent databases.
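
The iterative idea in the abstract, a similarity model that is refined while the lexicon grows, can be illustrated with a small self-training loop. The sketch below is not the paper's PSM or its learning algorithm: the position-by-position alignment, the relative-frequency re-estimation, the threshold value, and the names similarity, reestimate, and learn are all assumptions made for illustration, and Latin letters stand in for real source and target scripts. It covers only the unsupervised variant; an active learning variant would instead route uncertain candidates to a human labeler.

```python
from collections import Counter


def similarity(pair, weights):
    """Toy score (hypothetical, not the paper's PSM): summed weight of
    position-aligned symbol pairs, normalized by the longer string."""
    src, tgt = pair
    if not src or not tgt:
        return 0.0
    total = sum(weights.get(p, 0.0) for p in zip(src, tgt))
    return total / max(len(src), len(tgt))


def reestimate(lexicon):
    """Relative frequency of each target symbol given its source symbol,
    counted over position-aligned pairs in the current lexicon."""
    pair_counts = Counter()
    src_counts = Counter()
    for src, tgt in lexicon:
        for s, t in zip(src, tgt):
            pair_counts[(s, t)] += 1
            src_counts[s] += 1
    return {p: c / src_counts[p[0]] for p, c in pair_counts.items()}


def learn(seed_pairs, candidates, threshold=0.6, max_iters=5):
    """Alternate between accepting high-scoring candidates into the
    lexicon and re-estimating the model from the enlarged lexicon."""
    lexicon = set(seed_pairs)
    weights = reestimate(lexicon)
    for _ in range(max_iters):
        accepted = {
            c for c in candidates
            if c not in lexicon and similarity(c, weights) >= threshold
        }
        if not accepted:
            break  # nothing new cleared the threshold; stop iterating
        lexicon |= accepted
        weights = reestimate(lexicon)
    return weights, lexicon


if __name__ == "__main__":
    seed = [("ab", "xy")]
    pool = [("abc", "xyz"), ("cca", "zzx"), ("ba", "qq")]
    model, lexicon = learn(seed, pool)
    print(sorted(lexicon))
```

In this toy run the pair ("cca", "zzx") is rejected in the first pass and accepted in the second, once ("abc", "xyz") has taught the model the c-to-z correspondence. That is the sense in which the model and the lexicon improve together.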

Published in

ACL-44: Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics
July 2006
1214 pages

Publisher: Association for Computational Linguistics, United States


Overall acceptance rate: 85 of 443 submissions (19%)
