Aligning word senses using bilingual corpora

Published: 01 June 2006

Abstract

The growing importance of multilingual information retrieval and machine translation has made multilingual ontologies extremely valuable resources. Since constructing an ontology from scratch is very expensive and time-consuming, it is attractive to consider ways of automatically aligning monolingual ontologies, which already exist for many of the world's major languages. Previous research either exploited similarity in the structure of the ontologies to be aligned or relied on manually created bilingual resources. These approaches cannot be used to align ontologies with vastly different structures, and they can only be applied to much-studied language pairs for which expensive resources are already available. In this paper, we propose a novel approach to aligning ontologies at the node level: given a concept represented by a particular word sense in one ontology, our task is to find the best corresponding word sense in the second-language ontology. To this end, we present a language-independent, corpus-based method that borrows from techniques used in information retrieval and machine translation. We show its efficiency by applying it to two very different ontologies in very different languages: the Mandarin Chinese HowNet and the American English WordNet. Moreover, we propose a methodology for measuring the comparability of bilingual corpora and show that our method is robust enough to use noisy nonparallel bilingual corpora efficiently when clean parallel corpora are not available.
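The abstract states the task (pick the best corresponding word sense in the second ontology) but does not spell out the algorithm. The following is a minimal sketch of one way such a corpus-based alignment could look, assuming a context-vector comparison of the kind used in information retrieval; it is not the paper's actual method. Every name here (`context_vector`, `translate_vector`, `best_sense`), the seed lexicon, the romanized toy tokens, and the toy corpora are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

def context_vector(words, corpus, window=2):
    """Count tokens that occur within `window` positions of any word in `words`."""
    counts = Counter()
    for sentence in corpus:
        for i, tok in enumerate(sentence):
            if tok in words:
                lo, hi = max(0, i - window), min(len(sentence), i + window + 1)
                for j in range(lo, hi):
                    if j != i:
                        counts[sentence[j]] += 1
    return counts

def translate_vector(vec, seed_lexicon):
    """Map context words into the target language; words not in the lexicon are dropped."""
    out = Counter()
    for word, count in vec.items():
        if word in seed_lexicon:
            out[seed_lexicon[word]] += count
    return out

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def best_sense(source_words, candidate_senses, src_corpus, tgt_corpus, seed_lexicon):
    """Return the candidate sense whose context vector is closest to the source sense."""
    src_vec = translate_vector(context_vector(source_words, src_corpus), seed_lexicon)
    scores = {
        sense_id: cosine(src_vec, context_vector(words, tgt_corpus))
        for sense_id, words in candidate_senses.items()
    }
    return max(scores, key=scores.get), scores

# Hypothetical toy data: romanized source tokens and a tiny seed lexicon.
src_corpus = [["yinhang", "cun", "qian"], ["yinhang", "daikuan", "qian"]]
tgt_corpus = [
    ["depository", "deposit", "money"],
    ["bank", "loan", "money"],
    ["shore", "river", "water"],
    ["bank", "river", "water"],
]
seed_lexicon = {"qian": "money", "cun": "deposit", "daikuan": "loan"}

# Each candidate sense is represented by the set of words that can express it.
candidate_senses = {
    "financial_institution": {"bank", "depository"},
    "riverside": {"bank", "shore"},
}

best, scores = best_sense({"yinhang"}, candidate_senses, src_corpus, tgt_corpus, seed_lexicon)
print(best, scores)
```

In this sketch the two corpora need not be parallel: only co-occurrence counts within each language and a small seed lexicon are used, which is the property that makes this family of techniques applicable to noisy, nonparallel text.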

