
Estimating satisfactoriness of selectional restriction from corpus without a thesaurus

Published: 01 December 2005

Abstract

A selectional restriction specifies which combinations of words are semantically valid in a particular syntactic construction. It is a basic and important piece of knowledge in natural language processing and has been used for syntactic and word sense disambiguation. When acquiring selectional restrictions for many word combinations from a corpus, it is necessary to estimate whether a word combination that is not observed in the corpus nevertheless satisfies the selectional restriction. This paper proposes a new method for estimating, from a tagged corpus, the degree to which a word combination satisfies a selectional restriction, based on a multiple regression model whose independent variables correspond to modifiers. Unlike in conventional multiple regression analysis, the independent variables are themselves parameters to be learned. We experiment on estimating the degree of satisfaction for Japanese word combinations 〈noun, postpositional-particle, verb〉. The experimental results indicate that our method estimates the degree of satisfaction well even for word combinations rarely observed in the corpus, and that syntactic disambiguation using the co-occurrence scores estimated by our method is more accurate than disambiguation using co-occurrence probabilities smoothed by previous methods.
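To illustrate the core idea of a regression in which the independent variables are learned along with the coefficients, the sketch below fits per-noun variables and per-verb weights to observed co-occurrence counts by minimizing squared error with gradient descent. This is only an illustration under assumed choices: the variable names, the squared-error objective, the fitting loop, and the omission of the postpositional particle are all assumptions for the sketch, not the paper's actual estimation procedure.

```python
# Minimal sketch (not the paper's exact formulation): the degree of
# satisfaction of a <noun, verb> pair is modelled as a linear combination
# of per-noun variables x[n] with per-verb coefficients w[:, v]. Unlike
# ordinary multiple regression, the "independent variables" x[n] are
# themselves learned, so pairs unseen in the corpus inherit estimates
# from nouns and verbs that behave similarly.

import numpy as np

def fit(counts, num_vars=10, lr=0.05, epochs=500, seed=0):
    """counts: 2-D array of observed co-occurrence counts (nouns x verbs)."""
    rng = np.random.default_rng(seed)
    n_nouns, n_verbs = counts.shape
    x = rng.normal(scale=0.1, size=(n_nouns, num_vars))   # learned variables
    w = rng.normal(scale=0.1, size=(num_vars, n_verbs))   # regression weights
    for _ in range(epochs):
        pred = x @ w                      # predicted degree of satisfaction
        err = pred - counts               # residuals of the squared-error fit
        x -= lr * (err @ w.T) / n_verbs   # gradient step on the variables
        w -= lr * (x.T @ err) / n_nouns   # gradient step on the coefficients
    return x, w

# Usage: the estimate for a pair never observed in the corpus is x[n] @ w[:, v].
counts = np.array([[5., 0., 2.],
                   [4., 1., 0.],
                   [0., 3., 6.]])
x, w = fit(counts)
print((x @ w).round(2))   # estimated satisfaction for all noun-verb pairs
```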
