ABSTRACT
In recent years there has been much interest in word cooccurrence relations, such as n-grams, verb-object combinations, or cooccurrence within a limited context. This paper discusses how to estimate the probability of cooccurrences that do not occur in the training data. We present a method that makes local analogies between each specific unobserved cooccurrence and other cooccurrences that contain similar words, as determined by an appropriate word similarity metric. Our evaluation suggests that this method performs better than existing smoothing methods, and may provide an alternative to class-based models.
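The general idea of estimating an unobserved cooccurrence from similar words can be illustrated with a small sketch. The abstract does not specify the similarity metric or how the analogous cooccurrences are combined, so everything below is an assumption made for illustration: cosine similarity over raw cooccurrence counts stands in for the "appropriate word similarity metric", a similarity-weighted average of conditional probabilities stands in for the combination step, and the function names, the toy data, and the parameter `k` are hypothetical. It is not the paper's actual estimator.

```python
from collections import Counter
from math import sqrt


def build_counts(pairs):
    """Count cooccurrences: counts[w1][w2] = times the pair (w1, w2) was observed."""
    counts = {}
    for w1, w2 in pairs:
        counts.setdefault(w1, Counter())[w2] += 1
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (assumed metric)."""
    dot = sum(a[key] * b[key] for key in a if key in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cond_prob(counts, w1, w2):
    """Maximum-likelihood estimate of P(w2 | w1) from the observed counts."""
    row = counts.get(w1, Counter())
    total = sum(row.values())
    return row[w2] / total if total else 0.0


def similarity_estimate(counts, w1, w2, k=2):
    """Estimate P(w2 | w1) for an unobserved pair from the k words most similar to w1."""
    target = counts.get(w1, Counter())
    neighbours = sorted(
        ((cosine(target, row), other) for other, row in counts.items() if other != w1),
        reverse=True,
    )[:k]
    norm = sum(sim for sim, _ in neighbours)
    if norm == 0:
        return 0.0  # no similar word found; a real system would back off here
    return sum(sim * cond_prob(counts, other, w2) for sim, other in neighbours) / norm


# Toy verb-object data (hypothetical); the pair ("eat", "peach") never occurs.
counts = build_counts([
    ("eat", "apple"), ("eat", "bread"),
    ("devour", "apple"), ("devour", "bread"), ("devour", "peach"),
    ("read", "book"), ("read", "paper"),
])
peach_estimate = similarity_estimate(counts, "eat", "peach")
book_estimate = similarity_estimate(counts, "eat", "book")
```

In this toy data "eat" shares its objects with "devour" and none with "read", so the estimate for the unseen pair ("eat", "peach") is borrowed from "devour", while ("eat", "book") receives no support. A class-based model would instead assign each word to a fixed class and share statistics within the class; the sketch draws its analogies separately for each pair.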
- Contextual word similarity and estimation from sparse data