Contextual word similarity and estimation from sparse data

Authors:
Ido Dagan, AT&T Bell Laboratories, Murray Hill, NJ
Shaul Marcus, Technion, Haifa, Israel
Shaul Markovitch, Technion, Haifa, Israel
ACL '93: Proceedings of the 31st annual meeting on Association for Computational Linguistics
June 1993
Pages 164–171
https://doi.org/10.3115/981574.981596

Published: 22 June 1993

ABSTRACT

In recent years there has been much interest in word cooccurrence relations, such as n-grams, verb-object combinations, or cooccurrence within a limited context. This paper discusses how to estimate the probability of cooccurrences that do not occur in the training data. We present a method that makes local analogies between each specific unobserved cooccurrence and other cooccurrences that contain similar words, as determined by an appropriate word similarity metric. Our evaluation suggests that this method performs better than existing smoothing methods, and may provide an alternative to class-based models.
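
The general idea of estimating an unobserved cooccurrence from cooccurrences of similar words can be sketched as follows. This is only an illustration under stated assumptions, not the paper's actual formulation: the toy data, the cosine similarity over raw context counts, the similarity-weighted average of conditional probabilities, and all function names are hypothetical choices made here for brevity.

```python
from collections import Counter, defaultdict
from math import sqrt

# Toy (word, context) cooccurrence observations; purely illustrative data.
PAIRS = [
    ("eat", "apple"), ("eat", "apple"), ("eat", "bread"),
    ("devour", "bread"), ("devour", "apple"), ("devour", "peach"),
    ("read", "book"), ("read", "paper"),
]

def build_counts(pairs):
    """Map each word to a Counter of the contexts it was observed with."""
    counts = defaultdict(Counter)
    for w1, w2 in pairs:
        counts[w1][w2] += 1
    return counts

def cond_prob(counts, w1, w2):
    """Maximum-likelihood estimate of P(w2 | w1); zero for unseen pairs."""
    contexts = counts.get(w1, Counter())
    total = sum(contexts.values())
    return contexts[w2] / total if total else 0.0

def cosine(a, b):
    """Cosine similarity between two context-count vectors (an assumed metric)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def similarity_estimate(counts, w1, w2):
    """Estimate P(w2 | w1) as a similarity-weighted average over other words."""
    target = counts.get(w1, Counter())
    weighted, norm = 0.0, 0.0
    for other, contexts in counts.items():
        if other == w1:
            continue
        sim = cosine(target, contexts)
        weighted += sim * cond_prob(counts, other, w2)
        norm += sim
    return weighted / norm if norm else 0.0
```

In this toy data the pair ("eat", "peach") is never observed, so its maximum-likelihood estimate is zero. The sketch instead borrows evidence from "devour", which shares contexts with "eat", while "read" shares none and contributes nothing.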


Published in
ACL '93: Proceedings of the 31st annual meeting on Association for Computational Linguistics
June 1993, 320 pages
Program Chair: Lenhart Schubert, University of Rochester
Publisher: Association for Computational Linguistics, United States
Overall Acceptance Rate: 85 of 443 submissions, 19%