Abstract
This paper presents a machine learning system that parses natural language by learning from manually parsed example sentences, and that parses unseen data at state-of-the-art accuracy. Its machine learning technology, based on the maximum entropy framework, is highly reusable and not specific to the parsing problem, while the linguistic hints it uses to learn can be specified concisely. The system therefore requires minimal human effort and linguistic knowledge to construct. In practice, the parser's running time on a test sentence is linear in the sentence length. We also demonstrate that the parser can be trained on other domains without modification to the modeling framework or to the linguistic hints it uses to learn. Furthermore, this paper shows that research into rescoring the top 20 parses returned by the parser might yield accuracies dramatically higher than the state of the art.
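The maximum entropy framework named in the abstract models a conditional distribution p(y|x) proportional to exp of a weighted sum of binary features active for the context x and outcome y. The sketch below is a minimal illustration of that form only, not the paper's parser; the feature names, weights, and candidate outcomes are invented for the example.

```python
import math

def maxent_prob(active_features, weights, outcomes):
    """Conditional maximum entropy (log-linear) model:
    p(y|x) = exp(sum of weights for active (feature, y) pairs) / Z(x)."""
    scores = {y: sum(weights.get((f, y), 0.0) for f in active_features)
              for y in outcomes}
    z = sum(math.exp(s) for s in scores.values())  # normalizing constant Z(x)
    return {y: math.exp(scores[y]) / z for y in outcomes}

# Hypothetical weights for a tiny two-way parsing decision.
weights = {
    ("prev_tag=DT", "NP"): 1.2,
    ("word=dog",    "NP"): 0.8,
    ("prev_tag=DT", "VP"): -0.5,
}
probs = maxent_prob(["prev_tag=DT", "word=dog"], weights, ["NP", "VP"])
```

In the real system such weights would be estimated from the manually parsed training sentences (e.g., by generalized iterative scaling), and the features would encode the concisely specified linguistic hints.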
Ratnaparkhi, A. Learning to Parse Natural Language with Maximum Entropy Models. Machine Learning 34, 151–175 (1999). https://doi.org/10.1023/A:1007502103375