Abstract
This paper presents a machine learning system that parses natural language by learning from manually parsed example sentences, and that parses unseen data at state-of-the-art accuracy. Its machine learning technology, based on the maximum entropy framework, is highly reusable and not specific to the parsing problem, while the linguistic hints it uses to learn can be specified concisely. The system therefore requires minimal human effort and linguistic knowledge to construct. In practice, the parser's running time on a test sentence is linear in the sentence length. We also demonstrate that the parser can be trained on other domains without modification to the modeling framework or to the linguistic hints it uses to learn. Furthermore, this paper shows that research into rescoring the top 20 parses returned by the parser might yield accuracies dramatically higher than the state of the art.
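The maximum entropy framework named in the abstract models a conditional distribution p(y|x) proportional to exp of a weighted sum of binary features active for the context x and outcome y. The sketch below is a minimal illustration of that form only, not the paper's parser; the feature names, weights, and candidate outcomes are invented for the example.

```python
import math

def maxent_prob(active_features, weights, outcomes):
    """Conditional maximum entropy (log-linear) model:
    p(y|x) = exp(sum of weights for active (feature, y) pairs) / Z(x)."""
    scores = {y: sum(weights.get((f, y), 0.0) for f in active_features)
              for y in outcomes}
    z = sum(math.exp(s) for s in scores.values())  # normalizing constant Z(x)
    return {y: math.exp(scores[y]) / z for y in outcomes}

# Hypothetical weights for a tiny two-way parsing decision.
weights = {
    ("prev_tag=DT", "NP"): 1.2,
    ("word=dog",    "NP"): 0.8,
    ("prev_tag=DT", "VP"): -0.5,
}
probs = maxent_prob(["prev_tag=DT", "word=dog"], weights, ["NP", "VP"])
```

In the real system such weights would be estimated from the manually parsed training sentences (e.g., by generalized iterative scaling), and the features would encode the concisely specified linguistic hints.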
Ratnaparkhi, A. Learning to Parse Natural Language with Maximum Entropy Models. Machine Learning 34, 151–175 (1999). https://doi.org/10.1023/A:1007502103375