Abstract
This paper presents a study of the use of neural probabilistic models in a syntactic-based language model. The neural probabilistic model uses a distributed representation of the items in the conditioning history and is effective at capturing long-range dependencies. Employing neural network based models in the syntactic language model enables it to make efficient use of the large amount of information available in a syntactic parse when estimating the probability of the next word in a string. Several scenarios for integrating neural networks into the syntactic language model are presented, accompanied by derivations of the associated training procedures. Experiments on the UPenn Treebank and the Wall Street Journal corpus show significant improvements in perplexity and word error rate (WER) over the baseline structured language model (SLM). Furthermore, comparisons with standard and neural-network-based N-gram models with arbitrarily long contexts show that syntactic information is in fact very helpful in estimating word string probabilities. Overall, our neural syntactic language model achieves the best published perplexity and WER results for the given data sets.
Additional information
This work was supported by the National Science Foundation under grant No. IIS-0085940.
Editors:
Dan Roth and Pascale Fung
Cite this article
Emami, A., Jelinek, F. A Neural Syntactic Language Model. Mach Learn 60, 195–227 (2005). https://doi.org/10.1007/s10994-005-0916-y