Abstract
Linguistic structures exhibit a rich array of global phenomena; however, commonly used Markov models cannot adequately describe these phenomena because of their strong locality assumptions. We propose a novel hierarchical model for structured prediction over sequences and trees that exploits global context by conditioning each generation decision on an unbounded context of prior decisions. This builds on the success of Markov models but, by not imposing a fixed context bound, better represents global phenomena. To facilitate learning in this large and unbounded model, we use a hierarchical Pitman-Yor process prior, which provides a recursive form of smoothing. We propose prediction algorithms based on A* search and Markov chain Monte Carlo sampling. Empirical results demonstrate the potential of our model relative to baseline finite-context Markov models on three tasks: morphological parsing, syntactic parsing, and part-of-speech tagging.
Copyright information
© 2015 Springer International Publishing Switzerland
Cite this paper
Shareghi, E., Haffari, G., Cohn, T., Nicholson, A. (2015). Structured Prediction of Sequences and Trees Using Infinite Contexts. In: Appice, A., Rodrigues, P., Santos Costa, V., Gama, J., Jorge, A., Soares, C. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2015. Lecture Notes in Computer Science(), vol 9285. Springer, Cham. https://doi.org/10.1007/978-3-319-23525-7_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23524-0
Online ISBN: 978-3-319-23525-7
eBook Packages: Computer Science (R0)