ABSTRACT
There are two decoding algorithms essential to the area of natural language processing. One is the Viterbi algorithm for linear-chain models, such as HMMs or CRFs. The other is the CKY algorithm for probabilistic context free grammars. However, tasks such as noun phrase chunking and relation extraction seem to fall between the two, neither of them being the best fit. Ideally we would like to model entities and relations, with two layers of labels. We present a tractable algorithm for exact inference over two layers of labels and chunks with time complexity O(n2), and provide empirical results comparing our model with linear-chain models.
- M. Collins. 2002. Discriminative training methods for hidden Markov models: Theory and experiments with perceptron algorithms. In Proc. of Empirical Methods in Natural Language Processing (EMNLP) Google ScholarDigital Library
- K. Crammer and Y. Singer. 2003. Ultraconservative online algorithms for multiclass problems. Journal of Machine Learning Research Google ScholarDigital Library
- K. Crammer, O. Dekel, S. Shalev-Shwartz, and Y. Singer. 2003. Online passive aggressive algorithms. In Advances in Neural Information Processing Systems 15Google Scholar
- K. Crammer, R. McDonald, and F. Pereira. 2004. New large margin algorithms for structured prediction. In Learning with Structured Outputs Workshop (NIPS)Google Scholar
- Y. Freund and R. Schapire 1999. Large Margin Classification using the Perceptron Algorithm. In Machine Learning, 37(3):277--296. Google ScholarDigital Library
- T. S. Jaakkola, M. Diekhans, and D. Haussler. 2000. A discriminative framework for detecting remote protein homologies. Journal of Computational BiologyGoogle ScholarCross Ref
- T. Kudo 2005. CRF++: Yet Another CRF toolkit. Available at http://chasen.org/~taku/software/CRF++/Google Scholar
- J. Lafferty, A. McCallum, and F. Pereira. 2001. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. In Proc. of the 18th International Conference on Machine Learning (ICML) Google ScholarDigital Library
- F. Peng and A. McCallum. 2004. Accurate Information Extraction from Research Papers using Conditional Random Fields. In Proc. of the Human Language Technology Conf. (HLT)Google Scholar
- F. Sha and F. Pereira. 2003. Shallow parsing with conditional random fields. In Proc. of the Human Language Technology Conf. (HLT) Google ScholarDigital Library
- C. Manning and H. Schutze. 1999. Foundations of Statistical Natural Language Processing MIT Press. Google ScholarDigital Library
- A. McCallum, K. Rohanimanesh and C. Sutton. 2003. Dynamic Conditional Random Fields for Jointly Labeling Multiple Sequences. In Proc. of Workshop on Syntax, Semantics, Statistics. (NIPS)Google Scholar
- R. McDonald, K. Crammer, and F. Pereira. 2005. Online large-margin training of dependency parsers. In Proc. of the 43rd Annual Meeting of the ACL Google ScholarDigital Library
- L. Ramshaw and M. Marcus. 1995. Text chunking using transformation-based learning. In Proc. of Third Workshop on Very Large Corpora. ACLGoogle Scholar
- C. Sutton, K. Rohanimanesh and A. McCallum. 2004. Dynamic Conditional Random Fields: Factorized Probabilistic Models for Labeling and Segmenting Sequence Data. In Proc. of the 21st International Conference on Machine Learning (ICML) Google ScholarDigital Library
- B. Taskar, D. Klein, M. Collins, D. Koller, and C. Manning 2004. Max Margin Parsing. In Proc. of Empirical Methods in Natural Language Processing (EMNLP)Google Scholar
- B. Taskar and D. Klein. 2005. Max-Margin Methods for NLP: Estimation, Structure, and Applications Available at http://www.cs.berkeley.edu/~taskar/pubs/max-margin-acl05-tutorial.pdfGoogle Scholar
- E. F. Tjong Kim Sang and S. Buchholz. 2000. Introduction to the CoNLL-2000 shared task: Chunking. In Proc. of the 4th Conf. on Computational Natural Language Learning (CoNLL) Google ScholarDigital Library
- T. Zhang. 2001. Regularized winnow methods. In Advances in Neural Information Processing Systems 13Google Scholar
- Exact decoding for jointly labeling and chunking sequences
Recommendations
Jointly labeling multiple sequences: a factorial HMM approach
ACLstudent '05: Proceedings of the ACL Student Research WorkshopWe present new statistical models for jointly labeling multiple sequences and apply them to the combined task of part-of-speech tagging and noun phrase chunking. The model is based on the Factorial Hidden Markov Model (FHMM) with distributed hidden ...
Chinese Noun Phrases Chunking: A Latent Discriminative Model with Global Features
CSE '11: Proceedings of the 2011 14th IEEE International Conference on Computational Science and EngineeringIn the fields of Chinese natural language processing, recognizing simple and non-recursive base phrases is an important task for natural language processing applications, such as information processing and machine translation. In stead of rule-based ...
Efficient staggered decoding for sequence labeling
ACL '10: Proceedings of the 48th Annual Meeting of the Association for Computational LinguisticsThe Viterbi algorithm is the conventional decoding algorithm most widely adopted for sequence labeling. Viterbi decoding is, however, prohibitively slow when the label set is large, because its time complexity is quadratic in the number of labels. This ...
Comments