ABSTRACT
In sequence modeling, we often wish to represent complex interactions among labels, such as when performing multiple, cascaded labeling tasks on the same sequence, or when long-range dependencies exist. We present dynamic conditional random fields (DCRFs), a generalization of linear-chain conditional random fields (CRFs) in which each time slice contains a set of state variables and edges---a distributed state representation as in dynamic Bayesian networks (DBNs)---and parameters are tied across slices. Since exact inference can be intractable in such models, we perform approximate inference using several schedules for belief propagation, including tree-based reparameterization (TRP). On a natural-language chunking task, we show that a DCRF performs better than a series of linear-chain CRFs, achieving comparable performance using only half the training data.
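As background for the model family the abstract describes: a linear-chain CRF defines p(y|x) as proportional to the exponentiated sum of node and transition potentials, and its normalizer can be computed exactly with the forward algorithm. The sketch below is illustrative only (it is not the paper's implementation), assuming log-space potentials already computed from features, with a single transition matrix tied across slices:

```python
import numpy as np

def log_partition(node_pot, edge_pot):
    """Forward algorithm for a linear-chain CRF.

    node_pot: (T, K) array of log node potentials, one row per position.
    edge_pot: (K, K) array of log transition potentials, tied across slices.
    Returns log Z, the normalizer of p(y | x).
    """
    T, K = node_pot.shape
    alpha = node_pot[0].copy()  # log forward messages at position 0
    for t in range(1, T):
        # For each current label, log-sum-exp over the previous label.
        scores = alpha[:, None] + edge_pot + node_pot[t][None, :]
        m = scores.max(axis=0)
        alpha = m + np.log(np.exp(scores - m).sum(axis=0))
    m = alpha.max()
    return m + np.log(np.exp(alpha - m).sum())
```

A DCRF replaces the single chain of labels with several interacting chains per time slice (e.g., one per labeling task); the resulting graph contains cycles, which is why the paper turns to approximate inference schedules such as TRP instead of this exact recursion.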
Dynamic conditional random fields: factorized probabilistic models for labeling and segmenting sequence data