ABSTRACT
In sequence modeling, we often wish to represent complex interactions among labels, such as when performing multiple, cascaded labeling tasks on the same sequence, or when long-range dependencies exist. We present dynamic conditional random fields (DCRFs), a generalization of linear-chain conditional random fields (CRFs) in which each time slice contains a set of state variables and edges---a distributed state representation as in dynamic Bayesian networks (DBNs)---and parameters are tied across slices. Since exact inference can be intractable in such models, we perform approximate inference using several schedules for belief propagation, including tree-based reparameterization (TRP). On a natural-language chunking task, we show that a DCRF performs better than a series of linear-chain CRFs, achieving comparable performance using only half the training data.
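As background for the model family the abstract describes: a linear-chain CRF defines p(y|x) as proportional to the exponentiated sum of node and transition potentials, and its normalizer can be computed exactly with the forward algorithm. The sketch below is illustrative only (it is not the paper's implementation), assuming log-space potentials already computed from features, with a single transition matrix tied across slices:

```python
import numpy as np

def log_partition(node_pot, edge_pot):
    """Forward algorithm for a linear-chain CRF.

    node_pot: (T, K) array of log node potentials, one row per position.
    edge_pot: (K, K) array of log transition potentials, tied across slices.
    Returns log Z, the normalizer of p(y | x).
    """
    T, K = node_pot.shape
    alpha = node_pot[0].copy()  # log forward messages at position 0
    for t in range(1, T):
        # For each current label, log-sum-exp over the previous label.
        scores = alpha[:, None] + edge_pot + node_pot[t][None, :]
        m = scores.max(axis=0)
        alpha = m + np.log(np.exp(scores - m).sum(axis=0))
    m = alpha.max()
    return m + np.log(np.exp(alpha - m).sum())
```

A DCRF replaces the single chain of labels with several interacting chains per time slice (e.g., one per labeling task); the resulting graph contains cycles, which is why the paper turns to approximate inference schedules such as TRP instead of this exact recursion.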
Dynamic conditional random fields: factorized probabilistic models for labeling and segmenting sequence data