DOI: 10.1145/1015330.1015422
Article

Dynamic conditional random fields: factorized probabilistic models for labeling and segmenting sequence data

Published: 04 July 2004

ABSTRACT

In sequence modeling, we often wish to represent complex interaction between labels, such as when performing multiple, cascaded labeling tasks on the same sequence, or when long-range dependencies exist. We present dynamic conditional random fields (DCRFs), a generalization of linear-chain conditional random fields (CRFs) in which each time slice contains a set of state variables and edges---a distributed state representation as in dynamic Bayesian networks (DBNs)---and parameters are tied across slices. Since exact inference can be intractable in such models, we perform approximate inference using several schedules for belief propagation, including tree-based reparameterization (TRP). On a natural-language chunking task, we show that a DCRF performs better than a series of linear-chain CRFs, achieving comparable performance using only half the training data.
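
The model the abstract describes can be made concrete with a small sketch. The Python snippet below is not from the paper: the two-chain (factorial) layout, the label sets, the feature templates, and the hand-set weights are illustrative assumptions chosen to mirror the cascaded POS-tagging and NP-chunking setting. It only computes the unnormalized log-score of one joint labeling; a real DCRF would learn the weights by conditional maximum likelihood and would run approximate inference (loopy belief propagation, e.g. with a TRP schedule) because the coupled chains make exact inference expensive.

# A minimal sketch (not the authors' code) of the factorial structure the
# abstract describes: two label chains over one token sequence, with
# within-chain transition factors, a cotemporal factor linking the chains at
# each time slice, observation factors, and the same feature templates reused
# at every slice (parameter tying). Label sets, templates, and weights are
# illustrative assumptions.
from collections import defaultdict

POS_TAGS = ["DT", "NN", "VB"]        # "lower" chain, e.g. part-of-speech tags
CHUNK_TAGS = ["B-NP", "I-NP", "O"]   # "upper" chain, e.g. NP-chunk labels

def features(x, t, pos, chunk, prev_pos, prev_chunk):
    """Return {feature-name: value} for the factors touching time slice t."""
    word = x[t]
    feats = {
        ("obs-pos", word, pos): 1.0,       # observation factor, POS chain
        ("obs-chunk", word, chunk): 1.0,   # observation factor, chunk chain
        ("cotemporal", pos, chunk): 1.0,   # edge linking the chains at slice t
    }
    if t > 0:
        feats[("trans-pos", prev_pos, pos)] = 1.0        # within-chain edge
        feats[("trans-chunk", prev_chunk, chunk)] = 1.0  # within-chain edge
    return feats

def log_score(x, pos_seq, chunk_seq, weights):
    """Unnormalized log-score of a joint labeling: sum of weighted features."""
    total = 0.0
    for t in range(len(x)):
        prev_pos = pos_seq[t - 1] if t > 0 else None
        prev_chunk = chunk_seq[t - 1] if t > 0 else None
        for f, v in features(x, t, pos_seq[t], chunk_seq[t],
                             prev_pos, prev_chunk).items():
            total += weights[f] * v
    return total

# Toy usage with hand-set weights; a trained model fits them by maximizing the
# conditional likelihood, exactly as for linear-chain CRFs.
weights = defaultdict(float)
weights[("cotemporal", "NN", "I-NP")] = 1.5
weights[("obs-pos", "dog", "NN")] = 2.0

x = ["the", "dog", "barks"]
print(log_score(x, ["DT", "NN", "VB"], ["B-NP", "I-NP", "O"], weights))

Because the same templates are applied at every slice, the number of parameters does not grow with the sequence length, just as in a linear-chain CRF; what changes is the within-slice structure, which couples the two labeling tasks so they can be decoded jointly rather than as a cascade.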

          • Published in

            ICML '04: Proceedings of the twenty-first international conference on Machine learning
            July 2004
            934 pages
ISBN: 1581138385
DOI: 10.1145/1015330
• Conference Chair: Carla Brodley

            Copyright © 2004 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 4 July 2004


            Qualifiers

            • Article

            Acceptance Rates

Overall Acceptance Rate: 140 of 548 submissions, 26%
