
Online EM for unsupervised models

Published: 31 May 2009

ABSTRACT

The (batch) EM algorithm plays an important role in unsupervised induction, but it sometimes suffers from slow convergence. In this paper, we show that online variants (1) provide significant speedups and (2) can even find better solutions than those found by batch EM. We support these findings on four unsupervised tasks: part-of-speech tagging, document classification, word segmentation, and word alignment.
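The online variants discussed here replace batch EM's full-corpus E-step with per-example updates: running sufficient statistics are interpolated toward each example's expected statistics with a decaying stepsize, and parameters are re-estimated from them. Below is a minimal sketch of this stepwise scheme on a toy problem; the two-component unit-variance Gaussian mixture, the initialization, and the stepsize schedule η_k = (k + 2)^(−α) are illustrative assumptions, not the paper's experimental setup.

```python
import math
import random

def stepwise_em(data, iters=5, alpha=0.7):
    """Stepwise (online) EM sketch for a 2-component 1D Gaussian
    mixture with unit variance. Sufficient statistics are updated
    after every example with stepsize eta_k = (k + 2) ** -alpha.
    (Toy model for illustration, not the paper's experimental setup.)"""
    mu = [min(data), max(data)]      # component means (crude initialization)
    pi = [0.5, 0.5]                  # mixing weights
    # Running sufficient statistics per component: [posterior mass, weighted sum of x].
    s = [[0.5, 0.5 * mu[0]], [0.5, 0.5 * mu[1]]]
    k = 0
    for _ in range(iters):
        random.shuffle(data)
        for x in data:
            # Per-example E-step: posterior over the two components.
            p = [pi[j] * math.exp(-0.5 * (x - mu[j]) ** 2) for j in range(2)]
            z = sum(p)
            p = [pj / z for pj in p]
            # Interpolate the running statistics toward this example's statistics.
            eta = (k + 2) ** -alpha
            for j in range(2):
                s[j][0] = (1 - eta) * s[j][0] + eta * p[j]
                s[j][1] = (1 - eta) * s[j][1] + eta * p[j] * x
            # M-step: re-estimate parameters from the running statistics.
            total = s[0][0] + s[1][0]
            pi = [s[j][0] / total for j in range(2)]
            mu = [s[j][1] / s[j][0] for j in range(2)]
            k += 1
    return mu, pi
```

Because the M-step runs after every example rather than once per pass over the data, parameter estimates start improving immediately, which is the source of the speedups the abstract reports; α ∈ (0.5, 1] controls how quickly old statistics are forgotten.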


Published in

          NAACL '09: Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
          May 2009
          716 pages
          ISBN:9781932432411

          Publisher

          Association for Computational Linguistics

          United States


          Qualifiers

          • research-article

          Acceptance Rates

          Overall Acceptance Rate: 21 of 29 submissions, 72%
