ABSTRACT
The (batch) EM algorithm plays an important role in unsupervised induction, but it sometimes suffers from slow convergence. In this paper, we show that online variants (1) provide significant speedups and (2) can even find better solutions than those found by batch EM. We support these findings on four unsupervised tasks: part-of-speech tagging, document classification, word segmentation, and word alignment.
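The "online variants" here update EM's expected sufficient statistics after each example rather than after a full pass over the data. Below is a minimal sketch of one widely used variant, stepwise EM, applied to a toy one-dimensional, unit-variance Gaussian mixture; the toy model, the stepsize schedule eta = (k + 2)^(-alpha), and every name and default in the code are illustrative assumptions, not the paper's exact algorithm, tasks, or hyperparameters.

```python
# A minimal sketch of stepwise (online) EM for a K-component, unit-variance
# Gaussian mixture. Illustration only: the toy model, the stepsize schedule
# eta = (k + 2) ** -alpha, and all names/defaults are assumptions, not the
# paper's exact algorithm or settings.
import numpy as np

def stepwise_em(x, K=2, alpha=0.7, n_passes=5, seed=0):
    rng = np.random.default_rng(seed)
    pi = np.full(K, 1.0 / K)                                  # mixing weights
    mu = rng.choice(x, size=K, replace=False).astype(float)   # component means
    # Running expected sufficient statistics.
    s_count = np.full(K, 1.0 / K)   # E[1{z = k}]
    s_sum = s_count * mu            # E[x * 1{z = k}]
    k = 0
    for _ in range(n_passes):
        for xi in rng.permutation(x):
            # E-step on a single example: posterior over components.
            log_p = np.log(pi) - 0.5 * (xi - mu) ** 2
            q = np.exp(log_p - log_p.max())
            q /= q.sum()
            # Stepwise update: interpolate the running statistics toward
            # this example's statistics with a decaying stepsize.
            eta = (k + 2) ** -alpha
            s_count = (1 - eta) * s_count + eta * q
            s_sum = (1 - eta) * s_sum + eta * q * xi
            # M-step: re-estimate parameters from the running statistics.
            pi = s_count / s_count.sum()
            mu = s_sum / s_count
            k += 1
    return pi, mu

# Usage: recover two clusters centered near -2 and +3.
rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(-2, 1, 500), rng.normal(3, 1, 500)])
print(stepwise_em(data))
```

Because the running statistics are nudged toward each example's posterior counts as soon as it is processed, the parameters begin improving within a fraction of one pass; this per-example updating is the mechanism behind the speedups the abstract claims over batch EM, which re-estimates parameters only once per full pass over the data.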