Research article
DOI: 10.1145/1553374.1553410

Accounting for burstiness in topic models

Published: 14 June 2009

ABSTRACT

Many different topic models have been used successfully for a variety of applications. However, even state-of-the-art topic models suffer from the important flaw that they do not capture the tendency of words to appear in bursts; it is a fundamental property of language that if a word is used once in a document, it is more likely to be used again. We introduce a topic model that uses Dirichlet compound multinomial (DCM) distributions to model this burstiness phenomenon. On both text and non-text datasets, the new model achieves better held-out likelihood than standard latent Dirichlet allocation (LDA). It is straightforward to incorporate the DCM extension into topic models that are more complex than LDA.
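
To make the burstiness idea concrete, the following is a minimal Python sketch (not from the paper; the vocabulary size, document length, and Dirichlet parameters are illustrative assumptions). It compares word counts drawn from a plain multinomial with counts drawn from a Dirichlet compound multinomial that has the same mean word probabilities; because the DCM draws a document-specific word distribution first, a word used once in a document is likely to be used again.

# Minimal sketch (assumed parameters, not the authors' code): compare the
# count of the most frequent word per document under a multinomial versus a
# Dirichlet compound multinomial (DCM) with the same mean word probabilities.
import numpy as np

rng = np.random.default_rng(0)

vocab_size = 1000                    # illustrative vocabulary size
doc_length = 100                     # illustrative words per document
alpha = np.full(vocab_size, 0.01)    # small alpha -> strong burstiness

def multinomial_doc():
    # Every document shares the same word distribution (no burstiness).
    p = alpha / alpha.sum()
    return rng.multinomial(doc_length, p)

def dcm_doc():
    # DCM: draw a document-specific word distribution, then draw the words,
    # so a word that appears once in a document tends to appear again.
    p = rng.dirichlet(alpha)
    return rng.multinomial(doc_length, p)

def mean_max_count(sampler, n_docs=2000):
    # Average count of each document's most frequent word.
    return np.mean([sampler().max() for _ in range(n_docs)])

print("multinomial:", mean_max_count(multinomial_doc))
print("DCM:        ", mean_max_count(dcm_doc))

Under these assumed settings, the DCM's most frequent word typically accounts for a large share of the document while the multinomial spreads its mass thinly; this gap is the burstiness that a single per-topic multinomial cannot express.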


Published in

ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning
June 2009
1331 pages
ISBN: 9781605585161
DOI: 10.1145/1553374

Copyright © 2009 by the author(s)/owner(s).

Publisher

Association for Computing Machinery
New York, NY, United States



Acceptance Rates

Overall acceptance rate: 140 of 548 submissions, 26%
