A Spectral Algorithm for Latent Dirichlet Allocation

Abstract

Topic modeling is a generalization of clustering that posits that observations (words in a document) are generated by multiple latent factors (topics), as opposed to just one. The increased representational power comes at the cost of a more challenging unsupervised learning problem for estimating the topic-word distributions when only words are observed, and the topics are hidden. This work provides a simple and efficient learning procedure that is guaranteed to recover the parameters for a wide class of multi-view models and topic models, including latent Dirichlet allocation (LDA). For LDA, the procedure correctly recovers both the topic-word distributions and the parameters of the Dirichlet prior over the topic mixtures, using only trigram statistics (i.e., third-order moments, which can be estimated from documents containing just three words). The method is based on an efficiently computable orthogonal tensor decomposition of low-order moments.
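
To make the last point concrete: after a whitening step that uses second-order moments to put the empirical third-order moment into orthogonally decomposable form, the algorithm reduces to finding the robust eigenpairs of a symmetric tensor. The following is a minimal Python/NumPy sketch of that core step, the tensor power method with deflation, run on an exactly decomposable tensor. It is an illustration under our own naming (tensor_power, lam, V), not the authors' implementation, and it omits the moment-estimation and whitening stages.

    import numpy as np

    rng = np.random.default_rng(0)

    # Build T = sum_i lam_i * v_i (x) v_i (x) v_i with orthonormal v_i:
    # the orthogonally decomposable form that whitening produces.
    k = 3
    lam = np.array([5.0, 3.0, 1.5])                    # component weights
    V, _ = np.linalg.qr(rng.standard_normal((k, k)))   # orthonormal columns v_i
    T = np.einsum('i,ai,bi,ci->abc', lam, V, V, V)

    def tensor_power(T, n_restarts=10, n_iters=100):
        """Return the (eigenvalue, eigenvector) pair found by power iteration."""
        best = None
        for _ in range(n_restarts):
            u = rng.standard_normal(T.shape[0])
            u /= np.linalg.norm(u)
            for _ in range(n_iters):
                u = np.einsum('abc,b,c->a', T, u, u)   # the map u -> T(I, u, u)
                u /= np.linalg.norm(u)
            val = np.einsum('abc,a,b,c->', T, u, u, u) # T(u, u, u)
            if best is None or val > best[0]:
                best = (val, u)
        return best

    # Deflation: peel off one component at a time.
    for _ in range(k):
        val, u = tensor_power(T)
        print(f"recovered eigenvalue {val:.2f}; overlap with a true v_i:",
              round(float(np.max(np.abs(V.T @ u))), 3))
        T = T - val * np.einsum('a,b,c->abc', u, u, u)

In the full algorithm the recovered eigenpairs are un-whitened to yield the topic-word distributions and the Dirichlet parameters; in this toy run the printed overlaps should all be close to 1.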

Notes

  1. The technique of [23] is actually attributed to Robert Jennrich.

  2. By additive noise, we mean a model in which \({\boldsymbol{x}}_v = {\boldsymbol{O}}^{(v)}{\boldsymbol{h}} + {\boldsymbol{\eta}}_v\), where \({\boldsymbol{\eta}}_v\) is a zero-mean random vector independent of \({\boldsymbol{h}}\).
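
     For concreteness, here is a tiny, hypothetical simulation in Python of one draw from such a multi-view additive-noise model (dimensions, noise scale, and all variable names are illustrative, not from the paper):

         import numpy as np

         rng = np.random.default_rng(0)
         d, k = 10, 3                                          # observed and latent dimensions
         O = [rng.standard_normal((d, k)) for _ in range(3)]   # loadings O^(v), one per view

         h = rng.dirichlet(np.ones(k))                         # latent vector h (e.g., a topic mixture)
         eta = [0.1 * rng.standard_normal(d) for _ in O]       # zero-mean noise, independent of h
         x = [O[v] @ h + eta[v] for v in range(3)]             # x_v = O^(v) h + eta_v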

References

  1. Achlioptas, D., McSherry, F.: On spectral learning of mixtures of distributions. Eighteenth Annual Conference on Learning Theory, pp. 458–469. Springer, Bertinoro (2005)

  2. Anandkumar, A., Chaudhuri, K., Hsu, D., Kakade, S.M., Song, L., Zhang, T.: Spectral methods for learning multivariate latent tree structure. Adv. Neural Inf. Process. Syst. 24, 2025–2033 (2011)

  3. Anandkumar, A., Foster, D.P., Hsu, D., Kakade, S.M., Liu, Y.K.: A spectral algorithm for latent Dirichlet allocation. Adv. Neural Inf. Process. Syst. 25, 917–925 (2012)

  4. Anandkumar, A., Foster, D.P., Hsu, D., Kakade, S.M., Liu, Y.K.: Two SVDs suffice: spectral decompositions for probabilistic topic models and latent Dirichlet allocation (2012). arXiv:1204.6703v1

  5. Anandkumar, A., Ge, R., Hsu, D., Kakade, S.M., Telgarsky, M.: Tensor decompositions for learning latent variable models. J. Mach. Learn. Res. 15, 2773–2832 (2014)

  6. Anandkumar, A., Hsu, D., Kakade, S.M.: A method of moments for mixture models and hidden Markov models. In: Twenty-Fifth Annual Conference on Learning Theory, vol. 23, pp. 33.1–33.34 (2012)

  7. Ando, R., Zhang, T.: Two-view feature generation model for semi-supervised learning. In: Twenty-Fourth International Conference on Machine Learning, pp. 25–32 (2007)

  8. Arora, S., Ge, R., Moitra, A.: Learning topic models – going beyond SVD. In: Fifty-Third IEEE Annual Symposium on Foundations of Computer Science, pp. 1–10 (2012)

  9. Arora, S., Ge, R., Moitra, A., Sachdeva, S.: Provable ICA with unknown Gaussian noise, with implications for Gaussian mixtures and autoencoders. Adv. Neural Inf. Process. Syst. 25, 2375–2383 (2012)

  10. Arora, S., Kannan, R.: Learning mixtures of separated nonspherical Gaussians. Ann. Appl. Probab. 15(1A), 69–92 (2005)

  11. Belkin, M., Sinha, K.: Polynomial learning of distribution families. In: Fifty-First Annual IEEE Symposium on Foundations of Computer Science, pp. 103–112 (2010)

  12. Blei, D.M., Ng, A., Jordan, M.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)

  13. Canny, J.: GaP: A factor model for discrete data. In: Proceedings of the Twenty-Seventh Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 122–129 (2004)

  14. Cardoso, J.F., Comon, P.: Independent component analysis, a survey of some algebraic methods. In: IEEE International Symposium on Circuits and Systems, pp. 93–96 (1996)

  15. Chang, J.T.: Full reconstruction of Markov models on evolutionary trees: identifiability and consistency. Math. Biosci. 137, 51–73 (1996)

  16. Chaudhuri, K., Kakade, S.M., Livescu, K., Sridharan, K.: Multi-view clustering via canonical correlation analysis. In: Twenty-Sixth Annual International Conference on Machine Learning, pp. 129–136 (2009)

  17. Chaudhuri, K., Rao, S.: Learning mixtures of product distributions using correlations and independence. In: Twenty-First Annual Conference on Learning Theory, pp. 9–20 (2008)

  18. Comon, P., Jutten, C.: Handbook of Blind Source Separation: Independent Component Analysis and Applications. Academic Press, Waltham (2010)

  19. Dasgupta, S.: Learning mixtures of Gaussians. In: Fortieth Annual IEEE Symposium on Foundations of Computer Science, pp. 634–644 (1999)

  20. Dasgupta, S., Schulman, L.: A probabilistic analysis of EM for mixtures of separated, spherical Gaussians. J. Mach. Learn. Res. 8, 203–226 (2007)

  21. Frieze, A.M., Jerrum, M., Kannan, R.: Learning linear transformations. In: Thirty-Seventh Annual Symposium on Foundations of Computer Science, pp. 359–368 (1996)

  22. Griffiths, T.: Gibbs sampling in the generative model of latent Dirichlet allocation. Tech. rep., Stanford University (2002)

  23. Harshman, R.: Foundations of the PARAFAC procedure: model and conditions for an ‘explanatory’ multi-mode factor analysis. Tech. rep., UCLA Working Papers in Phonetics (1970)

  24. Hitchcock, F.: The expression of a tensor or a polyadic as a sum of products. J. Math. Phys. 6, 164–189 (1927)

  25. Hitchcock, F.: Multiple invariants and generalized rank of a p-way matrix or tensor. J. Math. Phys. 7, 39–79 (1927)

  26. Hofmann, T.: Probabilistic latent semantic indexing. In: Proceedings of the Twenty-Second Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 50–57 (1999)

  27. Hotelling, H.: The most predictable criterion. J. Educ. Psychol. 26(2), 139–142 (1935)

  28. Hsu, D., Kakade, S.M.: Learning mixtures of spherical Gaussians: moment methods and spectral decompositions. In: Fourth Innovations in Theoretical Computer Science (2013)

  29. Hsu, D., Kakade, S.M., Zhang, T.: A spectral algorithm for learning hidden Markov models. J. Comput. Syst. Sci. 78(5), 1460–1480 (2012). http://www.sciencedirect.com/science/article/pii/S0022000012000244

  30. Jutten, C., Herault, J.: Blind separation of sources, part I: an adaptive algorithm based on neuromimetic architecture. Signal Process. 24, 1–10 (1991)

  31. Kakade, S.M., Foster, D.P.: Multi-view regression via canonical correlation analysis. In: Twentieth Annual Conference on Learning Theory, pp. 82–96 (2007)

  32. Kalai, A.T., Moitra, A., Valiant, G.: Efficiently learning mixtures of two Gaussians. In: Forty-second ACM Symposium on Theory of Computing, pp. 553–562 (2010)

  33. Kannan, R., Salmasian, H., Vempala, S.: The spectral method for general mixture models. SIAM J. Comput. 38(3), 1141–1156 (2008)

  34. Kolda, T.G., Bader, B.W.: Tensor decompositions and applications. SIAM Rev. 51(3), 455–500 (2009)

  35. Kruskal, J.B.: Three-way arrays: rank and uniqueness of trilinear decompositions, with application to arithmetic complexity and statistics. Linear Algebra Its Appl. 18(2), 95–138 (1977)

  36. Lee, D.D., Seung, H.S.: Learning the parts of objects by nonnegative matrix factorization. Nature 401, 788–791 (1999)

  37. Leurgans, S., Ross, R., Abel, R.: A decomposition for three-way arrays. SIAM J. Matrix Anal. Appl. 14(4), 1064–1083 (1993)

  38. Moitra, A., Valiant, G.: Settling the polynomial learnability of mixtures of Gaussians. In: Fifty-First Annual IEEE Symposium on Foundations of Computer Science, pp. 93–102 (2010)

  39. Mossel, E., Roch, S.: Learning nonsingular phylogenies and hidden Markov models. Ann. Appl. Probab. 16(2), 583–614 (2006)

  40. Nguyen, P.Q., Regev, O.: Learning a parallelepiped: cryptanalysis of GGH and NTRU signatures. J. Cryptol. 22(2), 139–160 (2009)

  41. Papadimitriou, C.H., Raghavan, P., Tamaki, H., Vempala, S.: Latent semantic indexing: a probabilistic analysis. J. Comput. Syst. Sci. 61(2), 217–235 (2000)

  42. Pearson, K.: Contributions to the mathematical theory of evolution. Philos. Trans. R. Soc. Lond. A 185, 71–110 (1894)

  43. Redner, R.A., Walker, H.F.: Mixture densities, maximum likelihood and the EM algorithm. SIAM Rev. 26(2), 195–239 (1984)

  44. Tucker, L.R.: Some mathematical notes on three-mode factor analysis. Psychometrika 31(3), 279–311 (1966)

  45. Vempala, S., Wang, G.: A spectral algorithm for learning mixture models. J. Comput. Syst. Sci. 68(4), 841–860 (2004)

  46. Zou, J., Hsu, D., Parkes, D., Adams, R.: Contrastive learning using spectral methods. Adv. Neural Inf. Process. Syst. 26, 2238–2246 (2013)

Acknowledgments

We thank Kamalika Chaudhuri, Adam Kalai, Percy Liang, Chris Meek, David Sontag, and Tong Zhang for valuable insights. We also thank Rong Ge for sharing preliminary results (in [8]) and the anonymous reviewers for their comments, suggestions, and pointers to references. Part of this work was completed while DH was a postdoctoral researcher at Microsoft Research New England, and while DPF, YKL, and AA were visiting the same lab. AA is supported in part by a Microsoft Faculty Fellowship, NSF CAREER Award CCF-1254106, NSF Award CCF-1219234, NSF BIGDATA Award IIS-1251267, and ARO YIP Award W911NF-13-1-0084.

Author information

Corresponding author

Correspondence to Daniel Hsu.

Additional information

Preliminary versions of this article appeared as [3, 4].

About this article

Cite this article

Anandkumar, A., Foster, D.P., Hsu, D. et al. A Spectral Algorithm for Latent Dirichlet Allocation. Algorithmica 72, 193–214 (2015). https://doi.org/10.1007/s00453-014-9909-1
