Abstract
Probabilistic Latent Semantic Analysis (PLSA) is a popular topic modeling technique for exploring document collections. As large datasets become increasingly common, the computation in PLSA must scale accordingly. In this paper, we propose PPLSA, a parallel PLSA algorithm built on the MapReduce framework to accommodate large corpora. Our solution distributes the computation efficiently and is relatively simple to implement.
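The abstract does not spell out how the computation is split, so the following is only a minimal single-process sketch of how one EM iteration of standard PLSA could be arranged as a map step and a reduce step. It is not the PPLSA algorithm itself. The function names (`map_document`, `reduce_topic_word`, `plsa_iteration`) and the choice to keep each document's topic mixture P(z|d) alongside the document are assumptions made for illustration.

```python
from collections import defaultdict


def map_document(counts, p_z_d, p_w_z):
    """Map step for one document (hypothetical layout, not from the paper).

    counts: {word: n(d, w)}; p_z_d: list of P(z|d); p_w_z: list of {word: P(w|z)}.
    Returns the emitted ((topic, word), expected_count) pairs and the updated P(z|d).
    """
    k = len(p_z_d)
    emitted = []
    new_p_z_d = [0.0] * k
    for word, n in counts.items():
        # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z).
        joint = [p_z_d[z] * p_w_z[z][word] for z in range(k)]
        total = sum(joint)
        if total == 0.0:
            continue
        for z in range(k):
            expected = n * joint[z] / total
            emitted.append(((z, word), expected))
            new_p_z_d[z] += expected  # local part of the M-step
    norm = sum(new_p_z_d)
    new_p_z_d = [v / norm for v in new_p_z_d] if norm > 0 else list(p_z_d)
    return emitted, new_p_z_d


def reduce_topic_word(pairs, k, vocab):
    """Reduce step: sum expected counts per (topic, word), then normalize per topic."""
    sums = defaultdict(float)
    for key, value in pairs:
        sums[key] += value
    p_w_z = []
    for z in range(k):
        total = sum(sums[(z, w)] for w in vocab)
        if total > 0:
            p_w_z.append({w: sums[(z, w)] / total for w in vocab})
        else:
            # Topic received no mass: fall back to a uniform distribution.
            p_w_z.append({w: 1.0 / len(vocab) for w in vocab})
    return p_w_z


def plsa_iteration(docs, p_z_d_all, p_w_z, vocab):
    """One EM iteration; the loop over documents stands in for parallel mappers."""
    pairs, new_p_z_d_all = [], []
    for counts, p_z_d in zip(docs, p_z_d_all):
        emitted, new_p_z_d = map_document(counts, p_z_d, p_w_z)
        pairs.extend(emitted)
        new_p_z_d_all.append(new_p_z_d)
    return new_p_z_d_all, reduce_topic_word(pairs, len(p_w_z), vocab)
```

In this arrangement only the topic-word distributions P(w|z) have to be shared across workers between iterations, since each document's P(z|d) depends on that document alone. Whether PPLSA partitions the work this way is not stated in the abstract.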
Copyright information
© 2012 IFIP International Federation for Information Processing
About this paper
Cite this paper
Li, N., Zhuang, F., He, Q., Shi, Z. (2012). PPLSA: Parallel Probabilistic Latent Semantic Analysis Based on MapReduce. In: Shi, Z., Leake, D., Vadera, S. (eds) Intelligent Information Processing VI. IIP 2012. IFIP Advances in Information and Communication Technology, vol 385. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32891-6_8
DOI: https://doi.org/10.1007/978-3-642-32891-6_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32890-9
Online ISBN: 978-3-642-32891-6
eBook Packages: Computer Science, Computer Science (R0)