Abstract
Mixture models, such as the Gaussian mixture model (GMM), are widely used for modeling data. A GMM assumes that all data points are generated from a set of Gaussian components that share a single set of mixture weights. A natural extension of the GMM is the probabilistic latent semantic analysis (PLSA) model, which assigns a different set of mixture weights to each data point. PLSA is therefore more flexible than the GMM but, as a tradeoff, it usually suffers from overfitting. In this paper, we propose a regularized probabilistic latent semantic analysis (RPLSA) model, which adjusts the amount of model flexibility so that the model fits the training data well while remaining robust against overfitting. We conduct an empirical study on speaker identification to show the effectiveness of the new model. Experimental results on the NIST speaker recognition dataset indicate that RPLSA substantially outperforms both the GMM and PLSA models. The principle behind RPLSA, appropriately adjusting model flexibility, extends naturally to other applications and other types of mixture models.
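The contrast between shared and per-data-point mixture weights can be made concrete with a small sketch. It is an illustration only, not the paper's implementation: it assumes one-dimensional Gaussian components, treats a "data point" as a small group of observations (for example, the frames of one utterance), and uses hypothetical function names and made-up numbers throughout. It does not implement the RPLSA regularizer, which the abstract does not specify.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def mixture_density(x, weights, means, variances):
    """Weighted sum of component densities at x."""
    return sum(w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances))


def log_likelihood_shared(groups, weights, means, variances):
    """GMM-style: a single weight vector is shared by every group."""
    return sum(
        math.log(mixture_density(x, weights, means, variances))
        for group in groups
        for x in group
    )


def log_likelihood_per_group(groups, group_weights, means, variances):
    """PLSA-style: each group has its own weight vector; components are shared."""
    return sum(
        math.log(mixture_density(x, weights, means, variances))
        for group, weights in zip(groups, group_weights)
        for x in group
    )


# Made-up toy data: two groups, each concentrated near a different component.
means = [0.0, 5.0]
variances = [1.0, 1.0]
groups = [[0.1, -0.2], [5.1, 4.8]]

shared_weights = [0.5, 0.5]
per_group_weights = [[0.9, 0.1], [0.1, 0.9]]

ll_shared = log_likelihood_shared(groups, shared_weights, means, variances)
ll_per_group = log_likelihood_per_group(groups, per_group_weights, means, variances)
print("shared weights:   ", ll_shared)
print("per-group weights:", ll_per_group)
```

On this toy data the per-group weights yield a higher training log-likelihood, because each group can favor the component it sits near. That extra freedom also adds one weight vector per group, which is the source of the overfitting risk the abstract describes.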
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Si, L., Jin, R. (2005). Adjusting Mixture Weights of Gaussian Mixture Model via Regularized Probabilistic Latent Semantic Analysis. In: Ho, T.B., Cheung, D., Liu, H. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2005. Lecture Notes in Computer Science, vol 3518. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11430919_72
DOI: https://doi.org/10.1007/11430919_72
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-26076-9
Online ISBN: 978-3-540-31935-1
eBook Packages: Computer Science, Computer Science (R0)