Abstract
We present an adaptation of model-based clustering for partially labeled data that is capable of finding hidden cluster labels. All the originally known and discoverable clusters are represented using localized feature subset selections (subspaces), yielding clusters that cannot be discovered by global feature subset selection. The semi-supervised projected model-based clustering algorithm (SeSProC) also includes a novel model selection approach that uses a greedy forward search to estimate the final number of clusters. The quality of SeSProC is assessed using synthetic data, demonstrating its effectiveness, under different data conditions, not only at classifying instances with known labels but also at discovering completely hidden clusters in different subspaces. Moreover, SeSProC outperforms three related baseline algorithms in most scenarios using synthetic and real data sets.
Notes
Note that “class”, “component”, and “cluster” are equivalent concepts at the end of the classification, but each concept will be used here to refer, respectively, to a priori knowledge about instances (classes), mixture components (components), or identified groups (clusters).
Note that, for legibility, the notation related to iterations is used with \(\varTheta \), but not with \(\varvec{\theta }\) throughout the paper.
Note that the classification term only iterates theoretically until \(m = C\), but we can assume that this iteration finishes at \(m=C+1\) with \(z_{i,C+1} = 0\), \(\forall i = 1,\ldots ,L\).
Acknowledgments
This research is partially supported by the Spanish Ministry of Economy and Competitiveness TIN2010-20900-C04-04 and TIN2010-21289-C02-02 projects, the Cajal Blue Brain project and Consolider Ingenio 2010-CSD2007-00018. The authors gratefully acknowledge the computer resources, technical expertise and assistance provided by the Centro de Supercomputación y Visualización de Madrid (CeSViMa). The authors are also very grateful for the useful comments and suggestions made by the anonymous reviewers, which have greatly contributed to improving the manuscript.
Additional information
Communicated by Charu Aggarwal.
Appendices
1.1 Appendix 1: Basic EM theory
The density function of an instance \(\mathbf x _{i}\) is
We can define a binary random variable \(\mathbf z _{i} = (z_{i1}, \ldots ,z_{iK})\), with \(z_{im} = 1\) if instance \(\mathbf x _{i}\) belongs to component \(m\) and with all other elements \(z_{im^{\prime }} = 0\), \(\forall \) \(m^{\prime } \ne m\). Moreover, \(p(z_{im} = 1) = \pi _{m}\). Therefore, we can write
Also, \(p(\mathbf x _{i}\mid z_{im}=1) = p(\mathbf x _{i}\mid \varvec{\theta }_{m})\), which, extended, is
Using Eqs. (10) and (11), Eq. (1) is obtained by summing over all possible states of \(\mathbf z _{i}\)
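For reference, these building blocks take the standard mixture-model form. The following is a sketch consistent with the surrounding text; the correspondence to the paper's numbered equations (10), (11) and (1) is an assumption:

```latex
p(\mathbf{z}_i) = \prod_{m=1}^{K} \pi_m^{z_{im}}, \qquad
p(\mathbf{x}_i \mid \mathbf{z}_i) = \prod_{m=1}^{K} p(\mathbf{x}_i \mid \boldsymbol{\theta}_m)^{z_{im}},
\quad\Longrightarrow\quad
p(\mathbf{x}_i \mid \varTheta) = \sum_{m=1}^{K} \pi_m\, p(\mathbf{x}_i \mid \boldsymbol{\theta}_m).
```

Summing the product of the first two expressions over the \(K\) possible states of \(\mathbf z _{i}\) recovers the mixture density, since exactly one \(z_{im}\) is nonzero in each state.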
This mixture of distributions has unknown parameters in \(\varTheta \) that must be estimated. These parameters can be obtained using the maximum likelihood estimation method. Therefore, assuming that each instance is independent and identically distributed (i.i.d.), and building the log-likelihood function (\(\log L\)) from Eq. (1) and extending it to all the instances, we obtain
This log-likelihood function is difficult to maximize because the summation over the components is inside the logarithm function. The log-likelihood function would change if both the latent variables (\(\mathcal{Z }\)) and the observable data (\(\mathcal{X }\)) were known. Then, based on Eqs. (10) and (11), we can define the complete-data log-likelihood function as
The maximization of this complete-data log-likelihood function is straightforward because the summation is outside the logarithm. Since the latent variables are unknown we cannot use this function directly. However, we can obtain the expectation of this log-likelihood function with respect to the posterior distribution of the latent variables. This expectation is calculated in iteration \(t\), having fixed the parameters from the previous iteration \(t-1\), in the E-step of the EM algorithm. After this, the parameters of the distributions are recalculated to maximize this expectation (M-step). These two steps are repeated until a convergence criterion is reached. Hence, the expectation of the complete-data log-likelihood function is given by
where the posterior distribution of the latent variables given the data and the parameters of the previous iteration \(t-1\) using Eq. (12), is
This factorizes over \(i\) so that the \(\{ \mathbf z _{i} \}\) in this distribution are independent. Using this posterior distribution and Bayes’ theorem, we can calculate the expected value of each \(z_{im}\) (responsibility) as
which we can use to calculate the expectation of the complete-data log-likelihood, as
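As a concrete illustration of the E-step just described, the responsibilities \(\gamma (z_{im}) = \pi _m p(\mathbf x _i \mid \varvec{\theta }_m) / \sum _{m^{\prime }} \pi _{m^{\prime }} p(\mathbf x _i \mid \varvec{\theta }_{m^{\prime }})\) can be computed as follows for a univariate Gaussian mixture. This is a minimal sketch, not the paper's implementation; the function names and the two-component toy data are assumptions:

```python
import math

def gaussian_pdf(x, mu, var):
    """Univariate Gaussian density N(x; mu, var)."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def e_step(data, pis, mus, variances):
    """E-step: responsibilities gamma(z_im) for each instance i and component m."""
    resp = []
    for x in data:
        # Numerator of each responsibility: pi_m * p(x | theta_m)
        weighted = [pi * gaussian_pdf(x, mu, var)
                    for pi, mu, var in zip(pis, mus, variances)]
        total = sum(weighted)  # normalizing constant (mixture density at x)
        resp.append([w / total for w in weighted])
    return resp

# Toy example: two well-separated components (assumed values)
resp = e_step([0.0, 5.0], pis=[0.5, 0.5], mus=[0.0, 5.0], variances=[1.0, 1.0])
```

Each row of `resp` sums to one, and an instance near a component's mean is assigned to that component with responsibility close to one.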
1.2 Appendix 2: Including subspaces
Defining, for each component and feature, \(\rho _{mj} = p(v_{mj}=1)\), the probability of feature \(j\) being relevant to component \(m\), the new density function, including the search for subspaces, is
To derive this new density function, we first obtain, for a component \(m\) and an instance \(i\),
This can be extended for all components as
Moreover, we can extend Eq. (11) introducing \(\mathcal{V }\), as
The new density function, based on Eq. (1), and using Eqs. (10), (16), and (17), is,
The summation over \(\mathbf z _{i}\) is solved as in Eq. (1), obtaining,
We can then solve the summation over \(\mathcal{V }\) by summing over all the possible states of each \(v_{mj}\), obtaining,
Taking into account that each component can be described in a different feature subspace, this is the new density function of an instance, as shown in Eq. (15).
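As a hedged sketch (up to the paper's exact notation), the resulting density of Eq. (15) has the feature-wise two-part form:

```latex
p(\mathbf{x}_i \mid \varTheta)
  = \sum_{m=1}^{K} \pi_m \prod_{j=1}^{F}
    \bigl[ \rho_{mj}\, p(x_{ij} \mid \theta_{mj})
         + (1 - \rho_{mj})\, p(x_{ij} \mid \lambda_{mj}) \bigr],
```

where each feature \(j\) contributes through the relevant-feature model \(\theta_{mj}\) with probability \(\rho_{mj}\), and through the irrelevant-feature model \(\lambda_{mj}\) otherwise.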
The new log-likelihood function that should be maximized, by extending Eq. (15) to all the instances, is
This is again difficult to compute since the summation over the components is inside the logarithm function. This equation would change if we knew the sets of latent variables, \(\mathcal{Z }\) and \(\mathcal V \). Again by extending Eqs. (10), (16), and (17) to all the data, we can write
which can be simplified to
We can obtain the complete-data log-likelihood function by taking the logarithm of the previous function as,
and operating again,
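A sketch of the resulting complete-data log-likelihood, consistent with the factorization above (exact notation assumed), is:

```latex
\log p(\mathcal{X}, \mathcal{Z}, \mathcal{V} \mid \varTheta)
  = \sum_{i=1}^{N} \sum_{m=1}^{K} z_{im} \Bigl[ \log \pi_m
    + \sum_{j=1}^{F} \Bigl( v_{mj} \bigl[ \log \rho_{mj} + \log p(x_{ij} \mid \theta_{mj}) \bigr]
    + (1 - v_{mj}) \bigl[ \log (1 - \rho_{mj}) + \log p(x_{ij} \mid \lambda_{mj}) \bigr] \Bigr) \Bigr].
```

Because the sums are outside the logarithm, each parameter can be maximized in closed form once the expectations of \(z_{im}\) and \(v_{mj}\) are available.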
1.3 Appendix 3: Expectation of the complete-data log-likelihood function
Similarly to Eq. (13), the expectation of the complete-data log-likelihood function can be written as
As in Eq. (14), the posterior distribution of the latent variables given the data, having fixed the parameters of the previous iteration \(t-1\), and using Eq. (18), can be written as
Before computing the expected values of each \(v_{mj}\) and each \(z_{im}\), we need to define some other necessary probabilities:
and, similarly
Taking both expressions into account, we have
Now, as detailed after Eq. (14), we can calculate the expected value of each \(v_{mj}\), as
Using this, we calculate the expected value of each \(z_{im}\)
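As a sketch of the standard feature-saliency decomposition (the paper's exact expressions may differ), the responsibilities take the form:

```latex
\gamma(z_{im}) =
  \frac{\pi_m \prod_{j=1}^{F} \bigl[ \rho_{mj}\, p(x_{ij} \mid \theta_{mj})
        + (1-\rho_{mj})\, p(x_{ij} \mid \lambda_{mj}) \bigr]}
       {\sum_{m^{\prime}} \pi_{m^{\prime}} \prod_{j=1}^{F} \bigl[ \rho_{m^{\prime}j}\, p(x_{ij} \mid \theta_{m^{\prime}j})
        + (1-\rho_{m^{\prime}j})\, p(x_{ij} \mid \lambda_{m^{\prime}j}) \bigr]},
```

```latex
\gamma(u_{imj}) = \gamma(z_{im})\,
  \frac{\rho_{mj}\, p(x_{ij} \mid \theta_{mj})}
       {\rho_{mj}\, p(x_{ij} \mid \theta_{mj}) + (1-\rho_{mj})\, p(x_{ij} \mid \lambda_{mj})},
\qquad
\gamma(w_{imj}) = \gamma(z_{im}) - \gamma(u_{imj}),
```

so that \(\gamma(u_{imj})\) and \(\gamma(w_{imj})\) split the component responsibility according to whether feature \(j\) is relevant or irrelevant to component \(m\).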
Thus the expectation of the complete-data log-likelihood, as in Eq. (3) and using Eq. (19), is
Then, for simplicity’s sake, we define
Now we can obtain the expectation of the complete-data log-likelihood as
1.4 Appendix 4: M-step
The parameters are recalculated in the M-step to maximize the value of the expectation of the complete-data log-likelihood function. As already mentioned, these updates are obtained by computing the partial derivatives of this expectation and setting them equal to zero. The univariate Gaussian distribution for each feature and component is used for this explanation. Therefore \(\theta _{mj} = (\mu _{\theta _{mj}}, \sigma _{\theta _{mj}}^{2})\).
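The per-feature density is then the univariate Gaussian and its logarithm:

```latex
p(x_{ij} \mid \theta_{mj}) =
  \frac{1}{\sqrt{2\pi\sigma_{\theta_{mj}}^{2}}}
  \exp\!\Bigl( -\frac{(x_{ij} - \mu_{\theta_{mj}})^{2}}{2\sigma_{\theta_{mj}}^{2}} \Bigr),
\qquad
\log p(x_{ij} \mid \theta_{mj}) =
  -\tfrac{1}{2}\log\bigl(2\pi\sigma_{\theta_{mj}}^{2}\bigr)
  - \frac{(x_{ij} - \mu_{\theta_{mj}})^{2}}{2\sigma_{\theta_{mj}}^{2}}.
```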
The detailed steps for updating each parameter follow.
-
\(\pi _{m}\) is updated (see note 3) using a Lagrange multiplier to enforce constraint \(\sum _{m=1}^{C+1}\pi _{m} = 1\):
$$\begin{aligned}&\frac{{\partial }}{{\partial \pi _{m}}}\left( \sum _{i=1}^L\sum _{m=1}^C z_{im}\log \pi _{m}\right. \\&\quad \qquad \;\;+ \sum _{i=L+1}^N\sum _{m=1}^{C+1}\gamma (z_{im})\log \pi _{m} \\&\quad \qquad \left. \;\;+\, \lambda \left( \sum _{m=1}^{C+1}\pi _{m} -1 \right) \right) = 0, \quad \forall m = 1, \ldots , C+1, \end{aligned}$$whose derivative is
$$\begin{aligned} \sum _{i=1}^L z_{im} \frac{1}{\pi _{m}}+ \sum _{i=L+1}^N \gamma (z_{im}) \frac{1}{\pi _{m}} + \lambda = 0. \end{aligned}$$Multiplying both sides by \(\pi _{m}\) and summing over \(m\), with \(m = 1, \ldots , C+1\), we have \(\lambda = -N\), as
$$\begin{aligned} - \lambda = \sum _{i=1}^L\sum _{m=1}^{C+1} z_{im} + \sum _{i=L+1}^N\sum _{m=1}^{C+1} \gamma (z_{im}) = N, \end{aligned}$$and then we update each \(\pi _{m}\) by using
$$\begin{aligned} \pi _{m} = \frac{\sum _{i=1}^L z_{im}+\sum _{i=L+1}^N \gamma (z_{im})}{N}. \end{aligned}$$ -
\(\mu _{\theta _{mj}}\) is updated solving the following partial derivative equation:
$$\begin{aligned}&\frac{{\partial }}{{\partial \mu _{\theta _{mj}}}}\Biggl ( \sum _{i=1}^L\sum _{m=1}^C \sum _{j=1}^F \gamma (u_{imj})\log p(x_{ij}\mid \theta _{mj}) \\&\qquad \qquad + \sum _{i=L+1}^N \sum _{m=1}^{C+1} \sum _{j=1}^F\gamma (u_{imj})\log p(x_{ij}\mid \theta _{mj}) \Biggr ) =0. \end{aligned}$$Then the result is
$$\begin{aligned}&\sum _{i=1}^L \Bigl ( \gamma (u_{imj})\sigma _{\theta _{mj}}^{-2}x_{ij} - \gamma (u_{imj})\sigma _{\theta _{mj}}^{-2}\mu _{\theta _{mj}} \Bigr ) \\&\quad + \sum _{i=L+1}^N \Bigl (\gamma (u_{imj})\sigma _{\theta _{mj}}^{-2}x_{ij} - \gamma (u_{imj})\sigma _{\theta _{mj}}^{-2}\mu _{\theta _{mj}}\Bigr ) = 0, \end{aligned}$$and the value of the parameter can be found as
$$\begin{aligned} \mu _{\theta _{mj}}&= \frac{\sum _{i=1}^L \gamma (u_{imj})x_{ij} + \sum _{i=L+1}^N \gamma (u_{imj})x_{ij}}{\sum _{i=1}^L \gamma (u_{imj})+ \sum _{i=L+1}^N \gamma (u_{imj})} \\&= \frac{\sum _{i=1}^N \gamma (u_{imj})x_{ij}}{\sum _{i=1}^N \gamma (u_{imj})},\quad \forall m = 1, \ldots , C+1; j = 1,\ldots ,F. \end{aligned}$$ -
And for \(\sigma _{\theta _{mj}}^{2}\),
$$\begin{aligned}&\frac{{\partial }}{{\partial \sigma _{\theta _{mj}}^{2}}}\Biggl ( \sum _{i=1}^L\sum _{m=1}^C\sum _{j=1}^F \gamma (u_{imj})\log p(x_{ij}\mid \theta _{mj}) \\&\qquad \qquad + \sum _{i=L+1}^N \sum _{m=1}^{C+1}\sum _{j=1}^F \gamma (u_{imj})\log p(x_{ij}\mid \theta _{mj})\Biggr ) = 0. \end{aligned}$$The derivative is
$$\begin{aligned}&\sum _{i=1}^L \Bigl (\gamma (u_{imj})(x_{ij}-\mu _{mj})^2 \sigma _{\theta _{mj}}^{-4} -\gamma (u_{imj})\sigma _{\theta _{mj}}^{-2}\Bigr ) \\&\quad + \sum _{i=L+1}^N \Bigl (\gamma (u_{imj})(x_{ij}-\mu _{mj})^2 \sigma _{\theta _{mj}}^{-4} -\gamma (u_{imj})\sigma _{\theta _{mj}}^{-2}\Bigr ) = 0, \end{aligned}$$and the parameter update is
$$\begin{aligned} \sigma _{\theta _{mj}}^{2}&= \frac{\sum _{i=1}^L \gamma (u_{imj})(x_{ij}-\mu _{\theta _{mj}})^2+ \sum _{i=L+1}^N \gamma (u_{imj})(x_{ij}-\mu _{\theta _{mj}})^2}{\sum _{i=1}^L \gamma (u_{imj})+ \sum _{i=L+1}^N\gamma (u_{imj})}\\&= \frac{\sum _{i=1}^N \gamma (u_{imj})(x_{ij}-\mu _{\theta _{mj}})^2}{\sum _{i=1}^N\gamma (u_{imj})},\quad \forall m = 1, \ldots , C+1; j = 1,\ldots ,F. \end{aligned}$$The same development is valid for \(\lambda _{mj} = (\mu _{\lambda _{mj}}\), \(\sigma _{\lambda _{mj}}^{2})\) but using \(\gamma (w_{imj})\) instead of \(\gamma (u_{imj})\) to indicate that feature \(j\) is irrelevant for component \(m\).
-
Then, we update \(\mu _{\lambda _{mj}}\) as
$$\begin{aligned} \mu _{\lambda _{mj}} = \frac{\sum _{i=1}^N\gamma (w_{imj})x_{ij}}{\sum _{i=1}^N \gamma (w_{imj})}, \quad \forall m = 1, \ldots , C+1; j = 1,\ldots ,F. \end{aligned}$$ -
And for \(\sigma _{\lambda _{mj}}^{2}\)
$$\begin{aligned} \sigma _{\lambda _{mj}}^{2} = \frac{\sum _{i=1}^N \gamma (w_{imj})(x_{ij}-\mu _{\lambda _{mj}})^2}{\sum _{i=1}^N \gamma (w_{imj})}, \quad \forall m = 1, \ldots , C+1; j = 1,\ldots ,F. \end{aligned}$$ -
Finally in \(\mathcal{M }^1\), \(\rho _{mj}\) is updated by
$$\begin{aligned}&\frac{{\partial }}{{\partial \rho _{mj}}} \Biggl ( \sum _{i=1}^L\sum _{m=1}^C \sum _{j=1}^F \gamma (u_{imj})\log \rho _{mj}\\&\qquad \quad \;\;+ \sum _{i=L+1}^N \sum _{m=1}^{C+1}\sum _{j=1}^F \gamma (u_{imj})\log \rho _{mj} \\&\qquad \quad \;\;+\sum _{i=1}^L \sum _{m=1}^C \sum _{j=1}^F \gamma (w_{imj})\log (1- \rho _{mj}) \\&\qquad \quad \;\;+ \sum _{i=L+1}^N \sum _{m=1}^{C+1}\sum _{j=1}^F \gamma (w_{imj})\log (1- \rho _{mj}) \Biggr )= 0, \end{aligned}$$whose partial derivative solution is,
$$\begin{aligned} \sum _{i=1}^N \gamma (u_{imj})\frac{1}{\rho _{mj}} - \sum _{i=1}^N \gamma (w_{imj}) \frac{1}{1-\rho _{mj}}=0. \end{aligned}$$This parameter is updated by
$$\begin{aligned} \rho _{mj} = \frac{\sum _{i=1}^N \gamma (u_{imj})}{\sum _{i=1}^L z_{im} + \sum _{i=L+1}^N \gamma (z_{im})},\quad \forall m = 1, \ldots , C+1; j = 1,\ldots ,F. \end{aligned}$$Note that \(z_{i,C+1} = 0\) for \(i = 1,\ldots ,L\) for the three sets of parameters, \(\theta _{mj}\), \(\lambda _{mj}\) and \(\rho _{mj}\).
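As an illustrative sketch only (not the authors' implementation), the closed-form updates above can be coded directly once the responsibilities are available. All names below are assumptions, and for brevity every instance is treated as unlabeled (\(L = 0\)), so the hard labels \(z_{im}\) of labeled instances are absorbed into `gamma_z`:

```python
def m_step(X, gamma_z, gamma_u, gamma_w):
    """Closed-form M-step updates (sketch, assuming L = 0).

    X:       N x F data matrix (list of lists).
    gamma_z: N x M component responsibilities gamma(z_im).
    gamma_u: N x M x F relevant-feature responsibilities gamma(u_imj).
    gamma_w: N x M x F irrelevant-feature responsibilities gamma(w_imj).
    """
    N, F, M = len(X), len(X[0]), len(gamma_z[0])

    def weighted_stats(g, m, j):
        # Weighted mean and variance of feature j with weights g[i][m][j]
        s = sum(g[i][m][j] for i in range(N))
        mu = sum(g[i][m][j] * X[i][j] for i in range(N)) / s
        var = sum(g[i][m][j] * (X[i][j] - mu) ** 2 for i in range(N)) / s
        return mu, var

    # pi_m = sum_i gamma(z_im) / N
    pis = [sum(gamma_z[i][m] for i in range(N)) / N for m in range(M)]
    # theta_mj = (mu, var) weighted by gamma(u_imj); lambda_mj by gamma(w_imj)
    theta = [[weighted_stats(gamma_u, m, j) for j in range(F)] for m in range(M)]
    lam = [[weighted_stats(gamma_w, m, j) for j in range(F)] for m in range(M)]
    # rho_mj = sum_i gamma(u_imj) / sum_i gamma(z_im)
    rhos = [[sum(gamma_u[i][m][j] for i in range(N))
             / sum(gamma_z[i][m] for i in range(N))
             for j in range(F)] for m in range(M)]
    return pis, theta, lam, rhos
```

Each returned quantity mirrors one of the update formulas above: `pis` the mixing proportions, `theta` and `lam` the per-feature Gaussian parameters for relevant and irrelevant features, and `rhos` the feature-relevance probabilities.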
Guerra, L., Bielza, C., Robles, V. et al. Semi-supervised projected model-based clustering. Data Min Knowl Disc 28, 882–917 (2014). https://doi.org/10.1007/s10618-013-0323-0