Semi-supervised graph clustering: a kernel approach
ABSTRACT
Semi-supervised clustering algorithms aim to improve clustering results using limited supervision. The supervision is generally given as pairwise constraints; such constraints are natural for graphs, yet most semi-supervised clustering algorithms are designed for data represented as vectors. In this paper, we unify vector-based and graph-based approaches. We show that a recently proposed objective function for semi-supervised clustering based on Hidden Markov Random Fields (Basu et al., 2004), with squared Euclidean distance and a certain class of constraint penalty functions, can be expressed as a special case of the weighted kernel k-means objective. A recent theoretical connection between weighted kernel k-means and several graph clustering objectives (Dhillon et al., 2004a, 2004b) enables us to perform semi-supervised clustering of data given either as vectors or as a graph. For vector data, the kernel approach also enables us to find clusters with non-linear boundaries in the input data space. Furthermore, we show that recent work on spectral learning (Kamvar et al., 2003) may be viewed as a special case of our formulation. Our experiments show that the algorithm can outperform current state-of-the-art semi-supervised algorithms on both vector-based and graph-based data sets.
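The following is a minimal sketch of the vector case. It assumes squared Euclidean distance, so the base kernel S is the linear kernel (the Gram matrix of the data), along with one uniform weight w for every constraint and unit point weights, which reduce weighted kernel k-means to plain kernel k-means. The constraint matrix W holds +w for must-link pairs and -w for cannot-link pairs. Running kernel k-means on K = S + W then rewards must-linked points that share a cluster c and penalizes cannot-linked ones, each by an amount proportional to 1/|c|; this is our reading of the "certain class of constraint penalty functions", not a quotation of the paper. The helper names (`linear_kernel`, `constrained_kernel`, `kernel_kmeans`), the caller-supplied initial labelling and the optional diagonal shift `sigma` are illustrative assumptions, not the authors' code. For graph input, S could instead be the adjacency matrix, which corresponds to the ratio association objective; graph objectives that need non-unit weights, such as normalized cut, are not covered by this sketch.

```python
def linear_kernel(X):
    """Gram matrix S[i][j] = <x_i, x_j>: squared Euclidean k-means in kernel form."""
    return [[sum(a * b for a, b in zip(xi, xj)) for xj in X] for xi in X]


def constrained_kernel(S, must_link, cannot_link, w=1.0, sigma=0.0):
    """Hypothetical helper returning K = S + W + sigma * I.

    W[i][j] = +w for a must-link pair, -w for a cannot-link pair, 0 otherwise
    (a uniform weight w is an assumption of this sketch). sigma is an optional
    diagonal shift that can make K positive semidefinite; choosing it is left
    to the caller here.
    """
    n = len(S)
    K = [list(map(float, row)) for row in S]
    for i, j in must_link:    # reward: raise similarity of must-linked pairs
        K[i][j] += w
        K[j][i] += w
    for i, j in cannot_link:  # penalty: lower similarity of cannot-linked pairs
        K[i][j] -= w
        K[j][i] -= w
    for i in range(n):
        K[i][i] += sigma
    return K


def kernel_kmeans(K, k, labels, max_iter=100):
    """Batch kernel k-means with unit point weights, from an initial labelling.

    Squared feature-space distance from point i to the centroid of cluster c:
        K[i][i] - (2/|c|) * sum_{j in c} K[i][j] + (1/|c|^2) * sum_{j,l in c} K[j][l]
    """
    n = len(K)
    labels = list(labels)
    for _ in range(max_iter):
        clusters = [[i for i in range(n) if labels[i] == c] for c in range(k)]
        # The last term depends only on the cluster, so compute it once per pass.
        norm = [
            sum(K[a][b] for a in members for b in members) / len(members) ** 2
            if members else 0.0
            for members in clusters
        ]
        new_labels = []
        for i in range(n):
            best_c, best_d = labels[i], float("inf")
            for c, members in enumerate(clusters):
                if not members:
                    continue  # empty clusters have no centroid
                cross = sum(K[i][j] for j in members) / len(members)
                d = K[i][i] - 2.0 * cross + norm[c]
                if d < best_d:
                    best_c, best_d = c, d
            new_labels.append(best_c)
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
    return labels


if __name__ == "__main__":
    X = [[0.0], [1.0], [5.0], [6.0]]
    S = linear_kernel(X)
    init = [0, 1, 0, 1]
    print(kernel_kmeans(S, 2, init))  # -> [0, 0, 1, 1]
    K = constrained_kernel(S, must_link=[(0, 2), (1, 3)],
                           cannot_link=[(0, 1), (2, 3)], w=10.0)
    print(kernel_kmeans(K, 2, init))  # -> [0, 1, 0, 1]
```

In the toy run, plain k-means groups the two nearby pairs, while the constraints hold the must-linked points 0, 2 and 1, 3 together. Adding sigma * I to K changes the kernel k-means objective only by the constant sigma * (n - k) when all k clusters are non-empty, so it does not change which partition is optimal. It does, however, make the batch updates less willing to move a point out of its current cluster, so a large shift can stall the iterations near the initial labelling.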
REFERENCES
- Bansal, N., Blum, A., & Chawla, S. (2002). Correlation clustering. Proc. 43rd IEEE Symp. on Foundations of Computer Science (FOCS-02) (pp. 238–247).
- Bar-Hillel, A., Hertz, T., Shental, N., & Weinshall, D. (2003). Learning distance functions using equivalence relations. Proc. 20th Intl. Conf. on Machine Learning.
- Basu, S., Bilenko, M., & Mooney, R. (2004). A probabilistic framework for semi-supervised clustering. Proc. 10th Intl. Conf. on Knowledge Discovery and Data Mining.
- Chan, P., Schlag, M., & Zien, J. (1994). Spectral k-way ratio cut partitioning. IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, 13, 1088–1096.
- Cristianini, N., & Shawe-Taylor, J. (2000). Introduction to support vector machines. Cambridge University Press.
- Dhillon, I., Guan, Y., & Kulis, B. (2004a). Kernel k-means, spectral clustering and normalized cuts. Proc. 10th Intl. Conf. on Knowledge Discovery and Data Mining.
- Dhillon, I., Guan, Y., & Kulis, B. (2004b). A unified view of kernel k-means, spectral clustering and graph cuts (Technical Report TR-04-25). University of Texas at Austin.
- Duda, R. O., & Hart, P. E. (1973). Pattern classification and scene analysis. Wiley.
- Kamvar, S. D., Klein, D., & Manning, C. (2003). Spectral learning. Proc. 17th Intl. Joint Conf. on Artificial Intelligence.
- Klein, D., Kamvar, S. D., & Manning, C. (2002). From instance-level constraints to space-level constraints: Making the most of prior knowledge in data clustering. Proc. 19th Intl. Conf. on Machine Learning.
- Lee, I., Date, S. V., Adai, A. T., & Marcotte, E. M. (2004). A probabilistic functional network of yeast genes. Science, 306(5701), 1555–1558.
- Ogata, H., Goto, S., Sato, K., Fujibuchi, W., Bono, H., & Kanehisa, M. (1999). KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res., 27, 29–34.
- Shi, J., & Malik, J. (2000). Normalized cuts and image segmentation. IEEE Trans. on Pattern Analysis and Machine Intelligence, 22, 888–905.
- Strehl, A., Ghosh, J., & Mooney, R. (2000). Impact of similarity measures on web-page clustering. Workshop on Artificial Intelligence for Web Search (AAAI).
- Wagstaff, K., Cardie, C., Rogers, S., & Schroedl, S. (2001). Constrained k-means clustering with background knowledge. Proc. 18th Intl. Conf. on Machine Learning.
- Xing, E. P., Ng, A. Y., Jordan, M. I., & Russell, S. (2003). Distance metric learning, with application to clustering with side-information. Advances in Neural Information Processing Systems 15.
- Yu, S., & Shi, J. (2004). Segmentation given partial grouping constraints. IEEE Trans. on Pattern Analysis and Machine Intelligence, 26, 173–183.