DOI: 10.1145/1102351.1102409
Article

Semi-supervised graph clustering: a kernel approach

Published: 07 August 2005

ABSTRACT

Semi-supervised clustering algorithms aim to improve clustering results using limited supervision. The supervision is generally given as pairwise constraints; such constraints are natural for graphs, yet most semi-supervised clustering algorithms are designed for data represented as vectors. In this paper, we unify vector-based and graph-based approaches. We show that a recently-proposed objective function for semi-supervised clustering based on Hidden Markov Random Fields, with squared Euclidean distance and a certain class of constraint penalty functions, can be expressed as a special case of the weighted kernel k-means objective. A recent theoretical connection between kernel k-means and several graph clustering objectives enables us to perform semi-supervised clustering of data given either as vectors or as a graph. For vector data, the kernel approach also enables us to find clusters with non-linear boundaries in the input data space. Furthermore, we show that recent work on spectral learning (Kamvar et al., 2003) may be viewed as a special case of our formulation. We empirically show that our algorithm is able to outperform current state-of-the-art semi-supervised algorithms on both vector-based and graph-based data sets.
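
The idea of folding pairwise constraints into a kernel and then running kernel k-means can be sketched as follows. This is a minimal illustration, not the paper's algorithm: the abstract does not give the kernel construction, so the reward/penalty scheme (`+w` for must-link, `-w` for cannot-link), the diagonal shift `sigma`, and the function names `constrained_kernel` and `kernel_kmeans` are all assumptions made for this example. The sketch also uses unweighted kernel k-means (every point has weight 1) rather than the weighted objective the abstract refers to.

```python
import numpy as np

def constrained_kernel(A, must_link=(), cannot_link=(), w=1.0, sigma=None):
    """Add constraint terms to a symmetric similarity matrix A.

    Assumed scheme (not taken from the abstract): must-link pairs gain +w,
    cannot-link pairs gain -w, and a diagonal shift sigma makes the result
    positive semidefinite so it can serve as a kernel matrix.
    """
    K = np.array(A, dtype=float)
    for i, j in must_link:
        K[i, j] += w
        K[j, i] += w
    for i, j in cannot_link:
        K[i, j] -= w
        K[j, i] -= w
    if sigma is None:
        # Smallest shift that removes any negative eigenvalue.
        sigma = max(0.0, -np.linalg.eigvalsh(K).min())
    return K + sigma * np.eye(len(K))

def kernel_kmeans(K, labels, k, max_iter=100):
    """Unweighted kernel k-means on kernel matrix K, from initial labels."""
    labels = np.asarray(labels).copy()
    n = len(K)
    for _ in range(max_iter):
        dist = np.full((n, k), np.inf)
        for c in range(k):
            members = labels == c
            m = members.sum()
            if m == 0:
                continue  # an empty cluster stays at infinite distance
            # ||phi(x_i) - mean_c||^2
            #   = K_ii - 2 * mean_j K_ij + mean_{j,l} K_jl  (j, l in cluster c)
            dist[:, c] = (np.diag(K)
                          - 2.0 * K[:, members].sum(axis=1) / m
                          + K[np.ix_(members, members)].sum() / m**2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels

# Toy graph: two triangles (0,1,2) and (3,4,5) joined by the edge (2,3).
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

# A cannot-link constraint on the bridging pair (2, 3).
K = constrained_kernel(A, cannot_link=[(2, 3)])
print(kernel_kmeans(K, labels=[0, 0, 1, 1, 1, 1], k=2))
```

Because the input is only a similarity matrix, the same two functions apply whether `A` is a graph adjacency matrix or a kernel computed from vector data, which is the sense in which a kernel view covers both settings.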

References

  1. Bansal, N., Blum, A., & Chawla, S. (2002). Correlation clustering. Proc. of the 43rd IEEE Symp. on Foundations of Computer Science (FOCS-02) (pp. 238--247).
  2. Bar-Hillel, A., Hertz, T., Shental, N., & Weinshall, D. (2003). Learning distance functions using equivalence relations. Proc. 20th Intl. Conf. on Machine Learning.
  3. Basu, S., Bilenko, M., & Mooney, R. (2004). A probabilistic framework for semi-supervised clustering. Proc. 10th Intl. Conf. on Knowledge Discovery and Data Mining.
  4. Chan, P., Schlag, M., & Zien, J. (1994). Spectral k-way ratio cut partitioning. IEEE Trans. CAD-Integrated Circuits and Systems, 13, 1088--1096.
  5. Cristianini, N., & Shawe-Taylor, J. (2000). Introduction to support vector machines. Cambridge University Press.
  6. Dhillon, I., Guan, Y., & Kulis, B. (2004a). Kernel k-means, spectral clustering and normalized cuts. Proc. 10th Intl. Conf. on Knowledge Discovery and Data Mining.
  7. Dhillon, I., Guan, Y., & Kulis, B. (2004b). A unified view of kernel k-means, spectral clustering and graph cuts (Technical Report TR-04-25). University of Texas at Austin.
  8. Duda, R. O., & Hart, P. E. (1973). Pattern classification and scene analysis. Wiley.
  9. Kamvar, S. D., Klein, D., & Manning, C. (2003). Spectral learning. Proc. 17th Intl. Joint Conf. on Artificial Intelligence.
  10. Klein, D., Kamvar, D., & Manning, C. (2002). From instance-level constraints to space-level constraints: Making the most of prior knowledge in data clustering. Proc. 19th Intl. Conf. on Machine Learning.
  11. Lee, I., Date, S. V., Adai, A. T., & Marcotte, E. M. (2004). A probabilistic functional network of yeast genes. Science, 306(5701), 1555--1558.
  12. Ogata, H., Goto, S., Sato, K., Fujibuchi, W., Bono, H., & Kanehisa, M. (1999). KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res., 27, 29--34.
  13. Shi, J., & Malik, J. (2000). Normalized cuts and image segmentation. IEEE Trans. on Pattern Analysis and Machine Intelligence, 22, 888--905.
  14. Strehl, A., Ghosh, J., & Mooney, R. (2000). Impact of similarity measures on web-page clustering. Workshop on Artificial Intelligence for Web Search (AAAI).
  15. Wagstaff, K., Cardie, C., Rogers, S., & Schroedl, S. (2001). Constrained k-means clustering with background knowledge. Proc. 18th Intl. Conf. on Machine Learning.
  16. Xing, E. P., Ng, A. Y., Jordan, M. I., & Russell, S. (2003). Distance metric learning, with application to clustering with side-information. Advances in Neural Information Processing Systems 15.
  17. Yu, S., & Shi, J. (2004). Segmentation given partial grouping constraints. IEEE Trans. on Pattern Analysis and Machine Intelligence, 26, 173--183.

      • Published in

        ICML '05: Proceedings of the 22nd international conference on Machine learning
        August 2005
        1113 pages
        ISBN:1595931805
        DOI:10.1145/1102351

        Copyright © 2005 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Qualifiers

        • Article

        Acceptance Rates

        Overall Acceptance Rate: 140 of 548 submissions, 26%
