ABSTRACT
Unsupervised learning methods often involve summarizing the data using a small number of parameters. In certain domains, only a small subset of the available data is relevant to the problem. One-Class Classification, or One-Class Clustering, attempts to find a useful subset by locating a dense region in the data. In particular, a recently proposed algorithm called One-Class Information Ball (OC-IB) shows the advantage of modeling a small set of highly coherent points as opposed to pruning outliers. We present several modifications to OC-IB and integrate it with a global search, which yields several improvements: deterministic results, optimality guarantees, control over cluster size, and extension to other cost functions. Empirical studies yield significantly better results on various real and artificial data sets.
Robust one-class clustering using hybrid global and local search