Abstract
Clustering is an important family of unsupervised machine learning methods. Cluster validity indices are widely used to assess the quality of clustering results. The C index is one of the most popular cluster validity indices. This paper shows that the C index can be used not only to validate clusters but also to find them. Doing so leads to difficult discrete optimization problems, which can be solved approximately by a canonical genetic algorithm. Numerical experiments compare this novel approach to the well-known c-means and single linkage clustering algorithms. On all five popular real-world benchmark data sets considered, the proposed method yields a better C index than any of the other (pure) clustering methods.
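For readers unfamiliar with the index, the following is a minimal sketch of the C index for a crisp partition, using the standard Hubert and Levin definition: with S the sum of all within-cluster pairwise distances and n_w the number of within-cluster pairs, C = (S - S_min) / (S_max - S_min), where S_min and S_max are the sums of the n_w smallest and n_w largest pairwise distances in the whole data set. The function name `c_index`, the use of Euclidean distance, and the label encoding are assumptions made for illustration; this is not the authors' implementation.

```python
from itertools import combinations
import math


def c_index(points, labels):
    """C index of a crisp partition; values lie in [0, 1], lower is better.

    points: sequence of equal-length numeric tuples (assumed Euclidean data)
    labels: one cluster label per point
    """
    pairs = list(combinations(range(len(points)), 2))
    dists = [math.dist(points[i], points[j]) for i, j in pairs]

    # Distances between points that share a cluster label.
    within = [d for d, (i, j) in zip(dists, pairs) if labels[i] == labels[j]]
    n_w = len(within)
    if n_w == 0 or n_w == len(dists):
        raise ValueError("C index is undefined for this partition")

    ordered = sorted(dists)
    s = sum(within)               # observed within-cluster distance sum
    s_min = sum(ordered[:n_w])    # smallest possible sum over n_w pairs
    s_max = sum(ordered[-n_w:])   # largest possible sum over n_w pairs
    if s_max == s_min:
        raise ValueError("C index is undefined when all distances are equal")
    return (s - s_min) / (s_max - s_min)
```

Because the index is a function of the label vector alone once the distances are fixed, it could in principle serve as the fitness of a candidate partition in a search procedure such as a genetic algorithm, with lower values preferred. How the paper encodes partitions and configures the algorithm is not stated in the abstract.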
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Runkler, T.A., Bezdek, J.C. (2019). Optimizing the C Index Using a Canonical Genetic Algorithm. In: Kaufmann, P., Castillo, P. (eds) Applications of Evolutionary Computation. EvoApplications 2019. Lecture Notes in Computer Science, vol 11454. Springer, Cham. https://doi.org/10.1007/978-3-030-16692-2_19
DOI: https://doi.org/10.1007/978-3-030-16692-2_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-16691-5
Online ISBN: 978-3-030-16692-2
eBook Packages: Computer Science (R0)