Review
A survey on nature inspired metaheuristic algorithms for partitional clustering

https://doi.org/10.1016/j.swevo.2013.11.003

Abstract

The partitional clustering concept started with the K-means algorithm, published in 1957. Since then, many classical partitional clustering algorithms based on the gradient descent approach have been reported. The 1990s kick-started a new era in cluster analysis with the application of nature inspired metaheuristics. Nearly two decades have passed since those initial formulations, and researchers have developed numerous new algorithms in this field. This paper embodies an up-to-date review of all major nature inspired metaheuristic algorithms employed to date for partitional clustering. Further, the key issues involved in formulating various metaheuristics as clustering problems, together with the major application areas, are discussed.

Introduction

Data clustering determines groups of patterns in a dataset which are homogeneous in nature. The objective is to develop an automatic algorithm that can accurately classify an unlabeled dataset into groups. Recent literature [1], [2], [3], [4], [5] broadly classifies clustering algorithms into three categories: hierarchical, partitional and overlapping. Hierarchical algorithms provide a tree-structured output (dendrogram plot) which represents the nested grouping of the elements of a dataset [6], [7]. They do not require a priori knowledge of the number of clusters present in the dataset [8], [9]. However, the process involved is static: elements assigned to a given cluster cannot move to other clusters [10]. Therefore they exhibit poor performance when separating overlapping clusters.

The overlapping nature of clusters is better expressed in fuzzy clustering [11], [12], [13]. Popular algorithms include fuzzy c-means (FCM) [14] and the fuzzy c-shells algorithm (FCS) [15]. In this approach each element of a dataset belongs to all the clusters with a fuzzy membership grade. A fuzzy clustering can be converted to a crisp clustering (where any element belongs to exactly one cluster) by assigning each element to the cluster with the highest membership value.
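As a minimal illustration of this fuzzy-to-crisp conversion, consider a hypothetical membership matrix (the values below are illustrative, not taken from any cited algorithm); the conversion amounts to a row-wise argmax:

```python
import numpy as np

# Hypothetical fuzzy membership matrix U: 4 elements x 3 clusters.
# Each row sums to 1, as produced by an algorithm such as fuzzy c-means.
U = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
    [0.5, 0.4, 0.1],
])

# Crisp assignment: each element joins the cluster of highest membership.
crisp_labels = U.argmax(axis=1)
print(crisp_labels)  # [0 1 2 0]
```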

Partitional clustering divides a dataset into a number of groups based upon a certain criterion known as the fitness measure. The fitness measure directly affects how the clusters form. Once an appropriate fitness measure is selected, the partitioning task is converted into an optimization problem (for example: grouping based on minimization of distance or maximization of correlation between patterns, or optimizing their density in the N-dimensional space, etc.). These partitional techniques are popular in various research fields due to their capability to cluster large datasets (for example: in signal and image processing for image segmentation [16]; in wireless sensor networks for classifying sensors to enhance lifetime and coverage [17], [18], [19], [20]; in communication to design accurate blind equalizers [21]; in robotics to efficiently classify humans based upon their activities [22]; in computer science for web mining and pattern recognition [23]; in economics research to identify groups of homogeneous consumers [24]; in management studies for portfolio selection [27]; in seismology to classify aftershocks apart from regular background events [28]; in high dimensional data analysis [29]; in medical sciences to identify diseases from groups of patient reports and in genomic studies [30]; in library sciences for grouping books based upon content [32]; etc.). In all these applications the nature of the patterns associated with the datasets differs. Therefore a single partitional algorithm cannot universally solve all problems. Thus, given a problem at hand, a user has to carefully investigate the nature of the patterns associated with the dataset and select the appropriate clustering strategy.

The K-means algorithm is the most fundamental partitional clustering concept; it was published by Lloyd of Bell Telephone Laboratories in 1957 [373], [374], [375]. More than fifty years after its introduction, the algorithm remains popular and widely used for high dimensional datasets due to its simplicity and low computational complexity [34], [376], [377]. In this case the minimization of the Euclidean distance between elements and their cluster center is the optimization criterion. Inspired by K-means, researchers have developed a number of gradient-based algorithms for partitional clustering, including bisecting K-means [35] (recursively divides the dataset into two clusters at each step), sort-means [36] (means are sorted in order of increasing distance from each mean to speed up the traditional process), kd-tree [37] (determines the closest cluster centers for all the data points), X-means [38] (determines the best number of clusters K by optimizing a criterion such as the Akaike Information Criterion (AIC) or the Bayesian Information Criterion (BIC)), k-harmonic means [39] (the harmonic mean is taken instead of the minimum of the Euclidean distance), the k-modes algorithm [40], [41] (selects k initial modes instead of centers, then allocates every object to the nearest mode), kernel K-means [42] (detects arbitrarily shaped clusters, given a proper choice of kernel function), and K-medoid [43] (the cluster center is represented by the medoid, an actual data point, instead of the mean). These algorithms are computationally simple, but are often trapped in local optima due to their hill-climbing approach (the movement of cluster centers in the case of K-means). On the other hand, nature inspired metaheuristics employ a population to explore the search space and thus offer a greater probability of achieving optimal cluster partitions.
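The basic hill-climbing iteration of K-means (assign to nearest center, then move centers to cluster means) can be sketched as follows; this is an illustrative minimal implementation with deterministic center seeding, not any specific cited variant:

```python
import numpy as np

def kmeans(Z, K, iters=100):
    """Minimal Lloyd-style K-means: repeatedly assign each pattern to its
    nearest center (Euclidean distance) and move centers to cluster means."""
    # For a deterministic demo, seed the centers with K evenly spaced patterns.
    centers = Z[np.linspace(0, len(Z) - 1, K, dtype=int)].astype(float)
    labels = np.zeros(len(Z), dtype=int)
    for _ in range(iters):
        # Distance of every pattern to every center, shape (N, K).
        d = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Hill-climbing step: each center moves to the mean of its members.
        new = np.array([Z[labels == k].mean(axis=0) if (labels == k).any()
                        else centers[k] for k in range(K)])
        if np.allclose(new, centers):  # converged
            break
        centers = new
    return centers, labels

# Two well separated groups of patterns (N = 6, D = 2).
Z = np.array([[0, 0], [0, 1], [1, 0], [9, 9], [9, 10], [10, 9]], dtype=float)
centers, labels = kmeans(Z, K=2)
print(labels)  # [0 0 0 1 1 1]
```

With an unfavorable initialization the same iteration can stall in a local optimum, which is exactly the weakness the population-based metaheuristics below aim to avoid.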

A literature review [44], [45] reveals the recent trend of naming all stochastic algorithms that combine randomization with local search as 'metaheuristics'. The randomization process generates arbitrary solutions, which explore the search space and are responsible for reaching a global solution. The local search drives convergence and focuses on achieving good solutions in a specific region. The first nature inspired metaheuristic is the genetic algorithm (GA), developed by Holland and his colleagues in 1975 [46], [47]. It was followed by the development of simulated annealing (SA) by Kirkpatrick in 1983 [48]. Recent literature reports many established nature inspired metaheuristics, which are listed in Table 1. These algorithms are broadly classified into Evolutionary Algorithms, Physical Algorithms, Swarm Intelligence, Bio-inspired Algorithms and others. Table 1 further divides them into single objective and multi-objective, depending on the number of objective functions that they simultaneously optimize to achieve the solution.

The fundamental approach of developing a nature inspired metaheuristic clustering algorithm using simulated annealing was proposed by Selim and Alsultan [159] in 1991. Bezdek et al. [100] then proposed an evolutionary approach to clustering using the genetic algorithm in 1994. The research article by Sarkar and Yegnanarayana [101] highlights the core issues involved in developing clustering algorithms with evolutionary programming. Lumer and Faieta first explored the clustering behavior of swarms of ants [191]. Subsequently, swarm intelligence algorithms such as ant colony optimization [183] and particle swarm optimization [219] have been applied to cluster analysis.
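To make the metaheuristic-for-clustering idea concrete, the following sketch applies a generic simulated-annealing scheme (Metropolis acceptance over random single-point reassignments, followed by a greedy local-search descent) to minimize the within-cluster sum of squares; this is an illustrative assumption-laden toy, not the exact algorithm of Selim and Alsultan:

```python
import math
import random

def wcss(Z, labels, K):
    """Within-cluster sum of squared Euclidean distances (the fitness)."""
    total = 0.0
    for k in range(K):
        members = [z for z, l in zip(Z, labels) if l == k]
        if not members:
            continue
        centroid = [sum(coord) / len(members) for coord in zip(*members)]
        total += sum(sum((zi - ci) ** 2 for zi, ci in zip(z, centroid))
                     for z in members)
    return total

def sa_cluster(Z, K, T=5.0, cooling=0.99, steps=1000, seed=1):
    """Annealing phase: random reassignments accepted by the Metropolis
    criterion; then a greedy descent to a single-move local optimum."""
    rng = random.Random(seed)
    labels = [rng.randrange(K) for _ in Z]   # random initial partition
    cost = wcss(Z, labels, K)
    for _ in range(steps):
        i, k = rng.randrange(len(Z)), rng.randrange(K)
        old = labels[i]
        labels[i] = k
        new_cost = wcss(Z, labels, K)
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / T):
            cost = new_cost          # accept (possibly uphill) move
        else:
            labels[i] = old          # reject, restore previous assignment
        T *= cooling                 # cool the temperature
    improved = True                  # local search: greedy refinement
    while improved:
        improved = False
        for i in range(len(Z)):
            for k in range(K):
                old = labels[i]
                labels[i] = k
                new_cost = wcss(Z, labels, K)
                if new_cost < cost:
                    cost, improved = new_cost, True
                else:
                    labels[i] = old
    return labels, cost

Z = [(0, 0), (0, 1), (1, 0), (9, 9), (9, 10), (10, 9)]
labels, cost = sa_cluster(Z, K=2)
print(labels, round(cost, 3))
```

The uphill acceptances early on let the search escape poor partitions, while the cooling schedule gradually turns the process into the pure local search described above.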

This paper presents an in-depth survey of nature inspired metaheuristic algorithms used for partitional clustering, focusing on those applied to cluster analysis in the last two decades. A few highly cited review articles on cluster analysis have been published by Jain et al. [34], Hruschka et al. [3], Xu and Wunsch [91], Freitas [92], Paterlini and Minerva [93], and Jafar and Sivakumar [94]. To the best of our knowledge, a review paper covering recently developed nature inspired metaheuristics for partitional clustering has not been reported. In 2009 Hruschka et al. [3] focused on the initialization procedures, crossover, mutation, fitness evaluation and reselection associated with genetic-type evolutionary algorithms for single and multiobjective cases. Jain et al. [34] dealt with key issues of clustering and the user's dilemma, and suggested corresponding solutions. Jafar and Sivakumar [94] highlighted developments in ant algorithms for cluster analysis. The book chapter by Abraham et al. [4] focuses on the use of PSO and ant algorithms for the clustering task. The basic principles and methods of clustering are embodied in the books [1], [95], [96], [97], [98], [99].

Keeping current research trends in mind, the present paper contributes to the survey of partitional clustering in four aspects: (1) a systematic review of all the single objective nature inspired metaheuristics used in partitional clustering, (2) an up-to-date survey of flexible partitional clustering based on multiobjective metaheuristic algorithms, (3) a consolidation of recently developed cluster validity measures, and (4) an exploration of new application areas of partitional clustering algorithms.

The paper is organized as follows. Section 2 deals with the advances in single objective nature inspired metaheuristics for partitional clustering, including recent developments in algorithm design, fitness function selection and the cluster validity indices used for verification. The multi-objective metaheuristics used for flexible clustering are discussed in Section 3. The real life application areas of nature inspired partitional clustering are highlighted in Section 4. The concluding remarks of the survey are presented in Section 5. Finally, a number of issues for innovative future research are presented in Section 6.

Section snippets

Problem formulation

Given an unlabeled dataset $Z_{N\times D}=\{z_{1\times D}, z_{2\times D}, \ldots, z_{N\times D}\}$ representing $N$ patterns, each having $D$ features, the partitional approach aims to cluster the dataset into $K$ groups ($K \leq N$) such that
$$C_k \neq \emptyset, \quad k=1,2,\ldots,K; \qquad C_k \cap C_l = \emptyset, \quad k,l=1,2,\ldots,K \ \text{and}\ k \neq l; \qquad \bigcup_{k=1}^{K} C_k = Z.$$
The clustering operation depends on the similarity between elements present in the dataset. If $f$ denotes the fitness function, then the clustering task is viewed as the optimization problem
$$C_k \leftarrow \operatorname{Optimize}\left[f(Z_{N\times D}, C_k)\right], \quad k=1,2,\ldots,K.$$
Hence the optimization based clustering task
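The three partition constraints above (non-empty clusters, pairwise disjointness, full coverage of $Z$) can be checked mechanically. A minimal sketch, with an illustrative function name and clusters encoded as sets of pattern indices:

```python
from itertools import combinations

def is_valid_partition(n_patterns, clusters):
    """Verify the partitional clustering constraints: every cluster C_k is
    non-empty, clusters are pairwise disjoint, and their union covers the
    dataset (patterns are identified by indices 0..n_patterns-1)."""
    nonempty = all(len(c) > 0 for c in clusters)
    disjoint = all(a.isdisjoint(b) for a, b in combinations(clusters, 2))
    covers = set().union(*clusters) == set(range(n_patterns))
    return nonempty and disjoint and covers

good = [{0, 1}, {2, 3}]   # valid K = 2 partition of N = 4 patterns
bad = [{0, 1}, {1, 2}]    # overlapping, and pattern 3 is uncovered
print(is_valid_partition(4, good), is_valid_partition(4, bad))  # True False
```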

Multi-objective algorithms for flexible clustering

A recent survey article by Zhou et al. [304] highlights the basic principles, advancements and applications of multi-objective algorithms in several real world optimization problems. These algorithms are preferred over their single objective counterparts because they incorporate additional knowledge, in the form of extra objective functions, to achieve an optimal solution. In the last decade researchers have developed many nature inspired multi-objective algorithms, including the non-dominated sorting GA
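The notion of Pareto non-dominance that underlies these multi-objective methods can be sketched as follows; the objective values are hypothetical pairs (e.g. an intra-cluster compactness score and an inter-cluster separation penalty), both minimized:

```python
def dominates(a, b):
    """a Pareto-dominates b (minimization): a is no worse than b in every
    objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(solutions):
    """Return the non-dominated subset of a list of objective vectors."""
    return [s for i, s in enumerate(solutions)
            if not any(dominates(t, s)
                       for j, t in enumerate(solutions) if j != i)]

# Hypothetical candidate partitions scored on two objectives to minimize.
objs = [(2.0, 5.0), (3.0, 3.0), (5.0, 1.0), (4.0, 4.0), (6.0, 6.0)]
print(pareto_front(objs))  # [(2.0, 5.0), (3.0, 3.0), (5.0, 1.0)]
```

Rather than one optimal partition, such algorithms return the whole non-dominated front, from which a user picks the trade-off that suits the application.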

Real life application areas of nature inspired metaheuristics based partitional clustering

Nature inspired partitional clustering algorithms have been successfully applied to diverse areas of engineering and science. Many researchers have employed the benchmark UCI datasets to validate the performance of nature inspired clustering algorithms. Some popular UCI datasets and their use in the corresponding algorithms are listed in Table 4. The major applications in the nature inspired clustering literature and the corresponding authors are shown in Table 5. Along with Table 5 some

Conclusion

This paper provides an up-to-date review of nature inspired metaheuristic algorithms for partitional clustering. It is observed that the traditional gradient based partitional algorithms are computationally simpler but often provide inaccurate results as the solution gets trapped in local minima. Nature inspired metaheuristics explore the entire search space with a population and thereby help ensure that an optimal partition is achieved. Further, single objective algorithms provide one optimal

Future research issues

The field of nature inspired partitional clustering is relatively young and is emerging with new concepts and applications. There are many new research directions in this field that need investigation, including:

  • In order to solve any partitional clustering problem, the success of a particular nature inspired metaheuristic algorithm in achieving an optimal partition depends on its design environment (i.e., encoding scheme, operators, set of parameters, etc.). So for a given complex problem the design

References (377)

  • R. Xu et al., Clustering, 2009.
  • S. Das et al., Metaheuristic Clustering, 2009.
  • E.R. Hruschka et al., A survey of evolutionary algorithms for clustering, IEEE Trans. Syst. Man Cybern. Part C Appl. Rev., 2009.
  • A. Abraham, S. Das, S. Roy, Swarm intelligence algorithms for data clustering, in: Soft Computing for Knowledge...
  • S. Basu, I. Davidson, K. Wagstaff (Eds.), Constrained Clustering: Advances in Algorithms, Theory and Applications, Data...
  • H. Frigui et al., A robust competitive clustering algorithm with applications in computer vision, IEEE Trans. Pattern Anal. Mach. Intell., 1999.
  • Y. Leung et al., Clustering by scale-space filtering, IEEE Trans. Pattern Anal. Mach. Intell., 2000.
  • S.C. Johnson, Hierarchical clustering schemes, Psychometrika, 1967.
  • F. Murtagh, A survey of recent advances in hierarchical clustering algorithms, Comput. J., 1983.
  • A.K. Jain et al., Data clustering: a review, ACM Comput. Surv., 1999.
  • M. Sato-Ilic et al., Innovations in Fuzzy Clustering: Theory and Application, 2006.
  • A. Baraldi et al., A survey of fuzzy clustering algorithms for pattern recognition—Part I, IEEE Trans. Syst. Man Cybern. Part B Cybern., 1999.
  • A. Baraldi et al., A survey of fuzzy clustering algorithms for pattern recognition—Part II, IEEE Trans. Syst. Man Cybern. Part B Cybern., 1999.
  • F. Hoppner et al., Fuzzy Cluster Analysis, 1999.
  • F. Hoppner, Fuzzy shell clustering algorithms in image processing: fuzzy c-rectangular and 2-rectangular shells, IEEE Trans. Fuzzy Syst., 1997.
  • S. Das et al., Automatic clustering using an improved differential evolution algorithm, IEEE Trans. Syst. Man Cybern. Part A Syst. Hum., 2008.
  • O. Younis et al., Node clustering in wireless sensor networks: recent developments and deployment challenges, IEEE Netw., 2006.
  • M. Younis, P. Munshi, G. Gupta, S.M. Elsharkawy, On efficient clustering of wireless sensor networks, in: Second IEEE...
  • P. Kumarawadu, D.J. Dechene, M. Luccini, A. Sauer, Algorithms for node clustering in wireless sensor networks: a...
  • S. Chen et al., Multi-stage blind clustering equaliser, IEEE Trans. Commun., 1995.
  • S.J. Nanda, G. Panda, Automatic clustering using MOCLONAL for classifying actions of 3D human models, in: IEEE...
  • S.K. Halgamuge et al., Classification and Clustering for Knowledge Discovery, 2005.
  • R.C. MacGregor et al., Small Business Clustering Technologies: Applications in Marketing, Management, IT and Economics, 2007.
  • A. Brabazon et al., An introduction to evolutionary computation in finance, IEEE Comput. Intell. Mag., 2008.
  • I. Zaliapin et al., Clustering analysis of seismicity and aftershock identification, Phys. Rev. Lett., 2008.
  • D. Jiang et al., Cluster analysis for gene expression data: a survey, IEEE Trans. Knowl. Data Eng., 2004.
  • A.V. Lukashin et al., Topology of gene expression networks as revealed by data mining and modeling, Bioinformatics, 2003.
  • N.O. Andrews, E.A. Fox, Recent Developments in Document Clustering, Technical Report TR-07-35, Department of Computer...
  • M. Steinbach, G. Karypis, V. Kumar, A comparison of document clustering techniques, in: KDD Workshop on Text Mining,...
  • S. Phillips, Acceleration of k-means and related clustering algorithms, in: International Workshop on Algorithm...
  • D. Pelleg, A. Moore, Accelerating exact k-means algorithms with geometric reasoning, in: Fifth ACM SIGKDD International...
  • D. Pelleg, A. Moore, x-means: extending k-means with efficient estimation of the number of clusters, in: Seventeenth...
  • B. Zhang, M. Hsu, U. Dayal, k-harmonic means: a spatial clustering algorithm with boosting, in: International Workshop...
  • Z. Huang, Extensions to the k-means algorithm for clustering large data sets with categorical values, Data Mining Knowl. Discov., 1998.
  • A. Chaturvedi et al., k-modes clustering, J. Classif., 2001.
  • B. Scholkopf et al., Nonlinear component analysis as a kernel eigenvalue problem, Neural Comput., 1998.
  • L. Kaufman et al., Finding Groups in Data: An Introduction to Cluster Analysis, 2008.
  • X.S. Yang, Nature-Inspired Metaheuristic Algorithms, 2010.
  • J. Brownlee, Clever Algorithms: Nature-Inspired Programming Recipes, lulu.com,...
  • J.H. Holland, Adaptation in Natural and Artificial Systems, 1975.