Review
A survey on nature inspired metaheuristic algorithms for partitional clustering

https://doi.org/10.1016/j.swevo.2013.11.003

Abstract

The partitional clustering concept started with the K-means algorithm, published in 1957. Since then, many classical partitional clustering algorithms based on the gradient descent approach have been reported. The 1990s kick-started a new era in cluster analysis with the application of nature inspired metaheuristics. Nearly two decades have passed since those initial formulations, and researchers have developed numerous new algorithms in this field. This paper embodies an up-to-date review of all major nature inspired metaheuristic algorithms employed to date for partitional clustering. Further, the key issues involved in formulating various metaheuristics as clustering problems, together with the major application areas, are discussed.

Introduction

Data clustering determines groups of patterns in a dataset which are homogeneous in nature. The objective is to develop an automatic algorithm that can accurately classify an unlabeled dataset into groups. Recent literature [1], [2], [3], [4], [5] broadly classifies clustering algorithms into three categories: hierarchical, partitional and overlapping. Hierarchical algorithms provide a tree-structured output (dendrogram plot) which represents the nested grouping of the elements of a dataset [6], [7]. They do not require a priori knowledge of the number of clusters present in the dataset [8], [9]. However, the process involved is static: elements assigned to a given cluster cannot move to other clusters [10]. Therefore they exhibit poor performance when separating overlapping clusters.

The overlapping nature of clusters is better expressed in fuzzy clustering [11], [12], [13]. Popular algorithms include fuzzy c-means (FCM) [14] and the fuzzy c-shells algorithm (FCS) [15]. In this approach each element of a dataset belongs to all the clusters with a fuzzy membership grade. A fuzzy clustering can be converted to a crisp clustering (where any element belongs to exactly one cluster) by assigning each element to the cluster with the highest membership value.
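As a minimal illustration of this fuzzy-to-crisp conversion, consider a hypothetical membership matrix (the values below are illustrative, not taken from any cited algorithm); the conversion amounts to a row-wise argmax:

```python
import numpy as np

# Hypothetical fuzzy membership matrix U: 4 elements x 3 clusters.
# Each row sums to 1, as produced by an algorithm such as fuzzy c-means.
U = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
    [0.5, 0.4, 0.1],
])

# Crisp assignment: each element joins the cluster of highest membership.
crisp_labels = U.argmax(axis=1)
print(crisp_labels)  # [0 1 2 0]
```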

Partitional clustering divides a dataset into a number of groups based upon a certain criterion known as the fitness measure. The fitness measure directly affects how the clusters form. Once an appropriate fitness measure is selected, the partitioning task is converted into an optimization problem (for example: grouping based on minimization of distance or maximization of correlation between patterns, or optimizing their density in the N-dimensional space, etc.). These partitional techniques are popular in various research fields due to their capability to cluster large datasets (for example: in signal and image processing for image segmentation [16]; in wireless sensor networks for classifying sensors to enhance lifetime and coverage [17], [18], [19], [20]; in communication to design accurate blind equalizers [21]; in robotics to efficiently classify humans based upon their activities [22]; in computer science for web mining and pattern recognition [23]; in economics research to identify groups of homogeneous consumers [24]; in management studies for portfolio selection [27]; in seismology to classify aftershocks apart from regular background events [28]; in high dimensional data analysis [29]; in medical sciences to identify diseases from groups of patient reports and in genomic studies [30]; in library sciences for grouping books based upon content [32]; etc.). In all these applications the nature of the patterns associated with the datasets differs. Therefore a single partitional algorithm cannot universally solve all problems. Thus, given a problem at hand, a user has to carefully investigate the nature of the patterns associated with the dataset and select the appropriate clustering strategy.

The K-means algorithm is the most fundamental partitional clustering concept; it was published by Lloyd of Bell Telephone Laboratories in 1957 [373], [374], [375]. More than fifty years after its introduction, the algorithm remains popular and widely used for high dimensional datasets due to its simplicity and low computational complexity [34], [376], [377]. In this case the minimization of the Euclidean distance between elements and their cluster center is the optimization criterion. Inspired by K-means, researchers have developed a number of gradient-based algorithms for partitional clustering, including bisecting K-means [35] (recursively divides the dataset into two clusters at each step), sort-means [36] (means are sorted in order of increasing distance from each mean to speed up the traditional process), kd-tree [37] (determines the closest cluster centers for all the data points), X-means [38] (determines the best number of clusters K by optimizing a criterion such as the Akaike Information Criterion (AIC) or the Bayesian Information Criterion (BIC)), k-harmonic means [39] (the harmonic mean is taken instead of the minimum of the Euclidean distance), the k-modes algorithm [40], [41] (selects k initial modes instead of centers, then allocates every object to the nearest mode), kernel K-means [42] (detects arbitrarily shaped clusters, given a proper choice of kernel function), and K-medoid [43] (the cluster center is represented by the medoid, an actual data point, instead of the mean). These algorithms are computationally simple, but are often trapped in local optima due to their hill-climbing approach (the movement of cluster centers in the case of K-means). On the other hand, nature inspired metaheuristics employ a population to explore the search space and thus offer a greater probability of achieving optimal cluster partitions.
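The basic hill-climbing iteration of K-means (assign to nearest center, then move centers to cluster means) can be sketched as follows; this is an illustrative minimal implementation with deterministic center seeding, not any specific cited variant:

```python
import numpy as np

def kmeans(Z, K, iters=100):
    """Minimal Lloyd-style K-means: repeatedly assign each pattern to its
    nearest center (Euclidean distance) and move centers to cluster means."""
    # For a deterministic demo, seed the centers with K evenly spaced patterns.
    centers = Z[np.linspace(0, len(Z) - 1, K, dtype=int)].astype(float)
    labels = np.zeros(len(Z), dtype=int)
    for _ in range(iters):
        # Distance of every pattern to every center, shape (N, K).
        d = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Hill-climbing step: each center moves to the mean of its members.
        new = np.array([Z[labels == k].mean(axis=0) if (labels == k).any()
                        else centers[k] for k in range(K)])
        if np.allclose(new, centers):  # converged
            break
        centers = new
    return centers, labels

# Two well separated groups of patterns (N = 6, D = 2).
Z = np.array([[0, 0], [0, 1], [1, 0], [9, 9], [9, 10], [10, 9]], dtype=float)
centers, labels = kmeans(Z, K=2)
print(labels)  # [0 0 0 1 1 1]
```

With an unfavorable initialization the same iteration can stall in a local optimum, which is exactly the weakness the population-based metaheuristics below aim to avoid.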

A literature review [44], [45] reveals the recent trend of naming all stochastic algorithms that combine randomization with local search as 'metaheuristics'. The randomization process generates arbitrary solutions, which explore the search space and are responsible for reaching a global solution. The local search drives convergence and focuses on achieving good solutions in a specific region. The first nature inspired metaheuristic is the genetic algorithm (GA), developed by Holland and his colleagues in 1975 [46], [47]. It was followed by the development of simulated annealing (SA) by Kirkpatrick in 1983 [48]. Recent literature reports many established nature inspired metaheuristics, which are listed in Table 1. These algorithms are broadly classified into Evolutionary Algorithms, Physical Algorithms, Swarm Intelligence, Bio-inspired Algorithms and others. Table 1 further divides them into single objective and multi-objective, depending on the number of objective functions that they simultaneously optimize to achieve the solution.

The fundamental approach of developing a nature inspired metaheuristic clustering algorithm using simulated annealing was proposed by Selim and Alsultan [159] in 1991. Bezdek et al. [100] then proposed an evolutionary approach to clustering using the genetic algorithm in 1994. The research article by Sarkar and Yegnanarayana [101] highlights the core issues involved in developing clustering algorithms with evolutionary programming. Lumer and Faieta first explored the clustering behavior of swarms of ants [191]. Subsequently, swarm intelligence algorithms such as ant colony optimization [183] and particle swarm optimization [219] have been applied to cluster analysis.
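To make the metaheuristic-for-clustering idea concrete, the following sketch applies a generic simulated-annealing scheme (Metropolis acceptance over random single-point reassignments, followed by a greedy local-search descent) to minimize the within-cluster sum of squares; this is an illustrative assumption-laden toy, not the exact algorithm of Selim and Alsultan:

```python
import math
import random

def wcss(Z, labels, K):
    """Within-cluster sum of squared Euclidean distances (the fitness)."""
    total = 0.0
    for k in range(K):
        members = [z for z, l in zip(Z, labels) if l == k]
        if not members:
            continue
        centroid = [sum(coord) / len(members) for coord in zip(*members)]
        total += sum(sum((zi - ci) ** 2 for zi, ci in zip(z, centroid))
                     for z in members)
    return total

def sa_cluster(Z, K, T=5.0, cooling=0.99, steps=1000, seed=1):
    """Annealing phase: random reassignments accepted by the Metropolis
    criterion; then a greedy descent to a single-move local optimum."""
    rng = random.Random(seed)
    labels = [rng.randrange(K) for _ in Z]   # random initial partition
    cost = wcss(Z, labels, K)
    for _ in range(steps):
        i, k = rng.randrange(len(Z)), rng.randrange(K)
        old = labels[i]
        labels[i] = k
        new_cost = wcss(Z, labels, K)
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / T):
            cost = new_cost          # accept (possibly uphill) move
        else:
            labels[i] = old          # reject, restore previous assignment
        T *= cooling                 # cool the temperature
    improved = True                  # local search: greedy refinement
    while improved:
        improved = False
        for i in range(len(Z)):
            for k in range(K):
                old = labels[i]
                labels[i] = k
                new_cost = wcss(Z, labels, K)
                if new_cost < cost:
                    cost, improved = new_cost, True
                else:
                    labels[i] = old
    return labels, cost

Z = [(0, 0), (0, 1), (1, 0), (9, 9), (9, 10), (10, 9)]
labels, cost = sa_cluster(Z, K=2)
print(labels, round(cost, 3))
```

The uphill acceptances early on let the search escape poor partitions, while the cooling schedule gradually turns the process into the pure local search described above.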

This paper presents an in-depth survey of nature inspired metaheuristic algorithms used for partitional clustering, focusing on those applied to cluster analysis in the last two decades. A few highly cited review articles on cluster analysis have been published by Jain et al. [34], Hruschka et al. [3], Xu and Wunsch [91], Freitas [92], Paterlini and Minerva [93], and Jafar and Sivakumar [94]. To the best of our knowledge, a review paper covering recently developed nature inspired metaheuristics for partitional clustering has not been reported. In 2009 Hruschka et al. [3] focused on the initialization procedures, crossover, mutation, fitness evaluation and reselection associated with genetic-type evolutionary algorithms for single and multiobjective cases. Jain et al. [34] dealt with key issues of clustering and the user's dilemma, and suggested corresponding solutions. Jafar and Sivakumar [94] highlighted developments in ant algorithms for cluster analysis. The book chapter by Abraham et al. [4] focuses on the use of PSO and ant algorithms for the clustering task. The basic principles and methods of clustering are embodied in the books [1], [95], [96], [97], [98], [99].

Keeping current research trends in mind, the present paper contributes to the survey of partitional clustering in four aspects: (1) a systematic review of all the single objective nature inspired metaheuristics used in partitional clustering, (2) an up-to-date survey of flexible partitional clustering based on multiobjective metaheuristic algorithms, (3) a consolidation of recently developed cluster validity measures, and (4) an exploration of new application areas of partitional clustering algorithms.

The paper is organized as follows. Section 2 deals with the advances in single objective nature inspired metaheuristics for partitional clustering, including recent developments in algorithm design, fitness function selection and the cluster validity indices used for verification. The multi-objective metaheuristics used for flexible clustering are discussed in Section 3. The real life application areas of nature inspired partitional clustering are highlighted in Section 4. The concluding remarks of the survey are presented in Section 5. Finally, a number of issues for innovative future research are presented in Section 6.

Section snippets

Problem formulation

Given an unlabeled dataset $Z_{N\times D}=\{z_{1\times D}, z_{2\times D}, \ldots, z_{N\times D}\}$ representing $N$ patterns, each having $D$ features, the partitional approach aims to cluster the dataset into $K$ groups ($K \leq N$) such that
$$C_k \neq \emptyset, \quad k=1,2,\ldots,K; \qquad C_k \cap C_l = \emptyset, \quad k,l=1,2,\ldots,K \ \text{and}\ k \neq l; \qquad \bigcup_{k=1}^{K} C_k = Z.$$
The clustering operation depends on the similarity between elements present in the dataset. If $f$ denotes the fitness function, then the clustering task is viewed as the optimization problem
$$C_k \leftarrow \operatorname{Optimize}\left[f(Z_{N\times D}, C_k)\right], \quad k=1,2,\ldots,K.$$
Hence the optimization based clustering task
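The three partition constraints above (non-empty clusters, pairwise disjointness, full coverage of $Z$) can be checked mechanically. A minimal sketch, with an illustrative function name and clusters encoded as sets of pattern indices:

```python
from itertools import combinations

def is_valid_partition(n_patterns, clusters):
    """Verify the partitional clustering constraints: every cluster C_k is
    non-empty, clusters are pairwise disjoint, and their union covers the
    dataset (patterns are identified by indices 0..n_patterns-1)."""
    nonempty = all(len(c) > 0 for c in clusters)
    disjoint = all(a.isdisjoint(b) for a, b in combinations(clusters, 2))
    covers = set().union(*clusters) == set(range(n_patterns))
    return nonempty and disjoint and covers

good = [{0, 1}, {2, 3}]   # valid K = 2 partition of N = 4 patterns
bad = [{0, 1}, {1, 2}]    # overlapping, and pattern 3 is uncovered
print(is_valid_partition(4, good), is_valid_partition(4, bad))  # True False
```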

Multi-objective algorithms for flexible clustering

A recent survey article by Zhou et al. [304] highlights the basic principles, advancements and applications of multi-objective algorithms in several real world optimization problems. These algorithms are preferred over their single objective counterparts because they incorporate additional knowledge, in the form of extra objective functions, to achieve an optimal solution. In the last decade researchers have developed many nature inspired multi-objective algorithms, including the non-dominated sorting GA
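The notion of Pareto non-dominance that underlies these multi-objective methods can be sketched as follows; the objective values are hypothetical pairs (e.g. an intra-cluster compactness score and an inter-cluster separation penalty), both minimized:

```python
def dominates(a, b):
    """a Pareto-dominates b (minimization): a is no worse than b in every
    objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(solutions):
    """Return the non-dominated subset of a list of objective vectors."""
    return [s for i, s in enumerate(solutions)
            if not any(dominates(t, s)
                       for j, t in enumerate(solutions) if j != i)]

# Hypothetical candidate partitions scored on two objectives to minimize.
objs = [(2.0, 5.0), (3.0, 3.0), (5.0, 1.0), (4.0, 4.0), (6.0, 6.0)]
print(pareto_front(objs))  # [(2.0, 5.0), (3.0, 3.0), (5.0, 1.0)]
```

Rather than one optimal partition, such algorithms return the whole non-dominated front, from which a user picks the trade-off that suits the application.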

Real life application areas of nature inspired metaheuristics based partitional clustering

Nature inspired partitional clustering algorithms have been successfully applied to diverse areas of engineering and science. Many researchers have employed the benchmark UCI datasets to validate the performance of nature inspired clustering algorithms. Some popular UCI datasets and their use in the corresponding algorithms are listed in Table 4. The major applications in the nature inspired clustering literature and the corresponding authors are shown in Table 5. Along with Table 5 some

Conclusion

This paper provides an up-to-date review of nature inspired metaheuristic algorithms for partitional clustering. It is observed that the traditional gradient based partitional algorithms are computationally simpler but often provide inaccurate results as the solution gets trapped in local minima. Nature inspired metaheuristics explore the entire search space with a population and thereby help ensure that an optimal partition is achieved. Further, single objective algorithms provide one optimal

Future research issues

The field of nature inspired partitional clustering is relatively young and is emerging with new concepts and applications. There are many new research directions in this field that need investigation, including:

  • In order to solve any partitional clustering problem, the success of a particular nature inspired metaheuristic algorithm in achieving an optimal partition depends on its design environment (i.e., encoding scheme, operators, set of parameters, etc.). So for a given complex problem the design

References (377)

  • R. Xu et al., Clustering, 2009.
  • S. Das et al., Metaheuristic Clustering, 2009.
  • E.R. Hruschka et al., A survey of evolutionary algorithms for clustering, IEEE Trans. Syst. Man Cybern. Part C Appl. Rev., 2009.
  • A. Abraham, S. Das, S. Roy, Swarm intelligence algorithms for data clustering, in: Soft Computing for Knowledge...
  • S. Basu, I. Davidson, K. Wagstaff (Eds.), Constrained Clustering: Advances in Algorithms, Theory and Applications, Data...
  • H. Frigui et al., A robust competitive clustering algorithm with applications in computer vision, IEEE Trans. Pattern Anal. Mach. Intell., 1999.
  • Y. Leung et al., Clustering by scale-space filtering, IEEE Trans. Pattern Anal. Mach. Intell., 2000.
  • S.C. Johnson, Hierarchical clustering schemes, Psychometrika, 1967.
  • F. Murtagh, A survey of recent advances in hierarchical clustering algorithms, Comput. J., 1983.
  • A.K. Jain et al., Data clustering: a review, ACM Comput. Surv., 1999.
  • M. Sato-Ilic et al., Innovations in Fuzzy Clustering: Theory and Application, 2006.
  • A. Baraldi et al., A survey of fuzzy clustering algorithms for pattern recognition—Part I, IEEE Trans. Syst. Man Cybern. Part B Cybern., 1999.
  • A. Baraldi et al., A survey of fuzzy clustering algorithms for pattern recognition—Part II, IEEE Trans. Syst. Man Cybern. Part B Cybern., 1999.
  • F. Hoppner et al., Fuzzy Cluster Analysis, 1999.
  • F. Hoppner, Fuzzy shell clustering algorithms in image processing: fuzzy c-rectangular and 2-rectangular shells, IEEE Trans. Fuzzy Syst., 1997.
  • S. Das et al., Automatic clustering using an improved differential evolution algorithm, IEEE Trans. Syst. Man Cybern. Part A Syst. Hum., 2008.
  • O. Younis et al., Node clustering in wireless sensor networks: recent developments and deployment challenges, IEEE Netw., 2006.
  • M. Younis, P. Munshi, G. Gupta, S.M. Elsharkawy, On efficient clustering of wireless sensor networks, in: Second IEEE...
  • P. Kumarawadu, D.J. Dechene, M. Luccini, A. Sauer, Algorithms for node clustering in wireless sensor networks: a...
  • S. Chen et al., Multi-stage blind clustering equaliser, IEEE Trans. Commun., 1995.
  • S.J. Nanda, G. Panda, Automatic clustering using MOCLONAL for classifying actions of 3D human models, in: IEEE...
  • S.K. Halgamuge et al., Classification and Clustering for Knowledge Discovery, 2005.
  • R.C. MacGregor et al., Small Business Clustering Technologies: Applications in Marketing, Management, IT and Economics, 2007.
  • A. Brabazon et al., An introduction to evolutionary computation in finance, IEEE Comput. Intell. Mag., 2008.
  • I. Zaliapin et al., Clustering analysis of seismicity and aftershock identification, Phys. Rev. Lett., 2008.
  • D. Jiang et al., Cluster analysis for gene expression data: a survey, IEEE Trans. Knowl. Data Eng., 2004.
  • A.V. Lukashin et al., Topology of gene expression networks as revealed by data mining and modeling, Bioinformatics, 2003.
  • N.O. Andrews, E.A. Fox, Recent Developments in Document Clustering, Technical Report TR-07-35, Department of Computer...
  • M. Steinbach, G. Karypis, V. Kumar, A comparison of document clustering techniques, in: KDD Workshop on Text Mining,...
  • S. Phillips, Acceleration of k-means and related clustering algorithms, in: International Workshop on Algorithm...
  • D. Pelleg, A. Moore, Accelerating exact k-means algorithms with geometric reasoning, in: Fifth ACM SIGKDD International...
  • D. Pelleg, A. Moore, x-means: extending k-means with efficient estimation of the number of clusters, in: Seventeenth...
  • B. Zhang, M. Hsu, U. Dayal, k-harmonic means: a spatial clustering algorithm with boosting, in: International Workshop...
  • Z. Huang, Extensions to the k-means algorithm for clustering large data sets with categorical values, Data Mining Knowl. Discov., 1998.
  • A. Chaturvedi et al., k-modes clustering, J. Classif., 2001.
  • B. Scholkopf et al., Nonlinear component analysis as a kernel eigenvalue problem, Neural Comput., 1998.
  • L. Kaufman et al., Finding Groups in Data: An Introduction to Cluster Analysis, 2008.
  • X.S. Yang, Nature-Inspired Metaheuristic Algorithms, 2010.
  • J. Brownlee, Clever Algorithms: Nature-Inspired Programming Recipes, lulu.com,...
  • J.H. Holland, Adaptation in Natural and Artificial Systems, 1975.