MP-K-Means: Modified Partition Based Cluster Initialization Method for K-Means Algorithm
Manoj Kumar Gupta¹, Pravin Chandra²
¹ Manoj Kumar Gupta, Research Scholar, USIC&T, Guru Gobind Singh Indraprastha University, Delhi.
² Pravin Chandra, Professor, USIC&T, Guru Gobind Singh Indraprastha University, Delhi, India.

Manuscript received on November 15, 2019. | Revised Manuscript received on November 23, 2019. | Manuscript published on November 30, 2019. | PP: 1140-1148 | Volume-8 Issue-4, November 2019. | Retrieval Number: D6837118419/2019©BEIESP | DOI: 10.35940/ijrte.D6837.118419

© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: In the k-means algorithm, the initial cluster centroids are selected arbitrarily, so different clusters may form in each run. Consequently, the accuracy and performance of k-means depend largely on the selection of the initial centroids, and these centroids should be chosen carefully. In view of this, this paper proposes a new Modified Partition based Cluster Initialization method for k-means, called MP-k-means. MP-k-means is an amended version of P-k-means [1], in which the range of values of each dimension is divided into 'k' equi-sized partitions based on the arithmetic average. Outliers present in the data distort this division of the range into 'k' equi-sized partitions. To remove the effect of outliers in P-k-means, MP-k-means partitions each dimension based on a positional average instead of the arithmetic average. Six popular datasets are used for the empirical evaluation of the algorithms. The empirical results are compared and validated using various external and internal clustering validation measures. The comparative results show that MP-k-means is significantly superior to the basic k-means and to P-k-means. The proposed method may also be applied to other clustering algorithms that depend on the selection of initial cluster centroids.
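To make the outlier argument concrete, the Python sketch below contrasts the two partitioning ideas on a toy dataset. The abstract does not give the exact procedures, so the details here are assumptions rather than the authors' method. The arithmetic-average variant is assumed to split each dimension's [min, max] range into k equal-width intervals and summarise each interval by its mean. The positional variant is assumed to split each dimension's sorted values into k equal-count slices and summarise each slice by its median. Each dimension is handled independently, and the representative of the i-th partition in a dimension becomes that coordinate of the i-th initial centroid. The function `partition_init` and its `positional` flag are hypothetical names.

```python
from statistics import mean, median


def partition_init(data, k, positional=True):
    """Hypothetical partition-based seeding for k-means (illustrative sketch).

    Each dimension is processed independently: the representative of the
    i-th partition in dimension d becomes coordinate d of centroid i.
    positional=False: assumed arithmetic-average variant (equal-width
    slices of [min, max], each summarised by its mean).
    positional=True: assumed positional-average variant (equal-count
    slices of the sorted values, each summarised by its median).
    """
    if k < 1 or len(data) < k:
        raise ValueError("need 1 <= k <= number of points")
    dims = len(data[0])
    centroids = [[0.0] * dims for _ in range(k)]
    for d in range(dims):
        values = sorted(row[d] for row in data)
        if positional:
            # Equal-count slices: boundaries depend only on rank, so an
            # extreme value cannot stretch the partitions.
            n = len(values)
            groups = [values[i * n // k:(i + 1) * n // k] for i in range(k)]
            reps = [median(g) for g in groups]
        else:
            # Equal-width slices of the observed range: a single outlier
            # widens every interval.
            lo, hi = values[0], values[-1]
            width = (hi - lo) / k or 1.0  # guard against a constant column
            groups = [[] for _ in range(k)]
            for v in values:
                groups[min(int((v - lo) / width), k - 1)].append(v)
            # Empty intervals fall back to their midpoint.
            reps = [mean(g) if g else lo + (i + 0.5) * width
                    for i, g in enumerate(groups)]
        for i in range(k):
            centroids[i][d] = reps[i]
    return centroids


if __name__ == "__main__":
    # Toy data: two tight groups plus one outlier at (100, 100).
    data = [(1.0, 1.0), (1.5, 2.0), (1.2, 1.4),
            (8.0, 8.0), (8.5, 9.0), (8.2, 8.4),
            (100.0, 100.0)]
    print("arithmetic:", partition_init(data, 2, positional=False))
    print("positional:", partition_init(data, 2, positional=True))
```

On this toy data with k = 2, the equal-width variant spends one centroid on the outlier at (100, 100) and places the other between the two real groups. The median-based variant places its centroids near (1.2, 1.4) and (8.35, 8.7), close to the two genuine groups.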
Keywords: K-means Algorithm; Cluster Initialization; Partition based Cluster Initialization; P-k-means; MP-k-means; Data Mining; Clustering.
Scope of the Article: Data Mining.