Abstract

Data mining is used in a wide variety of applications in fields such as education, medicine, surveillance, and industry. Clustering is an important data mining method in which data elements are divided into groups (clusters) to support better data analysis. Biogeography-Based Optimization (BO) is a recent metaheuristic approach that has been applied to several complex optimization problems. Here, a Chaotic Biogeography-Based Optimization approach using Information Entropy (CBO-IE) is implemented to perform clustering over healthcare IoT datasets. The main objective of CBO-IE is to distribute the data points in a dataset efficiently and precisely by using Information Entropy concepts and to initialize the population by using chaos theory. Both Information Entropy and chaos theory are used to improve the convergence speed of BO in the global search area so that cluster heads and cluster members are selected more accurately. CBO-IE is implemented in MATLAB 2021a over eight healthcare IoT datasets, and the results show that CBO-IE outperforms previous clustering techniques such as K-Means, GA, PSO, ALO, and BO in terms of F-Measure, intracluster distance, running time complexity, purity index, statistical analysis, root mean square error, accuracy, and standard deviation.

1. Introduction

Big data [1, 2] is used and analyzed in several wireless applications that depend on the storage, processing, and maintenance of data. A huge amount of data is preprocessed before analysis to reduce data redundancy while enhancing data accuracy and efficiency [3–5]. Big data is processed with various nature-inspired optimization approaches such as Genetic Algorithm (GA), Ant Colony Optimization (ACO), Ant Lion Optimization (ALO), and Particle Swarm Optimization (PSO) to perform optimal analysis [6].

Data mining [7–9] is a systematic procedure for extracting hidden knowledge and models from an immense, multifaceted, and multidimensional dataset [10]. Association rule mining is a well-known data mining approach that has been combined with the MapReduce concept to evaluate the relationships among data elements in a huge dataset [11]. These data relationships are identified to generate maximum profit from marketing through IoT devices in industry. Time and security are crucial issues in business, and they are frequently addressed by using data mining [12–14].

Data clustering [15] is a type of data mining in which data components are divided into sets or groups. Data are collected from heterogeneous resources, and time series clustering is then applied to this huge amount of data to improve data accessibility. Industry data are distributed more precisely and accurately by clustering to predict future market trends [16]. Cybercrime data are analyzed after preprocessing to train and test clustering techniques. The nature of crime is more easily understood and detected by clustering similar crime patterns, which helps reduce the crime ratio [17]. Healthcare data are also received from IoT devices as data streams and divided into clusters by stream clustering, which combines cluster building and merging steps [18].

In the above analysis, data clustering is an extremely challenging issue because it is difficult to select the optimal clustering strategy. It is also a demanding task because datasets cannot be expected to be consistent: the form of infection and the previous medical circumstances may differ greatly across the training data. Moreover, an accurate prediction scheme does not always offer a useful level of accuracy in all healthcare application areas, because its performance depends mostly on the situation in which it is used.

Many clustering algorithms have been introduced on various datasets to enhance data extraction. Several optimization approaches such as GA, ALO, and PSO have also been described for clustering. Although a few clustering techniques achieve better output on a specific dataset, their efficiency may not carry over to other datasets. This behaviour of clustering methods is consistent with the no-free-lunch theorem: no single clustering method exists that is a cure for all problems. In other words, no algorithm resolves every problem, and convergence speed is also a major issue in the optimization techniques used for data clustering.

Here, a Chaotic Biogeography-Based Optimization approach using Information Entropy (CBO-IE), an improved form of the Biogeography-Based Optimization (BO) approach, is implemented for clustering over healthcare IoT datasets. Instead of performing only a qualitative examination through a systematic mapping analysis of prior work, this work contributes a quantitative examination of a clustering strategy for healthcare datasets.

The contributions of implementing CBO-IE are as follows:
(1) The major intent of CBO-IE is to obtain an accurate distribution of the data points in a dataset by using the Information Entropy strategy and to give the initial values of the population by using chaos theory.
(2) Both Information Entropy and chaos theory are introduced to enhance the convergence speed of BO in the global exploration field so that cluster heads and cluster members are generated more precisely.
(3) CBO-IE is applied over eight healthcare IoT datasets, and the outputs are evaluated on the basis of F-Measure, intracluster distance, running time complexity, purity index, statistical analysis, standard deviation, root mean square error, and accuracy in comparison with previous clustering techniques such as K-Means, GA, PSO, ALO, and BO.

The remainder of the paper is organized as follows: Section 2 presents the literature survey of data mining techniques, big data processing, and IoT dataset processing. Section 3 describes the BO approach, and the proposed CBO-IE is explained in Section 4 in terms of its flowchart, algorithms, and preliminaries. The datasets, performance factors, and experimental results are presented in Section 5, and finally conclusions are given in Section 6.

2. Literature Survey

The IoT [19] is set to transform the future of everyday life. The information extracted from IoT systems can be explored to recognize and organize complex environments around users, enabling better decision making, greater automation, and improved effectiveness, efficiency, accuracy, and wealth creation. The work in [20] explores the applicability of multiple popular data mining approaches to IoT information. Its results are evaluated over artificial intelligence and neural network techniques, with superior accuracy, efficiency, and speed in comparison to prior techniques [20].

The K-Means approach is applied to process big data with the help of the Hadoop platform for the Internet of Things (IoT). The data from IoT devices are combined or distributed into several groups, known as clusters, which are used in smart applications such as medical, disaster, and industrial applications. Power consumption and traffic over the communication network are reduced by using only the necessary information rather than the whole raw data. In practice, the platform offers enormous flexibility, permitting the formation of clusters with millions of Hadoop instances [21].

Big data raises concerns of space and time complexity because huge amounts of data are stored and processed over mobile and dynamic networks. The Extensible Markup Language (XML) and machine learning methods are applied to big data processing tasks such as classification, clustering, and preprocessing of IoT information [22, 23]. Recent IoT applications depend increasingly on an intelligent understanding of the environment from information extracted by assorted sensors and small devices. Here, a framework combines an ontology-based description of information processing with a specialized logic for more useful event recognition by adapting a specific machine learning classification strategy. A road and traffic examination is performed to verify the results of the framework [24].

The personal data of users are distributed digitally over the network through smart IoT devices, and multiple artificial intelligence techniques have been introduced to control this digitally transferred information according to rules and policies. An intelligent policy examination strategy is introduced that discovers contradictory policies or agreements of users over the advocate platform. An intelligent decision-making method is applied with the help of fuzzy cognitive maps [25].

Business is directly or indirectly related to customer behaviour, which is evaluated by trust mining. The trust of sellers is evaluated by dynamic clustering, where the data change day by day over a huge network area. The rapid growth of online shopping data poses a growing problem for customers, who must select trustworthy sellers from the numerous records available in e-commerce applications. A dynamic clustering method is adopted to calculate the trust of customers and to distinguish the buying similarity of customers in order to predict customer behaviour. Real-time and artificial datasets are used to perform clustering and to evaluate the results based on accuracy [26].

The medical field [27] also makes good use of mining techniques such as classification to identify mental health conditions and evaluate risk in order to manage the system. This classification is done by decision trees to discover and classify patients according to their mental health. Here, a comparative evaluation of a large array of classifiers is carried out across several families, such as tree, group, neural, probability, categorization, and rule dependent classifiers. Linear and random classifiers are introduced to perform disease classification [28].

Cancer data in unstructured format are classified to discover several types of cancer, which generates valuable results in terms of risk aspects, treatment, and management. Text mining is well exploited in the prediction of cancers such as lung, breast, and ovarian cancer. The data about cancer patients are collected from several heterogeneous resources, and text mining is performed to provide cancer-related information such as survival, treatment, and risk of disease [29].

Geographical data are collected from several heterogeneous resources, and data mining is applied to the spatial data to manage disasters. Natural hazards are identified before they occur by spatial data mining over huge geographic information. Geographical information such as soil, ocean, earth’s crust, and air quality is used in the mining process with the help of PSO and fuzzy logic. The clusters are decided on the basis of spatial information and natural behaviours in the environment [30].

Data mining based educational information extraction is a new area of research, in which distributed educational data are compiled and used for several purposes. Students, teachers, and other staff play an important role in educational data mining. The educational data are collected from several resources and saved in tabular format. These student data can be further analyzed by the university and examined by a data mining model to evaluate student results and maintain the records in an appropriate manner [31].

Classification is also performed over data streams that combine text, music, and video information, with the help of decision trees and naive structures. These data streams also include human activities such as the movement of hands and legs [32, 33]. There are various key issues of classification, such as infinite size, perception development, perception flow, and characteristic assessment. Researchers have mainly solved the issues of infinite size and perception development, but perception flow and characteristic assessment have not been taken into consideration. Here, these two issues are addressed by implementing a classification method over big data, and parameters such as error rate and running time are obtained to demonstrate the superior efficiency of the method [34].

3. The Biogeography-Based Optimization (BO) Approach

Biogeography-Based Optimization (BO) is a metaheuristic approach based on the theory of island biogeography, which deals with the migration, evolution, and extinction of castes (species) in a habitat. A Habitat Suitability Index (HSI) is used to identify the best solution globally, whose characteristics are shared with weak habitats. In the BO migration process, the population is improved by moving Suitability Index Variables (SIVs) of the best solutions of the prior iteration from emigrating [35] to immigrating habitats. In the BO mutation process, new characteristics are obtained across the complete solution area by using the fitness function [36] and by replacing the SIVs of habitats arbitrarily and stochastically, which enhances population [37] diversity and search strength.

3.1. Migration Process

In the stochastic migration process, several parents can contribute to a particular offspring, and every habitat (Ha) is modified by receiving SIVs from a habitat with superior HSI in the population (PS). The migration rates depend directly on the number of castes in a habitat, which enhances [38] habitat diversity, and linear migration is evaluated by the following equation:

$$H_a(\mathrm{SIV}) \leftarrow H_b(\mathrm{SIV}) \tag{1}$$

where Hb is the bth habitat, which has superior HSI and transfers its SIV value to the ath habitat (Ha).

The emigration rate (α) and immigration rate (β) are evaluated for c castes in a habitat by the two following equations:

$$\alpha_c = \frac{G\,c}{c_{\max}} \tag{2}$$

$$\beta_c = M\left(1 - \frac{c}{c_{\max}}\right) \tag{3}$$

Here, G and M are the highest emigration and immigration rates, respectively, and $c_{\max}$ is the maximum realizable number of castes supported by a habitat.

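Six models have been developed for the migration phase, of which the sinusoidal model performs better migration than the other five. This model is evaluated for the ath habitat by the two following equations:

$$\alpha_a = \frac{G}{2}\left(1 - \cos\frac{c_a \pi}{c_{\max}}\right) \tag{4}$$

$$\beta_a = \frac{M}{2}\left(1 + \cos\frac{c_a \pi}{c_{\max}}\right) \tag{5}$$

where $c_a$ is the number of castes in the ath habitat and $c_{\max}$ is the maximum realizable number of castes.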
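
The two rate models can be compared with a small Python sketch. It assumes the usual linear and sinusoidal forms from the BO literature; the function names and the symbol `c_max` (maximum number of castes) are illustrative and not taken from the original implementation.

```python
import math


def linear_rates(c, c_max, G=1.0, M=1.0):
    """Linear model: emigration grows and immigration shrinks with caste count c."""
    alpha = G * c / c_max           # emigration rate
    beta = M * (1.0 - c / c_max)    # immigration rate
    return alpha, beta


def sinusoidal_rates(c, c_max, G=1.0, M=1.0):
    """Sinusoidal model: same end points as the linear model, smoother in between."""
    phase = math.cos(math.pi * c / c_max)
    alpha = (G / 2.0) * (1.0 - phase)   # emigration rate
    beta = (M / 2.0) * (1.0 + phase)    # immigration rate
    return alpha, beta
```

In both models an empty habitat (c = 0) only immigrates and a saturated habitat (c = c_max) only emigrates; the sinusoidal curve changes slowly near those extremes and fastest in the middle.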

3.2. Mutation Process

Mutation is a stochastic operator that enhances population diversity to obtain the optimal solution [39]. The mutation process updates at least one arbitrarily chosen SIV of a solution with the help of the mutation rate (Ta) and the prior probability of existence (Sa) by the following equation:

$$T_a = T_{\text{maximum}}\left(1 - \frac{S_a}{S_{\text{maximum}}}\right) \tag{6}$$

where $T_{\text{maximum}}$ is the highest mutation rate, defined by the user, and $S_{\text{maximum}}$ is the highest probability over the caste counts.

The mutation operator is then improved by using the Cauchy model, which strengthens the exploration of BO in a huge search area by reducing a potential limitation of the mutation strategy. The Cauchy distribution is evaluated with the help of its probability density function, which is formulated by the following equation:

$$f_t(x) = \frac{1}{\pi}\,\frac{t}{t^2 + x^2}, \quad -\infty < x < \infty \tag{7}$$

where $t > 0$ is the scale parameter.

Hence, the Cauchy mutation is described by the following equation:

$$H_a'(\mathrm{SIV}) = H_a(\mathrm{SIV}) + C \tag{8}$$

where $H_a$ is the ath habitat and $C$ is a random value drawn from the Cauchy distribution (Algorithm 1).

START
 Initialize the parameters G = M = 1, Tmaximum = 1, PS and Max_iteration
 Initialize the population (arbitrary group of habitats) H1, H2, ..., HPS
 Evaluate fitness value (HSI) for every habitat
WHILE (ending condition is not met)
  Evaluate αa, βa and Ta for every habitat
  //MIGRATION
  FOR every habitat (from minimum to maximum HSI values)
   Choose habitat Ha (SIV) stochastically proportional to βa
   IF random < βa and Ha (SIV) is chosen, then
    Choose habitat Hb (SIV) stochastically proportional to αb
    IF random < αb and Hb (SIV) is chosen, then
     Ha (SIV) = Hb (SIV)
    END IF
   END IF
  END FOR
  //MUTATION
  Choose Ha (SIV) with mutation probability proportional to Sa
  IF random < Ta then
   Arbitrarily change the SIVs in Ha (SIV)
  END IF
  Evaluate HSI value
END WHILE
STOP
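
The loop above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' MATLAB implementation: caste counts are derived from the HSI rank, a constant `mutation_rate` replaces the per-habitat rate Ta, the mutation step is a Cauchy perturbation clipped to the search bounds, and one elite habitat is carried over unchanged. The elitism step and all function and parameter names are assumptions added for the sketch.

```python
import math
import random


def bbo_minimize(cost, dim, lower, upper, pop_size=20, max_iter=100,
                 G=1.0, M=1.0, mutation_rate=0.05, seed=0):
    """Minimise `cost` over [lower, upper]^dim with a simplified BO loop."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lower, upper) for _ in range(dim)]
           for _ in range(pop_size)]
    for _ in range(max_iter):
        pop.sort(key=cost)                       # best habitat (highest HSI) first
        elite = list(pop[0])
        # Rank-based caste counts: the best habitat holds the most castes.
        counts = [pop_size - rank for rank in range(pop_size)]
        alpha = [G * c / pop_size for c in counts]          # emigration rates
        beta = [M * (1.0 - c / pop_size) for c in counts]   # immigration rates
        new_pop = [list(h) for h in pop]
        for a in range(pop_size):
            for d in range(dim):
                if rng.random() < beta[a]:
                    # Roulette-wheel choice of the emigrating habitat Hb.
                    b = rng.choices(range(pop_size), weights=alpha, k=1)[0]
                    new_pop[a][d] = pop[b][d]
                if rng.random() < mutation_rate:
                    # Cauchy-distributed step via the inverse CDF, then clipping.
                    step = math.tan(math.pi * (rng.random() - 0.5))
                    new_pop[a][d] = min(max(new_pop[a][d] + step, lower), upper)
        new_pop[-1] = elite                      # assumed elitism: keep the best
        pop = new_pop
    return min(pop, key=cost)
```

For example, `bbo_minimize(lambda x: sum(v * v for v in x), dim=3, lower=-5.0, upper=5.0)` searches for the minimum of a sphere function; because the elite habitat is preserved, the best cost never gets worse from one iteration to the next.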

4. The Chaotic BO Approach Using Information Entropy (CBO-IE)

An updated BO approach is proposed using chaotic behaviour and Information Entropy. The BO approach is well suited to exploration and exploitation in a huge search space and generates efficient results for numeric optimization [40]. However, numeric optimization is reasonably different from data clustering. Here, some intrinsic characteristics are investigated, and a chaotic BO is implemented using Information Entropy to apply BO to data clustering.

4.1. Element Information Entropy

In data clustering, the data points are distributed in a multidimensional space. Information Entropy measures the degree of disorder of a random variable; here it is applied to the element (attribute) values to measure their distribution. The element values are discretized by rounding every value to its nearest integer, and the entropy (Ea) is then evaluated by the following equation:

$$E_a = -\sum_{b=\min_a}^{\max_a} R_b \log_2 R_b \tag{9}$$

where Ea is the entropy of the ath element (a = 1, 2, 3, ..., L) and L is the number of elements in the dataset. $\min_a$ and $\max_a$ are the lowest and highest integers after discretization of the element values. $R_b$ is the percentage of values of the element that equal the integer b.

The migration procedure of the BO population is guided by element entropy; hence, the maximum entropy is evaluated for every element in the dataset by the following equation:

$$\text{maximum}(E_a) = \log_2\left(\max\nolimits_a - \min\nolimits_a + 1\right) \tag{10}$$

where maximum (Ea) is the maximum entropy of the ath element in the dataset.

Finally, the normalized entropy of every element is calculated by the following equation:

$$\text{normalized}(E_a) = \frac{E_a}{\text{maximum}(E_a)} \tag{11}$$

where normalized (Ea) is the normalized entropy of the ath element in the dataset.

Equation (11) is repeated for all elements in the dataset, and the normalized entropy set (normalized (E)) is generated as follows:

$$\text{normalized}(E) = \left(\text{normalized}(E_1), \text{normalized}(E_2), \ldots, \text{normalized}(E_L)\right)$$
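
A minimal Python sketch of the element entropy computation is given below. It assumes base-2 logarithms and takes the maximum entropy of an element to be the entropy of a uniform distribution over its integer range; both choices, and the function names, are assumptions made for illustration.

```python
import math
from collections import Counter


def normalized_entropy(values):
    """Entropy of one element after rounding to integers, scaled to [0, 1]."""
    ints = [round(v) for v in values]
    n_bins = max(ints) - min(ints) + 1      # size of the integer range
    if n_bins == 1:
        return 0.0                          # a constant element carries no uncertainty
    total = len(ints)
    counts = Counter(ints)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return entropy / math.log2(n_bins)      # divide by the assumed maximum entropy


def normalized_entropy_set(dataset):
    """Apply normalized_entropy to every column (element) of a row-wise dataset."""
    return [normalized_entropy(column) for column in zip(*dataset)]
```

For a dataset whose first element alternates between 0 and 1 and whose second element is constant, `normalized_entropy_set` returns `[1.0, 0.0]`: the first element is maximally uncertain and the second is fully predictable.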

4.2. Information Entropy-Based Migration Process in the BO Approach

Here, two mechanisms are used in migration to improve the performance of the BO approach. The first is the original migration process of BO explained in the previous section. The second is an updated form of the original migration, in which P particulars (individuals) are selected arbitrarily from the present population and the best of them is assigned as the reference particular (Href). The direction of migration is guided by Href, and Information Entropy is used to balance population diversity against convergence speed. A higher Information Entropy of an element indicates greater uncertainty, which slows down convergence. Hence, convergence is accelerated for such an element by moving it toward the location of the reference particular with a higher probability than an element with lower Information Entropy.

4.3. Chaos-Based Population Selection of the BO Approach

A chaotic system is highly sensitive to initial conditions and has been used successfully for random number generation through the logistic function. The chaotic system is given by the following equation:

$$x_{n+1} = \mu\, x_n \left(1 - x_n\right) \tag{12}$$

where $\mu$ represents a constant in the range [1, 4] and $x_n \in (0, 1)$ represents the variables.

The population of BO is initialized by using the chaos function (equation (12)), which improves the efficiency of BO through better use of the huge solution area.
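
A chaos-based initialization of this kind can be sketched in Python as follows. The sketch assumes the logistic map with control constant `mu = 4.0` and seed value `x0 = 0.7`, and it scales each chaotic value into per-dimension bounds; these values and the function names are illustrative assumptions.

```python
def logistic_sequence(x0, length, mu=4.0):
    """Iterate the logistic map x <- mu * x * (1 - x) and collect the values."""
    values = []
    x = x0
    for _ in range(length):
        x = mu * x * (1.0 - x)
        values.append(x)
    return values


def chaotic_population(pop_size, dim, lower, upper, x0=0.7, mu=4.0):
    """Build pop_size vectors of length dim from one chaotic sequence.

    lower and upper are per-dimension bounds; each chaotic value in [0, 1]
    is mapped linearly into [lower[d], upper[d]].
    """
    seq = logistic_sequence(x0, pop_size * dim, mu)
    return [[lower[d] + seq[i * dim + d] * (upper[d] - lower[d])
             for d in range(dim)]
            for i in range(pop_size)]
```

Seed values such as 0, 0.25, 0.5, 0.75, and 1 should be avoided with `mu = 4.0`, because the map then collapses to a fixed point after a few steps instead of wandering over the interval.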

4.4. The Complete CBO-IE Approach for Data Mining

The Information Entropy-based migration and the chaos-based population selection are combined in Chaotic Biogeography-Based Optimization using Information Entropy (CBO-IE) to solve the data mining task, that is, data clustering, in an optimal way (Figure 1). First, the population (PS) of the BO approach is initialized with the help of the chaos function (equation (12)), and every particular is denoted as a vector of size Lx = K × L (L is the number of elements in the dataset, K is the number of cluster centroids, and Lx is the dimension of particulars). The K centroid locations are stored in the vector, in which the 1st centroid corresponds to the first L attributes, the 2nd centroid corresponds to the second L attributes, and so on. The initial particular vector values are generated arbitrarily and uniformly between the lowest and highest element values in the existing dataset, and the search runs for a maximum number of iterations (Mi). Then, CBO-IE is applied to calculate the fitness values of all particulars by using equations (1) to (8) (Algorithm 2).
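
The vector encoding can be illustrated with a short Python sketch that splits a particular of length K × L into K centroids and assigns every data point to its nearest centroid. The Euclidean distance and the function names are assumptions made for the sketch; the text does not state which distance the implementation uses.

```python
import math


def decode_centroids(particular, K, L):
    """Split a flat vector of length K * L into K centroids of L attributes."""
    return [particular[k * L:(k + 1) * L] for k in range(K)]


def assign_clusters(particular, data, K):
    """Return the decoded centroids and the nearest-centroid label of each point."""
    L = len(data[0])
    centroids = decode_centroids(particular, K, L)
    labels = []
    for point in data:
        distances = [math.dist(point, c) for c in centroids]
        labels.append(distances.index(min(distances)))
    return centroids, labels
```

With K = 2 and L = 2, the particular `[0.0, 1.0, 10.0, 11.0]` decodes to the centroids `[0.0, 1.0]` and `[10.0, 11.0]`.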

5. Results and Analysis

In this section, the healthcare IoT datasets and the performance factors for the proposed CBO-IE approach are explained briefly. All approaches are implemented in MATLAB 2021a on a Core i3-3110M processor with 2 GB RAM running the Windows 8 operating system and are analyzed over 8 healthcare IoT datasets (Table 1). The proposed CBO-IE approach is evaluated for 500 iterations in terms of intracluster distance, purity index, standard deviation, root mean square error, accuracy, and F-Measure in comparison with prior algorithms such as K-Means, GA, PSO, ALO, and BO over 30 independent runs.

5.1. Healthcare IoT Datasets

The proposed CBO-IE approach is applied to 8 different healthcare IoT datasets taken from the UCI repository (Table 1). The datasets are thoracic surgery, breast cancer, cryotherapy, liver patient, heart patient, chronic kidney disease, diabetic retinopathy, and blood transfusion. The 470 instances of the thoracic surgery dataset with 17 attributes are divided into 2 classes (survive and not survive), the 699 instances of the breast cancer dataset with 10 attributes are divided into 2 classes (benign and malignant), the 90 instances of the cryotherapy dataset with 7 attributes are divided into 2 classes (success and failure), the 583 instances of the liver patient dataset with 10 attributes are divided into 2 classes (liver patient and not), the 299 instances of the heart patient dataset with 12 attributes are divided into 2 classes (heart failure and not), the 400 instances of the chronic kidney disease dataset with 24 attributes are divided into 2 classes (disease found and not), the 1151 instances of the diabetic retinopathy dataset with 18 attributes are divided into 2 classes (diabetic retinopathy and not), and the 748 instances of the blood transfusion dataset with 4 attributes are divided into 2 classes (donating blood and not).

5.2. Performance Factors

The performance of proposed CBO-IE approach is evaluated in terms of intracluster distance, purity index, standard deviation, root mean square error, accuracy, and F-Measure.

5.2.1. Intracluster Distance

First, the distances between the centroid and all data points of a cluster are evaluated, and their mean gives the intracluster distance of that cluster. This step is repeated for every cluster, and finally the mean of all clusters’ intracluster distances is obtained. The optimal clustering is achieved with the least intracluster distance.
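
The computation described above can be sketched in Python as follows, assuming Euclidean distance and skipping empty clusters; both choices and the function name are assumptions made for the sketch.

```python
import math


def intracluster_distance(data, labels, centroids):
    """Mean over clusters of the mean point-to-centroid distance."""
    per_cluster = []
    for k, centroid in enumerate(centroids):
        members = [p for p, label in zip(data, labels) if label == k]
        if members:  # empty clusters are skipped (assumption)
            mean_dist = sum(math.dist(p, centroid) for p in members) / len(members)
            per_cluster.append(mean_dist)
    return sum(per_cluster) / len(per_cluster)
```

A smaller value means that the points lie closer to their centroids, so this quantity can also serve as a cost to minimise when an optimizer searches for centroid positions.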

Table 2 shows that CBO-IE generates the least intracluster distance value for all IoT datasets. In terms of intracluster distance, the results of CBO-IE are 90% better than BO, 92% better than ALO, 94% better than PSO, 96% better than GA, and 98% better than K-Means across all healthcare IoT datasets. The average ranking of all approaches is evaluated on the basis of minimum to maximum intracluster distance (from 1 to 6).

Figure 2 represents the better rank of proposed CBO-IE with least intracluster distance values against other approaches, K-Means, GA, PSO, ALO, and BO, for all IoT datasets.

5.2.2. Standard Deviation

The tightness of data clustering around the average value is described by a numerical characteristic called standard deviation (DS). The best clustering is achieved with a low standard deviation, which is obtained by the following equation:

$$DS = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(V_i - \bar{V}\right)^2} \tag{13}$$

where $N$ is the dataset length, $V$ is the dataset values (points), and $\bar{V}$ is the dataset average value.

Table 3 describes that the CBO-IE obtains less standard deviation value for all IoT datasets. The CBO-IE generates 93% better quality outcomes than BO, 95% better quality outcomes than PSO and ALO, 97% better quality outcomes than GA, and 98% better quality outcomes than K-Means in terms of standard deviation for entire healthcare IoT datasets.

Figure 3 shows the better standard deviation of proposed CBO-IE against other approaches, K-Means, GA, PSO, ALO, and BO, for all IoT datasets.

5.2.3. Purity Index

The correctness of a clustering strategy is known as purity, which measures how accurately the data elements are classified: ideally, all points of a single class are allocated to a single cluster. The purity index (IP) is generated with the help of purity by equations (14) and (15). Maximum purity is achieved when the IP value is close to 1.

$$P_z = \frac{1}{|C_z|}\max_{w}\,|C_z^w| \tag{14}$$

$$IP = \sum_{z=1}^{K}\frac{|C_z|}{N}\,P_z \tag{15}$$

where $P_z$ is the purity of the zth cluster, $|C_z|$ is the zth cluster length, $|C_z^w|$ is the number of data elements of class w allocated to the zth cluster, K is the number of clusters, and N is the total number of data elements.
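
A minimal Python sketch of the purity index is shown below. It assumes the common definition in which every cluster is credited with its majority class and the credits are summed over clusters and divided by the number of data elements; the function name is illustrative.

```python
from collections import Counter


def purity_index(labels_true, labels_pred):
    """Fraction of data elements that belong to the majority class of their cluster."""
    clusters = {}
    for true_label, cluster in zip(labels_true, labels_pred):
        clusters.setdefault(cluster, []).append(true_label)
    majority_total = sum(max(Counter(members).values())
                         for members in clusters.values())
    return majority_total / len(labels_true)
```

For the class labels `[0, 0, 1, 1]` and the cluster labels `[0, 0, 0, 1]`, one of the four elements sits in a cluster dominated by the other class, so the purity index is 0.75.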

Table 4 shows that CBO-IE generates the maximum purity index value for all IoT datasets. In terms of purity index, the outputs of CBO-IE are 5% better than BO, 7% better than ALO, 8% better than PSO, 12% better than GA, and 15% better than K-Means across all healthcare IoT datasets.

Figure 4 represents the better quality purity index of proposed CBO-IE against other approaches, K-Means, GA, PSO, ALO, and BO, for all IoT datasets.

5.2.4. F-Measure

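First, Precision (prsn) and Recall (rel) are evaluated, as in information retrieval, by equations (16) and (17). After that, both are combined to form the F-Measure (FM) by equations (18) and (19).

$$prsn(w, z) = \frac{|C_z^w|}{|C_z|} \tag{16}$$

$$rel(w, z) = \frac{|C_z^w|}{|C_w|} \tag{17}$$

$$FM(w, z) = \frac{2 \cdot prsn(w, z) \cdot rel(w, z)}{prsn(w, z) + rel(w, z)} \tag{18}$$

$$FM = \sum_{w}\frac{|C_w|}{N}\max_{z} FM(w, z) \tag{19}$$

where $|C_w|$ is the wth class length, $|C_z|$ is the zth cluster length, $|C_z^w|$ is the number of data elements of class w allocated to the zth cluster, and N is the total number of data elements.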
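
The clustering F-Measure can be sketched in Python as follows. The sketch assumes the widely used class-weighted form, in which every class is matched with the cluster that gives it the highest F value; the function name is illustrative.

```python
def clustering_f_measure(labels_true, labels_pred):
    """Class-size-weighted sum of the best per-class F value over all clusters."""
    n = len(labels_true)
    total = 0.0
    for w in set(labels_true):
        n_w = sum(1 for t in labels_true if t == w)          # class length
        best = 0.0
        for z in set(labels_pred):
            n_z = sum(1 for p in labels_pred if p == z)      # cluster length
            n_zw = sum(1 for t, p in zip(labels_true, labels_pred)
                       if t == w and p == z)                 # overlap
            if n_zw == 0:
                continue
            prsn = n_zw / n_z
            rel = n_zw / n_w
            best = max(best, 2.0 * prsn * rel / (prsn + rel))
        total += (n_w / n) * best
    return total
```

A perfect clustering gives 1.0 regardless of how the cluster labels are numbered, because each class is compared with every cluster and only the best match counts.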

Table 5 describes that the CBO-IE obtains higher F-Measure value for all IoT datasets. The CBO-IE generates 4% better outputs than BO, 7% better outputs than ALO, 8% better outputs than PSO, 13% better outputs than GA, and 16% better outputs than K-Means in terms of F-Measure for entire healthcare IoT datasets.

Figure 5 shows the better F-Measure of proposed CBO-IE against other approaches, K-Means, GA, PSO, ALO, and BO, for all IoT datasets.

5.2.5. Root Mean Square Error (RMSE)

The RMSE is defined as the divergence between predicted values and experimental (calculated) values. The best clustering is achieved with minimum RMSE values of the datasets. It is evaluated by the following equation:

$$RMSE = \sqrt{\frac{1}{N}\sum_{j=1}^{N}\left(y_j - \hat{y}_j\right)^2} \tag{20}$$

where $y_j$ is the jth element value in the dataset, $\hat{y}_j$ is the jth element predicted value in the dataset, and N is the number of elements.

Table 6 explains that the CBO-IE generates least RMSE value for all IoT datasets. The CBO-IE generates 8% better results than BO, 78% better results than ALO, 81% better results than PSO, 87% better results than GA, and 95% better results than K-Means in terms of RMSE for entire healthcare IoT datasets.

Figure 6 shows the better RMSE of proposed CBO-IE against other approaches, K-Means, GA, PSO, ALO, and BO, for all IoT datasets.

5.2.6. Accuracy

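Accuracy evaluates the fraction of clustering decisions that are correct, that is, the proportion of data points that fall in the prevailing group of their cluster:

$$\text{Accuracy} = \frac{\text{number of correctly assigned data points}}{\text{total number of data points}} \tag{21}$$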

Table 7 shows that CBO-IE obtains the highest accuracy value for all IoT datasets. In terms of accuracy, the outputs of CBO-IE are 3% better than BO, 5% better than ALO, 6% better than PSO, 9% better than GA, and 13% better than K-Means across all healthcare IoT datasets.

Figure 7 represents the better accuracy of proposed CBO-IE against other approaches, K-Means, GA, PSO, ALO, and BO, for all IoT datasets.

Information Entropy is used with the BO approach for clustering to provide an accurate and efficient distribution of the data points in a dataset. Chaos theory is used to initialize the population of BO and improve the searching capability of the habitats, which makes the selection of cluster members more precise. Together, these two strategies improve the performance of BO in selecting cluster heads and their members. As a result, CBO-IE generates better results than the BO, ALO, PSO, GA, and K-Means clustering approaches.

5.3. Running Time Complexity of the CBO-IE Approach

The running time of a clustering approach is directly related to the number of operations it executes. The complexity is calculated by using the following quantities: PS is the population size, L is the number of elements in the dataset, Lx represents the dimension of particulars (elements), K is the number of cluster centroids, and Mi represents the maximum number of iterations. The cost of every execution is assumed to be 1 unit. The Cost of all Executions (CE) of Algorithm 3 is obtained by the following equations:

Supposing that all circumstances are equal in equation (23) in worst case, equation (24) is generated as follows:

The running time complexities of the clustering approaches are $O(n^3)$ for CBO-IE, $O(n^3)$ for BO, $O(n^3)$ for GA, $O(n^3)$ for PSO, $O(n^3)$ for ALO, and $O(n^2)$ for K-Means in the worst case. Hence, all are executable in polynomial time.

5.4. Statistical Examination Measurement

A statistical examination is carried out to determine whether the differences in the effectiveness of the clustering techniques are significant. Here, the nonparametric Friedman Examination (FE) is used to discover differences among the groups of related variables. Under the null hypothesis (Y0), all clustering techniques are equally effective.

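The Friedman Examination (FE) is formulated by

$$\chi_F^2 = \frac{12\,NDS}{AC\,(AC+1)}\left[\sum_{j=1}^{AC} R_j^2 - \frac{AC\,(AC+1)^2}{4}\right]$$

$$FE = \frac{(NDS-1)\,\chi_F^2}{NDS\,(AC-1) - \chi_F^2}$$

where NDS is the number of datasets, $R_j$ is the average rank of the jth approach, and AC is the number of clustering approaches.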

The FE critical value is 2.04925, taken from the F-distribution table [41] with (AC − 1) and (AC − 1)(NDS − 1) degrees of freedom, that is, (6 − 1) = 5 and (6 − 1)(8 − 1) = 35 for 6 clustering approaches (AC = 6) over 8 datasets (NDS = 8) with λ = 0.10 (significance level). The null hypothesis (Y0) is rejected if the calculated FE value is higher than the critical value. The evaluated FE value is 17.14 for 6 clustering approaches (AC = 6) over 8 datasets (NDS = 8) with λ = 0.10. Hence, the calculated FE is higher than the critical FE, and Y0 is rejected. It is therefore concluded that the clustering approaches are not all equally effective.
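
The statistic can be computed with a short Python sketch. It assumes the F-distributed (Iman and Davenport) form of the Friedman statistic, which matches the degrees of freedom quoted above; the function name and the example ranks in the note are hypothetical and are not the ranks reported in the tables.

```python
def friedman_statistics(avg_ranks, n_datasets):
    """Return the Friedman chi-square and its F-distributed correction."""
    k = len(avg_ranks)                       # number of approaches (AC)
    chi2 = (12.0 * n_datasets / (k * (k + 1))) * (
        sum(r * r for r in avg_ranks) - k * (k + 1) ** 2 / 4.0)
    denominator = n_datasets * (k - 1) - chi2
    # Perfectly consistent rankings drive the denominator to zero.
    f_stat = float("inf") if denominator <= 0 else (n_datasets - 1) * chi2 / denominator
    return chi2, f_stat
```

With the hypothetical average ranks `[1.5, 2.5, 3.0, 4.0, 4.5, 5.5]` over 8 datasets, the chi-square value is 24 and the F value is 10.5, which would exceed the critical value of 2.04925 and lead to rejection of Y0.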

Therefore, a post hoc examination is performed using the Holm strategy. In this examination, the proposed CBO-IE is analyzed statistically against the other clustering approaches. First, the z value is obtained from the following equation, and then the probability (p) is generated from the z value and the normal distribution table [42]. Finally, the pj value is compared with the corresponding Holm threshold λ/(AC − j) (Table 8):

$$z = \frac{R_j - R_{\text{CBO-IE}}}{SE}$$

where

$$SE = \sqrt{\frac{AC\,(AC+1)}{6\,NDS}}$$

Table 8 shows that the corresponding Holm threshold is higher than the pj value, which indicates rejection of the hypothesis in all cases. Hence, the proposed CBO-IE is superior in clustering to the K-Means, GA, PSO, and BO approaches according to the above analysis.
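
The Holm step-down procedure can be sketched in Python as follows. The sketch assumes a two-sided p value from the standard normal distribution and thresholds of the form level/(m − j) for the m sorted comparisons with j starting at 0; the function name, parameter names, and the example ranks in the note are hypothetical.

```python
import math


def holm_test(control_rank, other_ranks, n_datasets, n_approaches, level=0.10):
    """Compare a control approach with the others using Holm's procedure.

    other_ranks maps an approach name to its average rank.
    Returns (name, p, threshold, rejected) tuples sorted by increasing p.
    """
    se = math.sqrt(n_approaches * (n_approaches + 1) / (6.0 * n_datasets))
    scored = []
    for name, rank in other_ranks.items():
        z = (rank - control_rank) / se
        p = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided normal p value
        scored.append((name, p))
    scored.sort(key=lambda item: item[1])        # smallest p first
    m = len(scored)
    decisions = []
    still_rejecting = True
    for j, (name, p) in enumerate(scored):
        threshold = level / (m - j)
        # Once one hypothesis is retained, all later ones are retained too.
        still_rejecting = still_rejecting and p < threshold
        decisions.append((name, p, threshold, still_rejecting))
    return decisions
```

For instance, with a control rank of 1.0 and hypothetical ranks `{"A": 6.0, "B": 1.0}` over 8 datasets and 6 approaches, the hypothesis for A is rejected at the threshold 0.05 and the hypothesis for B is retained at the threshold 0.10.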

6. Conclusion

Various fields such as medicine, education, and industry use data mining for their applications. Data clustering, a specific task in data mining, groups the data so that the database can be examined efficiently. In this work, a Chaotic Biogeography-Based Optimization approach using Information Entropy (CBO-IE) is proposed to obtain data clusters for healthcare IoT datasets. Information Entropy is introduced into the BO approach to generate a better distribution of the data points in the dataset, and chaos theory is used to initialize the population of the BO approach. Information Entropy and chaos theory are combined with the BO approach to generate optimal cluster heads and cluster members with enhanced convergence speed of BO in a huge search area. MATLAB 2021a is used to implement CBO-IE for eight healthcare IoT datasets, and the outcomes show the better efficiency of CBO-IE on the basis of F-Measure, intracluster distance, running time complexity, purity index, statistical analysis, standard deviation, root mean square error, and accuracy in comparison with previous clustering techniques such as K-Means, GA, PSO, ALO, and BO. The Friedman Examination and the Holm strategy are used to perform a statistical analysis of the proposed CBO-IE against previous clustering techniques such as K-Means, GA, PSO, ALO, and BO, which indicates that CBO-IE generates 90% accurate results. In the future, the proposed technique is expected to be evaluated and validated on huge databases under big data. Additionally, cross-layer communication is expected to be introduced and validated in the IoT architecture.

Data Availability

The data that support the findings of this study are available upon request from the corresponding author.

Conflicts of Interest

The authors declare that there are no conflicts of interest.