CBO-IE: A Data Mining Approach for Healthcare IoT Dataset Using Chaotic Biogeography-Based Optimization and Information Entropy

Ahirwar, Manish Kumar; Shukla, Piyush Kumar; Singhai, Rakesh

doi:https://doi.org/10.1155/2021/8715668

Scientific Programming

Special Issue

Next-Generation Optimization Models and Algorithms in Cloud and Fog Computing

Research Article | Open Access

Volume 2021 | Article ID 8715668 | https://doi.org/10.1155/2021/8715668

Academic Editor: Punit Gupta

Received 24 Jul 2021

Accepted 22 Sept 2021

Published 08 Oct 2021

Abstract

Data mining is widely used in many fields, such as education, medicine, surveillance, and industry. Clustering is an important data mining method in which data elements are divided into groups (clusters) to support better-quality data analysis. Biogeography-Based Optimization (BO) is a relatively recent metaheuristic approach that has been applied to solve several complex optimization problems. Here, a Chaotic Biogeography-Based Optimization approach using Information Entropy (CBO-IE) is implemented to perform clustering over healthcare IoT datasets. The main objective of CBO-IE is to obtain an efficient and accurate distribution of data points in datasets by using Information Entropy concepts and to initialize the population by using chaos theory. Both Information Entropy and chaos theory help to improve the convergence speed of BO in the global search space, so that cluster heads and cluster members are selected more accurately. CBO-IE is implemented in MATLAB 2021a over eight healthcare IoT datasets, and the results illustrate the superior performance of CBO-IE in terms of F-Measure, intracluster distance, running time complexity, purity index, statistical analysis, root mean square error, accuracy, and standard deviation compared with previous clustering techniques such as K-Means, GA, PSO, ALO, and BO.

1. Introduction

Big data [1, 2] is used and analyzed in many wireless applications, which involve the storage, processing, and maintenance of data. Large amounts of data are preprocessed before analysis to reduce data redundancy and to improve data accuracy and efficiency [3–5]. Big data is also processed with nature-inspired optimization approaches, such as the Genetic Algorithm (GA), Ant Colony Optimization (ACO), Ant Lion Optimization (ALO), and Particle Swarm Optimization (PSO), to perform optimal analysis [6].

Data mining [7–9] is a systematic procedure for extracting hidden knowledge and patterns from immense, complex, and multidimensional datasets [10]. Association rule mining is a well-known data mining approach that has been combined with the MapReduce concept to evaluate relationships among data elements in huge datasets [11]. These relationships are identified to maximize marketing profit through IoT devices in industry. Time and security are crucial issues in business, and they are frequently addressed by using data mining [12–14].

Data clustering [15] is a type of data mining in which data components are divided into various sets or groups. For example, data collected from various heterogeneous sources can be grouped by time series clustering to improve data accessibility. Industry data are partitioned more precisely and accurately by clustering to predict future market trends [16]. Cybercrime data are analyzed after preprocessing to train and test clustering techniques; grouping similar crime patterns makes the nature of crime easier to understand and detect, which helps to reduce the crime rate [17]. Healthcare data are also received from IoT devices as data streams and divided into clusters by stream clustering, which combines cluster-building and merging steps [18].

As the above analysis shows, data clustering is an extremely challenging problem, and selecting the optimal clustering strategy is difficult. The task is especially demanding for healthcare data because datasets are not expected to be consistent: disease types and patients' previous medical conditions may vary greatly across training data. Consequently, even a carefully designed prediction scheme does not always offer a useful level of accuracy across all healthcare application areas, since its performance depends heavily on the situation in which it is used.

Many clustering algorithms have been introduced on various datasets to improve data extraction, and several optimization approaches, such as GA, ALO, and PSO, have also been applied to clustering. Although some clustering techniques achieve strong results on a specific dataset, their efficiency may differ on other datasets. This behaviour is consistent with the no-free-lunch theorem: no single clustering method is a cure for all problems. In other words, no algorithm solves every problem, and convergence speed remains a major issue for the optimization techniques used in data clustering.

Here, a Chaotic Biogeography-Based Optimization approach using Information Entropy (CBO-IE), an improved form of the Biogeography-Based Optimization (BO) approach, is implemented for clustering over healthcare IoT datasets. Instead of performing a qualitative systematic mapping analysis of previously implemented works, this work contributes a quantitative examination of a clustering strategy for healthcare datasets.

The contributions of CBO-IE are as follows:
(1) The major aim of CBO-IE is to obtain an accurate distribution of data points in the dataset by using an Information Entropy strategy and to initialize the population by using chaos theory.
(2) Information Entropy and chaos theory are both introduced to enhance the convergence speed of BO during global exploration, so that cluster heads and cluster members are generated more precisely.
(3) CBO-IE is applied over eight healthcare IoT datasets, and the outputs are evaluated in terms of F-Measure, intracluster distance, running time complexity, purity index, statistical analysis, standard deviation, root mean square error, and accuracy compared with previous clustering techniques such as K-Means, GA, PSO, ALO, and BO.

The remainder of the paper is organized as follows: Section 2 surveys the literature on data mining techniques, big data processing, and IoT dataset processing. Section 3 describes the BO approach, and Section 4 presents the proposed CBO-IE, including its flowchart, algorithms, and preliminaries. Section 5 presents the datasets, performance factors, and experimental results, and Section 6 concludes the paper.

2. Literature Survey

The IoT [19] is poised to transform everyday life in the future. The information extracted from IoT systems can be explored to recognize and organize complex environments around users, enabling better decision-making, greater automation, and improved effectiveness, efficiency, accuracy, and wealth creation. One study explores the applicability of several popular data mining approaches to IoT data; its results, obtained with artificial intelligence and neural network techniques, show better accuracy, efficiency, and speed than prior techniques [20].

The K-Means approach has been applied to process big data on the Hadoop platform in Internet of Things (IoT) settings. Data from IoT devices are partitioned into several groups, known as clusters, which are used in various smart applications such as medical, disaster management, and industrial applications. Power consumption and traffic over the communication network were reduced by using only the necessary information rather than all the raw data. The approach also offers considerable flexibility, permitting the formation of clusters with millions of Hadoop instances [21].

Big data raises space and time complexity concerns because huge amounts of data must be stored and processed over mobile and dynamic networks. Extensible Markup Language (XML) and machine learning methods are applied to big data processing tasks such as classification, clustering, and preprocessing of IoT data [22, 23]. Recent IoT applications increasingly depend on an intelligent understanding of the environment derived from information collected by assorted sensors and small devices. One framework combines an ontology-based description of data processing with logical reasoning for effective event recognition, enhanced by a machine learning classification strategy; a road traffic analysis is used to verify the results of the framework [24].

Users' personal data are distributed digitally over networks through smart IoT devices, and several artificial intelligence techniques have been introduced to control this digitally transferred information according to rules and policies. An intelligent policy examination strategy has been introduced to discover conflicting policies or agreements of users on the platform, using an intelligent decision-making method based on fuzzy cognitive maps [25].

Business is directly or indirectly related to customer behaviour, which can be evaluated by trust mining. Seller trust is evaluated by dynamic clustering, where data change day by day over a large network. The rapid growth of online shopping records makes it increasingly difficult for customers to select trustworthy sellers from the many records available in e-commerce applications. A dynamic clustering method is adopted to calculate customer trust and to group customers with similar buying behaviour in order to predict customer behaviour. Real-time and synthetic datasets are used to perform clustering, and the results are evaluated in terms of accuracy [26].

The medical field [27] also makes good use of mining techniques such as classification to identify mental health conditions, evaluate risk, and manage the system. Here, decision trees are used to discover and classify patients according to their mental health. A comparative evaluation of a large array of classifiers from several families (tree-based, ensemble, neural, probabilistic, categorization, and rule-based classifiers) has also been reported, and linear and random classifiers have been introduced to perform disease classification [28].

Unstructured cancer-related data can be classified to discover several types of cancer, which yields valuable results in terms of risk factors, treatment, and management. Text mining has been exploited for predicting cancers such as lung, breast, and ovarian cancer: data about cancer patients are collected from several heterogeneous sources, and text mining is performed to extract cancer-related information such as survival, treatment, and disease risk [29].

Geographical data are collected from several heterogeneous sources, and data mining is applied to the spatial data to manage disasters. Spatial data mining over large volumes of geographic information makes it possible to identify natural hazards before they occur. Geographical information such as soil, ocean, earth's crust, and air quality data is mined with the help of PSO and fuzzy logic, and clusters are determined on the basis of spatial information and natural behaviours in the environment [30].

Data mining-based extraction of educational information is a new area of research, in which distributed educational data are compiled and used for several purposes. Students, teachers, and other staff play important roles in educational data mining. Educational data are collected from several sources and saved in tabular format; universities can then analyze these student data with data mining models to evaluate student results and maintain records appropriately [31].

Classification is also performed over data streams that combine text, music, and video information, using decision trees and naive Bayes models; such streams can also capture human activities such as hand and leg movements [32, 33]. Data stream classification faces several key issues: infinite length, concept evolution, concept drift, and feature evolution. Previous researchers mainly addressed infinite length and concept evolution, whereas concept drift and feature evolution were largely not taken into consideration. A classification method over big data that addresses these two remaining issues has been implemented, and metrics such as error rate and running time show its superior efficiency [34].

3. The Biogeography-Based Optimization (BO) Approach

Biogeography-Based Optimization (BO) is a metaheuristic approach based on the theory of island biogeography, which describes the migration, evolution, and extinction of species in habitats. The Habitat Suitability Index (HSI) measures the quality of each solution (habitat), and good habitats share their characteristics with weak habitats. In the BO migration process, the population is improved by migrating Suitability Index Variables (SIVs) from emigrating habitats to immigrating habitats [35], so that good solutions from the previous iteration are propagated. In the BO mutation process, the SIVs of each habitat are replaced randomly and stochastically with new values from the complete solution space, evaluated with the fitness function [36], which enhances population [37] diversity and search strength.

3.1. Migration Process

In the stochastic migration process, several parents can contribute to a single offspring, and every habitat (H_a) is modified by receiving SIVs from habitats with higher HSI in the population (P^S). The migration rates depend directly on the number of species in a habitat, which enhances [38] habitat diversity, and linear migration is performed by using the following equation:

$$H_a(\mathrm{SIV}) \leftarrow H_b(\mathrm{SIV}) \tag{1}$$

where H_b(SIV) is an SIV of the b^th habitat (H_b), which has a higher HSI and transfers its SIV value to the a^th habitat (H_a).

The emigration rate (α) and immigration rate (β) of a habitat containing c species are evaluated by using the following two equations:

$$\alpha_c = G\,\frac{c}{c_{\max}} \tag{2}$$

$$\beta_c = M\left(1 - \frac{c}{c_{\max}}\right) \tag{3}$$

Here, G and M are the highest emigration and immigration rates, respectively, and c_max is the maximum number of species that a habitat can support.

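Six models have been developed for the migration phase, of which the sinusoidal model performs better migration than the other five. In this model, the rates of the a^th habitat are evaluated by using the following two equations:

$$\alpha_a = \frac{G}{2}\left(1 - \cos\frac{c_a\,\pi}{c_{\max}}\right) \tag{4}$$

$$\beta_a = \frac{M}{2}\left(1 + \cos\frac{c_a\,\pi}{c_{\max}}\right) \tag{5}$$

where c_a is the number of species in H_a and c_max is the maximum number of species.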
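The two migration models can be compared numerically with the short Python sketch below. It is illustrative only: the function names are hypothetical, the species count c and its maximum c_max are plain numbers, and the formulas follow the linear model (α = G·c/c_max, β = M·(1 − c/c_max)) and the sinusoidal model (α = G/2·(1 − cos(cπ/c_max)), β = M/2·(1 + cos(cπ/c_max))).

```python
import math


def linear_rates(c, c_max, G=1.0, M=1.0):
    """Linear model: emigration alpha grows and immigration beta shrinks
    as the species count c increases."""
    alpha = G * c / c_max
    beta = M * (1.0 - c / c_max)
    return alpha, beta


def sinusoidal_rates(c, c_max, G=1.0, M=1.0):
    """Sinusoidal model: same end points as the linear model, but the rates
    change slowly near empty (c = 0) and saturated (c = c_max) habitats."""
    angle = c * math.pi / c_max
    alpha = (G / 2.0) * (1.0 - math.cos(angle))
    beta = (M / 2.0) * (1.0 + math.cos(angle))
    return alpha, beta


# Rates for a habitat holding half of the maximum species count
print(linear_rates(5, 10))        # (0.5, 0.5)
print(sinusoidal_rates(5, 10))
```

With G = M = 1 both models give α + β = 1 for every habitat; they differ only in how the rates are distributed across species counts.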

3.2. Mutation Process

Mutation is a stochastic operator that enhances population diversity to help obtain the optimal solution [39]. The mutation process updates at least one randomly chosen SIV of a solution according to the mutation rate (T_a), which is computed from the prior probability of existence (S_a) of the habitat's species count by the following equation:

$$T_a = T_{\max}\left(1 - \frac{S_a}{S_{\max}}\right) \tag{6}$$

where T_max is the maximum mutation rate defined by the user and S_max is the highest species count probability.

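The mutation operator is then improved by using a Cauchy model, which strengthens the exploration of BO in a large search space and mitigates the limitations of the standard mutation strategy, because the heavy tails of the Cauchy distribution occasionally produce large jumps. The Cauchy distribution is described by its probability density function, which is formulated by the following equation:

$$f(x) = \frac{1}{\pi}\,\frac{t}{t^2 + x^2}, \qquad -\infty < x < \infty \tag{7}$$

where t > 0 is the scale parameter.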

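Hence, the Cauchy mutation is described by the following equation:

$$H_a'(\mathrm{SIV}) = H_a(\mathrm{SIV}) + C(t) \tag{8}$$

where H_a is the a^th habitat, H_a' is its mutated version, and C(t) is a random number drawn from the Cauchy distribution with scale parameter t. The complete BO procedure is summarized in Algorithm 1.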
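The mutation rate and the Cauchy perturbation can be sketched in Python as follows. This is a hypothetical illustration rather than the authors' MATLAB code: clipping mutated SIVs to the search bounds and the default scale t = 1 are assumptions, and the Cauchy sample is drawn by inverse-transform sampling.

```python
import math
import random


def mutation_rate(S_a, S_max, T_max=1.0):
    """T_a = T_max * (1 - S_a / S_max): habitats whose species count is
    improbable (small S_a) mutate more often."""
    return T_max * (1.0 - S_a / S_max)


def cauchy_sample(t=1.0, rng=random):
    """Draw from a Cauchy distribution with scale t via inverse-transform
    sampling: x = t * tan(pi * (u - 0.5)), with u uniform on [0, 1)."""
    return t * math.tan(math.pi * (rng.random() - 0.5))


def cauchy_mutate(habitat, T_a, low, high, t=1.0, rng=random):
    """Perturb each SIV with probability T_a by a Cauchy step.
    Assumption: the result is clipped to the search bounds [low, high]."""
    mutated = list(habitat)
    for i, siv in enumerate(mutated):
        if rng.random() < T_a:
            mutated[i] = min(high, max(low, siv + cauchy_sample(t, rng)))
    return mutated


rng = random.Random(42)
print(mutation_rate(0.25, 1.0))   # 0.75
print(cauchy_mutate([0.5, 1.5, 2.5], T_a=0.5, low=0.0, high=3.0, rng=rng))
```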

Algorithm 1: Biogeography-Based Optimization (BO).
START
  Initialize the parameters G = M = 1, T_max = 1, P^S, and Max_iteration
  Initialize the population (a random set of habitats) H_1, H_2, ...
  Evaluate the fitness value (HSI) of every habitat
  WHILE (the termination condition is not met)
    Evaluate α_a, β_a, and T_a for every habitat
    // Migration
    FOR every habitat H_a (from the lowest to the highest HSI value)
      FOR every SIV of H_a
        IF random < β_a THEN   // H_a is selected for immigration
          Choose a habitat H_b stochastically, proportional to α_b
          H_a(SIV) = H_b(SIV)
        END IF
      END FOR
    END FOR
    // Mutation
    FOR every habitat H_a
      FOR every SIV of H_a
        IF random < T_a THEN
          Replace H_a(SIV) with a randomly generated value
        END IF
      END FOR
    END FOR
    Evaluate the HSI value of every habitat
  END WHILE
STOP
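For readers who want to experiment, Algorithm 1 can be sketched in Python as below. This is a minimal, hypothetical implementation, not the authors' MATLAB code: it minimizes a toy sphere function, derives linear migration rates from each habitat's rank, uses a constant mutation probability T_max instead of equation (6), and adds elitism, which Algorithm 1 does not list.

```python
import random


def bbo_minimize(fitness, dim, low, high, pop_size=20, max_iter=100,
                 G=1.0, M=1.0, T_max=0.01, seed=0):
    """Minimal BBO loop in the spirit of Algorithm 1 (lower fitness = higher HSI)."""
    rng = random.Random(seed)
    pop = [[rng.uniform(low, high) for _ in range(dim)] for _ in range(pop_size)]
    history = []
    for _ in range(max_iter):
        pop.sort(key=fitness)                  # best habitat first
        history.append(fitness(pop[0]))
        n = len(pop)
        # Linear rates with species count c = n - k for rank k:
        # the best habitat emigrates most and immigrates least.
        alpha = [G * (n - k) / n for k in range(n)]
        beta = [M * (1 - (n - k) / n) for k in range(n)]
        elite = list(pop[0])
        new_pop = [list(h) for h in pop]
        # Migration: H_a(SIV) <- H_b(SIV), with H_b drawn proportional to alpha_b
        for a in range(n):
            for s in range(dim):
                if rng.random() < beta[a]:
                    b = rng.choices(range(n), weights=alpha)[0]
                    new_pop[a][s] = pop[b][s]
        # Mutation: each SIV is replaced by a random value with probability T_max
        for a in range(n):
            for s in range(dim):
                if rng.random() < T_max:
                    new_pop[a][s] = rng.uniform(low, high)
        # Elitism (assumed addition): the previous best replaces the new worst
        new_pop.sort(key=fitness)
        new_pop[-1] = elite
        pop = new_pop
    return min(pop, key=fitness), history


def sphere(x):
    return sum(v * v for v in x)


best, history = bbo_minimize(sphere, dim=3, low=-5.0, high=5.0)
print(sphere(best), history[0])
```

Because of the elitism step, the best fitness recorded in `history` never gets worse from one iteration to the next.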

4. The Chaotic BO Approach Using Information Entropy (CBO-IE)

An updated BO approach that uses chaotic behaviour and Information Entropy is proposed. BO is well suited to exploration and exploitation in large search spaces and produces efficient results for numerical optimization [40]. However, numerical optimization differs considerably from data clustering. Therefore, some intrinsic characteristics of clustering are investigated here, and a chaotic BO using Information Entropy is implemented to apply BO to data clustering.

4.1. Element Information Entropy

In data clustering, data points are distributed in a multidimensional space. Information Entropy measures the degree of uncertainty of a random variable; here it is computed for each element (attribute) of the dataset to characterize how that element's values are distributed. The values of each element are discretized by rounding every value to its nearest integer, and the entropy (E_a) is evaluated by the following equation:

$$E_a = -\sum_{b=\min_a}^{\max_a} R_b \log R_b \tag{9}$$

where E_a is the entropy of the a^th element (a = 1, 2, 3, ..., L), L is the number of elements in the dataset, min_a and max_a are the lowest and highest integers after discretization of the element values, and R_b is the proportion (percentage) of the element's values equal to the integer b, with 0 log 0 taken as 0.

Element entropy is used to guide the migration procedure of the BO population; hence, the maximum entropy of every element in the dataset is evaluated by the following equation:

$$\mathrm{maximum}(E_a) = \log\left(\max_a - \min_a + 1\right) \tag{10}$$

where maximum(E_a) is the maximum entropy of the a^th element in the dataset.

Finally, the normalized entropy of every element is calculated by the following equation:

$$\mathrm{normalized}(E_a) = \frac{E_a}{\mathrm{maximum}(E_a)} \tag{11}$$

where normalized(E_a) is the normalized entropy of the a^th element in the dataset.

Equation (11) is repeated for all elements in the dataset, and the normalized entropy set normalized(E) is generated as follows:

$$\mathrm{normalized}(E) = \left(\mathrm{normalized}(E_1), \mathrm{normalized}(E_2), \ldots, \mathrm{normalized}(E_L)\right)$$
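A minimal Python sketch of the element entropy computation is shown below. It rests on stated assumptions: values are rounded to the nearest integer (Python's round() rounds halves to even), Shannon entropy uses the natural logarithm (the base cancels after normalization), and the maximum entropy is log(max_a − min_a + 1). The function names are hypothetical.

```python
import math
from collections import Counter


def element_entropy(values):
    """Normalized entropy of one element (attribute).
    Assumptions: nearest-integer discretization, natural log, and
    maximum entropy = log(max_a - min_a + 1)."""
    ints = [round(v) for v in values]
    n = len(ints)
    counts = Counter(ints)
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    max_entropy = math.log(max(ints) - min(ints) + 1)
    # A constant element has no uncertainty; return 0 instead of dividing by 0.
    return entropy / max_entropy if max_entropy > 0 else 0.0


def normalized_entropy_set(dataset):
    """normalized(E) = (normalized(E_1), ..., normalized(E_L)) for a dataset
    given as a list of rows, each row holding L element values."""
    return [element_entropy(column) for column in zip(*dataset)]


data = [[1.2, 5.0], [2.9, 5.1], [2.1, 4.9], [3.8, 5.0]]
print(normalized_entropy_set(data))
```

In this toy dataset the first element is spread evenly over the integers 1 to 4 (normalized entropy close to 1), while the second always rounds to 5 (normalized entropy 0).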

4.2. Information Entropy-Based Migration Process in the BO Approach

Here, two mechanisms are used in migration to improve the performance of the BO approach. The first is the original BO migration process explained in the previous section. The second is an updated version of the original migration, in which P individuals are selected randomly from the current population and the best of them is assigned as the reference individual (H_ref). H_ref guides the direction of migration, and Information Entropy is used to balance population diversity against convergence speed. An element with higher Information Entropy has greater uncertainty, which slows down convergence. Hence, convergence is accelerated by moving such an element toward the corresponding position of the reference individual with a higher probability than elements with the least Information Entropy.
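The second migration mechanism is described only qualitatively, so the Python sketch below makes one explicit assumption: each coordinate of a habitat is copied from H_ref with probability equal to the normalized entropy of its element. The vector layout (K centroids of L elements each) follows Section 4.4; the function names are hypothetical.

```python
import random


def pick_reference(population, fitness, P, rng=random):
    """Select P individuals at random from the current population and return
    the best one (lowest fitness) as the reference individual H_ref."""
    return min(rng.sample(population, P), key=fitness)


def entropy_guided_migration(habitat, reference, norm_entropy, rng=random):
    """Assumed rule: coordinate i moves to H_ref's value with probability equal
    to the normalized entropy of its element, so high-uncertainty elements
    follow H_ref more often."""
    L = len(norm_entropy)
    migrated = list(habitat)
    for i in range(len(habitat)):
        # Layout: K centroids of L elements each, so coordinate i is element i % L.
        if rng.random() < norm_entropy[i % L]:
            migrated[i] = reference[i]
    return migrated


rng = random.Random(1)
population = [[0.0, 0.0, 4.0, 4.0], [1.0, 1.0, 5.0, 5.0], [9.0, 9.0, 9.0, 9.0]]
h_ref = pick_reference(population, fitness=sum, P=2, rng=rng)
print(entropy_guided_migration(population[2], h_ref, [1.0, 0.0], rng=rng))
```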

4.3. Chaos-Based Population Selection of the BO Approach

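Chaotic systems are highly sensitive to initial conditions and have been used successfully to generate random-like numbers with the logistic function. The chaotic system used here is the logistic map given by the following equation:

$$x_{k+1} = r\,x_k\,(1 - x_k) \tag{12}$$

where r is a constant in the range [1, 4] and x_k ∈ (0, 1) is the chaotic variable at step k. The map is fully chaotic at r = 4, provided that x_0 is not 0, 0.25, 0.5, 0.75, or 1.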

The population of BO is initialized by using the chaos function (equation (12)), which improves the efficiency of BO by making better use of the large solution space.
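Chaos-based initialization can be sketched in Python as follows, assuming that successive logistic-map values are scaled linearly into each coordinate's [low, high] range. The seed x0 = 0.7 and r = 4 are illustrative choices, not values given by the authors, and the function names are hypothetical.

```python
def logistic_sequence(x0, r, count):
    """Iterate the logistic map x_{k+1} = r * x_k * (1 - x_k)."""
    xs, x = [], x0
    for _ in range(count):
        x = r * x * (1.0 - x)
        xs.append(x)
    return xs


def chaotic_population(pop_size, lows, highs, x0=0.7, r=4.0):
    """Fill each individual's vector with successive chaotic values in [0, 1],
    scaled to that coordinate's [low, high] range (assumed scaling)."""
    dim = len(lows)
    chaos = logistic_sequence(x0, r, pop_size * dim)
    return [[lows[d] + chaos[p * dim + d] * (highs[d] - lows[d])
             for d in range(dim)]
            for p in range(pop_size)]


for individual in chaotic_population(3, lows=[0.0, -1.0], highs=[1.0, 1.0]):
    print(individual)
```

For r ≤ 4 the map keeps values inside [0, 1], so every generated coordinate stays within its bounds.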

4.4. The Complete CBO-IE Approach for Data Mining

Information Entropy-based migration and chaos-based population selection are combined in Chaotic Biogeography-Based Optimization using Information Entropy (CBO-IE) to solve the data mining problem, that is, to perform data clustering optimally (Figure 1). First, the population (P^S) of the BO approach is initialized with the chaos function (equation (12)), and every individual is represented as a vector of size L_x = K × L, where L is the number of elements in the dataset, K is the number of cluster centroids, and L_x is the dimension of each individual. The locations of the K centroids are stored consecutively in this vector: the 1st centroid corresponds to the first L attributes, the 2nd centroid to the next L attributes, and so on. The initial vector values lie between the lowest and highest values of the corresponding elements in the dataset, and the search runs for a maximum number of iterations (Mi). Then, CBO-IE calculates the fitness values of all individuals by using equations (1) to (8) (Algorithm 2).
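The encoding of an individual can be illustrated with the Python sketch below, which splits a vector of length K × L into K centroids and assigns each data point to its nearest centroid. Using Euclidean distance for the assignment is an assumption, and the function names are hypothetical.

```python
import math


def decode(vector, K, L):
    """Split an individual of length L_x = K * L into K centroids: the first L
    values form centroid 1, the next L values centroid 2, and so on."""
    if len(vector) != K * L:
        raise ValueError("individual length must equal K * L")
    return [vector[k * L:(k + 1) * L] for k in range(K)]


def assign(points, centroids):
    """Label each data point with the index of its nearest centroid
    (Euclidean distance, an assumed choice)."""
    return [min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points]


individual = [0.0, 0.0, 10.0, 10.0]          # K = 2 centroids, L = 2 elements
centroids = decode(individual, K=2, L=2)     # [[0.0, 0.0], [10.0, 10.0]]
print(assign([[1, 2], [9, 8], [0, 1]], centroids))   # [0, 1, 0]
```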

5. Results and Analysis

In this section, the healthcare IoT datasets and performance factors for the proposed CBO-IE approach are briefly explained. All approaches are implemented in MATLAB 2021a on a Core i3-3110M processor with 2 GB RAM running Windows 8, and they are analyzed over 8 healthcare IoT datasets (Table 1). The proposed CBO-IE approach is evaluated for 500 iterations over 30 independent runs in terms of intracluster distance, purity index, standard deviation, root mean square error, accuracy, and F-Measure, and it is compared with prior algorithms such as K-Means, GA, PSO, ALO, and BO.

5.1. Healthcare IoT Datasets

The proposed CBO-IE approach is applied to 8 different healthcare IoT datasets taken from the UCI repository (Table 1). Each dataset is divided into 2 classes:
(1) Thoracic surgery: 470 instances, 17 attributes; classes survive and not survive.
(2) Breast cancer: 699 instances, 10 attributes; classes benign and malignant.
(3) Cryotherapy: 90 instances, 7 attributes; classes success and failure.
(4) Liver patient: 583 instances, 10 attributes; classes liver patient and not.
(5) Heart patient: 299 instances, 12 attributes; classes heart failure and not.
(6) Chronic kidney disease: 400 instances, 24 attributes; classes disease found and not.
(7) Diabetic retinopathy: 1151 instances, 18 attributes; classes diabetic retinopathy and not.
(8) Blood transfusion: 748 instances, 4 attributes; classes donating blood and not.

5.2. Performance Factors

The performance of proposed CBO-IE approach is evaluated in terms of intracluster distance, purity index, standard deviation, root mean square error, accuracy, and F-Measure.

5.2.1. Intracluster Distance

The intracluster distance measures how compact the clusters are. For each cluster, the mean distance between its centroid and all of its data points is computed; this step is repeated for every cluster, and the intracluster distance is the mean of these per-cluster values. Optimal clustering is achieved with the least intracluster distance.
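Following the centroid-based description above, a hypothetical Python helper for the intracluster distance might look like this; Euclidean distance and skipping empty clusters are assumptions.

```python
import math


def intracluster_distance(points, labels, centroids):
    """Mean, over non-empty clusters, of the average Euclidean distance between
    a cluster's centroid and the points assigned to it (lower is better)."""
    per_cluster = []
    for k, centroid in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == k]
        if members:  # empty clusters are skipped (assumption)
            mean_dist = sum(math.dist(p, centroid) for p in members) / len(members)
            per_cluster.append(mean_dist)
    return sum(per_cluster) / len(per_cluster)


points = [[0, 0], [0, 2], [10, 10], [10, 14]]
labels = [0, 0, 1, 1]
centroids = [[0, 1], [10, 12]]
print(intracluster_distance(points, labels, centroids))   # 1.5
```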

Table 2 shows that CBO-IE generates the least intracluster distance for all IoT datasets. Across all healthcare IoT datasets, CBO-IE obtains 90% better results than BO, 92% better than ALO, 94% better than PSO, 96% better than GA, and 98% better than K-Means in terms of intracluster distance. The approaches are ranked from 1 (minimum intracluster distance) to 6 (maximum intracluster distance), and the average ranking of each approach is computed.

Figure 2 shows that the proposed CBO-IE attains the best rank, with the least intracluster distance, compared with K-Means, GA, PSO, ALO, and BO for all IoT datasets.

5.2.2. Standard Deviation

Standard deviation (D^S) is a numerical characteristic that describes how tightly data are clustered around their average value. The best clustering is achieved with a lower standard deviation, which is obtained by the following equation:

$$D^S = \sqrt{\frac{1}{n}\sum_{j=1}^{n}\left(V_j - \bar{V}\right)^2} \tag{13}$$

where n is the dataset length, V_j are the dataset values (points), and \bar{V} is the dataset average value.

Table 3 shows that CBO-IE obtains a lower standard deviation for all IoT datasets. Across all healthcare IoT datasets, CBO-IE generates 93% better outcomes than BO, 95% better than PSO and ALO, 97% better than GA, and 98% better than K-Means in terms of standard deviation.

Figure 3 shows the better standard deviation of the proposed CBO-IE compared with K-Means, GA, PSO, ALO, and BO for all IoT datasets.

5.2.3. Purity Index

Purity measures the correctness of a clustering strategy, that is, how accurately data elements are classified: ideally, all points of a single class are allocated to a single cluster. The purity index (I^P) is computed from the purity of each cluster by equations (14) and (15), and maximum purity corresponds to an I^P value close to 1:

$$P_z = \frac{1}{n_z}\max_{w} n_{zw} \tag{14}$$

$$I^P = \sum_{z=1}^{K} \frac{n_z}{n}\,P_z \tag{15}$$

where P_z is the purity of the z^th cluster, n_z is the z^th cluster length, n_{zw} is the number of data elements of class w allocated to the z^th cluster, and n is the total number of data points.
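Equations (14) and (15) translate directly into a few lines of Python. This sketch is illustrative; cluster labels and class labels may be any hashable values, and the function name is hypothetical.

```python
from collections import Counter


def purity_index(labels, classes):
    """Purity of cluster z is the share of its most common class; the index
    weights each cluster's purity by its size n_z / n."""
    n = len(labels)
    index = 0.0
    for z in set(labels):
        members = [c for lab, c in zip(labels, classes) if lab == z]
        n_z = len(members)
        purity_z = Counter(members).most_common(1)[0][1] / n_z
        index += (n_z / n) * purity_z
    return index


clusters = [0, 0, 0, 1, 1, 1]
truth = ["benign", "benign", "malignant", "malignant", "malignant", "malignant"]
print(purity_index(clusters, truth))   # about 0.833 (5 of 6 points in majority classes)
```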

Table 4 shows that CBO-IE generates the maximum purity index for all IoT datasets. Across all healthcare IoT datasets, CBO-IE obtains 5% better outputs than BO, 7% better than ALO, 8% better than PSO, 12% better than GA, and 15% better than K-Means in terms of purity index.

Figure 4 shows the better purity index of the proposed CBO-IE compared with K-Means, GA, PSO, ALO, and BO for all IoT datasets.

5.2.4. F-Measure

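First, Precision (prsn) and Recall (rel), which measure information retrieval quality, are evaluated by equations (16) and (17). Both are then combined into the F-Measure (FM) by equations (18) and (19):

$$\mathrm{prsn}(z, w) = \frac{n_{zw}}{n_z} \tag{16}$$

$$\mathrm{rel}(z, w) = \frac{n_{zw}}{n_w} \tag{17}$$

$$\mathrm{FM}(z, w) = \frac{2\,\mathrm{prsn}(z, w)\,\mathrm{rel}(z, w)}{\mathrm{prsn}(z, w) + \mathrm{rel}(z, w)} \tag{18}$$

$$\mathrm{FM} = \sum_{w} \frac{n_w}{n}\,\max_{z} \mathrm{FM}(z, w) \tag{19}$$

where n_w is the w^th class length, n_z is the z^th cluster length, n_{zw} is the number of data elements of class w in cluster z, and n is the total number of data points.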
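The clustering F-Measure can be sketched in Python as follows. This sketch uses the common class-weighted form, in which each class w contributes n_w / n times its best FM(z, w) over all clusters; that aggregation is an assumption here, and the function name is hypothetical.

```python
from collections import Counter


def clustering_f_measure(labels, classes):
    """prsn(z, w) = n_zw / n_z, rel(z, w) = n_zw / n_w, FM(z, w) is their
    harmonic mean, and each class w is weighted by n_w / n using its
    best-matching cluster (assumed aggregation)."""
    n = len(labels)
    n_z = Counter(labels)
    n_w = Counter(classes)
    n_zw = Counter(zip(labels, classes))
    total = 0.0
    for w, size_w in n_w.items():
        best = 0.0
        for z, size_z in n_z.items():
            joint = n_zw[(z, w)]
            if joint == 0:
                continue
            prsn, rel = joint / size_z, joint / size_w
            best = max(best, 2 * prsn * rel / (prsn + rel))
        total += (size_w / n) * best
    return total


print(clustering_f_measure([0, 0, 1, 1], ["a", "a", "b", "b"]))   # 1.0
```

Cluster ids need not match class names: relabeling the clusters leaves the F-Measure unchanged.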

Table 5 shows that CBO-IE obtains a higher F-Measure for all IoT datasets. Across all healthcare IoT datasets, CBO-IE generates 4% better outputs than BO, 7% better than ALO, 8% better than PSO, 13% better than GA, and 16% better than K-Means in terms of F-Measure.

Figure 5 shows the better F-Measure of the proposed CBO-IE compared with K-Means, GA, PSO, ALO, and BO for all IoT datasets.

5.2.5. Root Mean Square Error (RMSE)

RMSE measures the deviation between predicted values and experimental (observed) values. The best clustering is achieved with the minimum RMSE, which is evaluated by the following equation:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{j=1}^{n}\left(y_j - \hat{y}_j\right)^2} \tag{20}$$

where y_j is the j^th element value in the dataset, \hat{y}_j is the j^th predicted value, and n is the number of values.
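RMSE is a one-line computation. The sketch below is hypothetical, and the text does not specify which "predicted" values are used for clustering (for example, each point's cluster centroid could serve as its prediction).

```python
import math


def rmse(observed, predicted):
    """Square root of the mean squared difference between observed and
    predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


print(rmse([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0]))   # 1.0
```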

Table 6 shows that CBO-IE generates the least RMSE for all IoT datasets. Across all healthcare IoT datasets, CBO-IE generates 8% better results than BO, 78% better than ALO, 81% better than PSO, 87% better than GA, and 95% better than K-Means in terms of RMSE.

Figure 6 shows the better RMSE of the proposed CBO-IE compared with K-Means, GA, PSO, ALO, and BO for all IoT datasets.

5.2.6. Accuracy

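Accuracy evaluates the proportion of clustering decisions that are correct, that is, the fraction of data points assigned to the cluster of their prevailing (true) group:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{21}$$

where TP, TN, FP, and FN are the numbers of true positives, true negatives, false positives, and false negatives, respectively.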
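Because clusters carry no class labels, computing accuracy requires mapping clusters to classes. The hypothetical Python sketch below assumes a one-to-one mapping chosen to maximize the number of correctly assigned points, found by brute force over permutations; this is practical for the two-class datasets used here but not for many clusters.

```python
from itertools import permutations


def clustering_accuracy(labels, classes):
    """Fraction of points whose cluster, mapped one-to-one onto a class,
    matches the true class, using the best such mapping (assumption)."""
    cluster_ids = sorted(set(labels))
    class_ids = sorted(set(classes))
    if len(cluster_ids) > len(class_ids):
        raise ValueError("sketch assumes no more clusters than classes")
    best_hits = 0
    for perm in permutations(class_ids, len(cluster_ids)):
        mapping = dict(zip(cluster_ids, perm))
        hits = sum(mapping[z] == w for z, w in zip(labels, classes))
        best_hits = max(best_hits, hits)
    return best_hits / len(labels)


print(clustering_accuracy([1, 1, 0, 0, 0], ["a", "a", "b", "b", "a"]))   # 0.8
```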

Table 7 shows that CBO-IE obtains a higher accuracy for all IoT datasets. Across all healthcare IoT datasets, CBO-IE achieves 3% better outputs than BO, 5% better than ALO, 6% better than PSO, 9% better than GA, and 13% better than K-Means in terms of accuracy.

Figure 7 shows the better accuracy of the proposed CBO-IE compared with K-Means, GA, PSO, ALO, and BO for all IoT datasets.

Information Entropy is used with the BO approach for clustering to provide an accurate and efficient distribution of data points in the dataset. Chaos theory is used to initialize the BO population, which improves the search capability of the habitats and makes cluster member selection more precise. Together, these two strategies improve the performance of BO in selecting cluster heads and their members optimally, so CBO-IE generates better results than the BO, ALO, PSO, GA, and K-Means clustering approaches.

5.3. Running Time Complexity of the CBO-IE Approach

The running time of a clustering approach is directly related to the number of operations it executes. The complexity is calculated with the following parameters: P^S is the population size, L is the number of elements in the dataset, L_x is the dimension of each individual, K is the number of cluster centroids, and Mi is the maximum number of iterations. The cost of every operation is assumed to be 1 unit, and the Cost of all Executions (CE) is obtained from Algorithm 3 by the following equations:

Assuming that all of these parameters are equal in the worst case, equation (23) reduces to equation (24) as follows:

In the worst case, the running time complexities of the clustering approaches are O(n³) for CBO-IE, BO, GA, PSO, and ALO, and O(n²) for K-Means. Hence, all of them run in polynomial time.

5.4. Statistical Examination Measurement

A statistical test is performed to determine whether the differences in effectiveness among the clustering techniques are significant. Here, the nonparametric Friedman test (Friedman Examination, FE) is used to detect differences among related samples. The null hypothesis (Y₀) states that all clustering techniques are equally effective.

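The Friedman Examination (FE) statistic, in the Iman-Davenport form that is compared with the F-distribution, is formulated as follows:

$$\chi_F^2 = \frac{12\,N^{DS}}{A^C(A^C+1)}\left[\sum_{j=1}^{A^C} R_j^2 - \frac{A^C(A^C+1)^2}{4}\right]$$

$$\mathrm{FE} = \frac{(N^{DS}-1)\,\chi_F^2}{N^{DS}(A^C-1) - \chi_F^2}$$

where N^DS is the number of datasets, R_j is the average rank of the j^th approach, and A^C is the number of clustering approaches.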
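The Friedman procedure can be reproduced with the Python sketch below, which computes average ranks (tied approaches share the mean rank) and the Iman-Davenport F form of the Friedman statistic. It is an illustrative sketch with hypothetical function names, and it assumes that lower scores are better, as for intracluster distance.

```python
def average_ranks(scores):
    """scores[i][j] is the result of approach j on dataset i (lower is better);
    tied approaches share the mean of their ranks."""
    k = len(scores[0])
    totals = [0.0] * k
    for row in scores:
        order = sorted(range(k), key=lambda j: row[j])
        pos = 0
        while pos < k:
            end = pos
            while end + 1 < k and row[order[end + 1]] == row[order[pos]]:
                end += 1
            tied_rank = (pos + end) / 2 + 1    # mean rank of the tie block
            for t in range(pos, end + 1):
                totals[order[t]] += tied_rank
            pos = end + 1
    return [t / len(scores) for t in totals]


def friedman_fe(avg_ranks, n_datasets):
    """Friedman chi-square and its Iman-Davenport F form (FE), to be compared
    with an F critical value at (k - 1, (k - 1)(N - 1)) degrees of freedom."""
    k, n = len(avg_ranks), n_datasets
    chi2 = 12 * n / (k * (k + 1)) * (sum(r * r for r in avg_ranks) - k * (k + 1) ** 2 / 4)
    denom = n * (k - 1) - chi2
    # Perfect agreement across datasets makes the denominator zero.
    fe = float("inf") if denom <= 0 else (n - 1) * chi2 / denom
    return chi2, fe


ranks = average_ranks([[1, 2, 3], [1, 3, 2], [2, 1, 3], [1, 2, 3]])
print(ranks, friedman_fe(ranks, n_datasets=4))
```

The resulting FE value is then compared with the F critical value, as in the analysis that follows.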

The critical FE value is 2.04925, taken from the F-distribution table [41] with (A^C − 1) and (A^C − 1)(N^DS − 1) degrees of freedom, that is, (6 − 1) = 5 and (6 − 1)(8 − 1) = 35 for 6 clustering approaches (A^C = 6) over 8 datasets (N^DS = 8) at significance level λ = 0.10. The null hypothesis (Y₀) is rejected if the calculated FE value exceeds the critical value. The calculated FE value is 17.14 for the 6 clustering approaches (A^C = 6) over 8 datasets (N^DS = 8) with λ = 0.10. Since the calculated FE is higher than the critical FE, Y₀ is rejected, and it can be concluded that the clustering approaches are not all equally effective.

Therefore, a post hoc examination is performed using the Holm procedure, in which the proposed CBO-IE is compared statistically with each of the other clustering approaches. First, the z value of each comparison is obtained as

$$z_j = \frac{R_j - R_{\mathrm{CBO\text{-}IE}}}{\sqrt{\dfrac{A^C(A^C+1)}{6\,N^{DS}}}}$$

and the probability p_j is then obtained from z_j by using the normal distribution table [42]. Finally, the p-values are sorted in ascending order, and each p_j is compared with λ/(A^C − j) (Table 8).
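The Holm post hoc comparison against CBO-IE can be sketched in Python as follows. It assumes two-sided p-values from the standard normal distribution (computed with math.erfc) and the step-down rule in which the i-th smallest p-value is compared with λ/(A^C − i); the function name and argument layout are hypothetical.

```python
import math


def holm_vs_control(avg_ranks, control, n_datasets, level=0.10):
    """Compare a control approach with every other approach and return
    (approach, z, p, threshold, rejected) tuples ordered by p-value."""
    k = len(avg_ranks)
    se = math.sqrt(k * (k + 1) / (6.0 * n_datasets))
    results = []
    for j, r in enumerate(avg_ranks):
        if j == control:
            continue
        z = (r - avg_ranks[control]) / se
        p = math.erfc(abs(z) / math.sqrt(2))    # two-sided normal p-value
        results.append((p, j, z))
    results.sort()
    decisions, still_rejecting = [], True
    for i, (p, j, z) in enumerate(results, start=1):
        threshold = level / (k - i)
        # Step-down: once one hypothesis is retained, all later ones are too.
        still_rejecting = still_rejecting and p < threshold
        decisions.append((j, z, p, threshold, still_rejecting))
    return decisions


for row in holm_vs_control([1.0, 6.0, 5.0, 4.0, 3.0, 2.0], control=0, n_datasets=8):
    print(row)
```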

Table 8 shows that λ/(A^C − j) is higher than p_j in every case, so the null hypothesis is rejected for all comparisons. Hence, according to this analysis, the proposed CBO-IE clusters better than the K-Means, GA, PSO, ALO, and BO approaches.

6. Conclusion

Various fields, such as medicine, education, and industry, use data mining in many applications. Data clustering is a specific data mining task that groups data so that a database can be examined efficiently. In this work, a Chaotic Biogeography-Based Optimization approach using Information Entropy (CBO-IE) is proposed to obtain data clusters for healthcare IoT datasets. Information Entropy is introduced into the BO approach to produce a better distribution of data points in the dataset, and chaos theory is used to initialize the population of the BO approach. Together, Information Entropy and chaos theory help the BO approach generate optimal cluster heads and cluster members while improving its convergence speed in a large search space. CBO-IE is implemented in MATLAB 2021a on eight healthcare IoT datasets, and the results show the better efficiency of CBO-IE, on the basis of F-Measure, intracluster distance, running time complexity, purity index, statistical analysis, standard deviation, root mean square error, and accuracy, compared with previous clustering techniques such as K-Means, GA, PSO, ALO, and BO. The Friedman Examination and the Holm strategy are used to perform a statistical analysis of the proposed CBO-IE against K-Means, GA, PSO, ALO, and BO, which indicates that CBO-IE produces results with 90% accuracy. In future work, the proposed technique is expected to be evaluated and validated on huge databases in a big data setting. Additionally, cross-layer communication is expected to be proposed and validated within the IoT architectural design.

Data Availability

The data that support the findings of this study are available upon request from the corresponding author.

Conflicts of Interest

The authors declare that there are no conflicts of interest.

References

[1] A. Haque, L. Khan, and M. Baron, “SAND: semi-supervised adaptive novel class detection and classification over data stream,” in Proceedings of the 30th AAAI Conference on Artificial Intelligence (AAAI-16), pp. 1652–1658, Phoenix, AZ, USA, February 2016.
[2] N. Mishra, H. K. Soni, S. Sharma, and A. K. Upadhyay, “Development and analysis of artificial neural network models for rainfall prediction by using time-series data,” International Journal of Intelligent Systems and Applications, vol. 1, pp. 16–23, 2018.
[3] H. N. Dai, R. C. W. Wong, H. Wang, Z. Zheng, and A. V. Vasilakos, “Big data analytics for large scale wireless networks: challenges and opportunities,” ACM Computing Surveys, vol. 52, pp. 1–46, 2019.
[4] A. H. Sabry, W. Z. W. Hasan, M. N. Mohtar, R. M. K. R. Ahmad, and H. R. Harun, “Plantar pressure repeatability data analysis for health adult based on EMED system,” Malaysian Journal of Fundamental and Applied Sciences, vol. 14, no. 1, pp. 96–101, 2018.
[5] A. Mensikova and C. A. Mattmann, “Ensemble sentiment analysis to identify human trafficking in web data,” in Proceedings of the ACM Workshop on Graph Techniques for Adversarial Activity Analytics (GTA), pp. 1–6, ACM, New York, NY, USA, 2018.
[6] S. S. Gill and R. Buyya, “Bio-inspired algorithms for big data analytics: a survey, taxonomy and open challenges,” Big Data Analytics for Intelligent Healthcare Management, Elsevier, Amsterdam, Netherlands, 2019.
[7] A. Verma, I. Kaur, and N. Arora, “Comparative analysis of information extraction techniques for data mining,” Indian Journal of Science and Technology, vol. 9, no. 11, pp. 1–18, 2016.
[8] B. R. Manju, A. Joshuva, and V. Sugumaran, “A data mining study for condition monitoring on wind turbine blades using Hoeffding tree algorithm through statistical and histogram features,” International Journal of Mechanical Engineering & Technology, vol. 9, no. 1, pp. 1061–1079, 2018.
[9] I. Aytekin, E. Eyduran, K. Karadas, R. Akshan, and I. Keskin, “Prediction of fattening final live weight from some body measurements and fattening period in young bulls of crossbred and exotic breeds using MARS data mining algorithm,” Journal of Zoology, vol. 50, no. 1, pp. 189–195, 2018.
[10] S. Ahmad and R. Varma, “Information extraction from text messages using data mining techniques,” Malaya Journal of Matematik, vol. 5, no. 1, pp. 26–29, 2018.
[11] N. E. Oweis, M. M. Fouad, S. R. Oweis, S. S. Owais, and V. Snasel, “A novel MapReduce lift association rule mining algorithm (MRLAR) for big data,” International Journal of Advanced Computer Science and Applications, vol. 7, no. 8, pp. 1–7, 2016.
[12] C. W. Tsai, C. F. Lai, M. C. Chiang, and L. T. Yang, “Data mining for internet of things: a survey,” IEEE Communications Surveys and Tutorials, vol. 16, no. 1, pp. 77–97, 2014.
[13] F. Chen, P. Deng, J. Wan, D. Zhang, A. V. Vasilakos, and X. Rong, “Data mining for Internet of things: literature review and challenges,” International Journal of Distributed Sensor Networks, vol. 2015, Article ID 365372, 14 pages, 2015.
[14] S. B. Patel, M. N. Patel, and S. M. Shah, “FEA-HUIM: fast and efficient algorithm for high utility item-set mining using novel data structure and pruning strategy,” National Conference on Advanced Research Trends in Information and Technologies (NCARTICT), IJSSET, vol. 4, no. 2, pp. 1–7, 2018.
[15] S. Bhaskaran, R. Marappan, and B. Santhi, “Design and analysis of a cluster-based intelligent hybrid recommendation system for E-learning applications,” Mathematics, MDPI, vol. 9, no. 107, pp. 1–21, 2021.
[16] D. Giordano, M. Mellia, and T. Cerquitelli, “K-MDTSC: K-Multi-Dimensional time-series clustering algorithm,” Electronics, MDPI, vol. 10, no. 1166, pp. 1–21, 2021.
[17] R. Mothukuri and B. B. Rao, “Cluster analysis of cyber crime data using R,” International Journal of Computer Science and Mobile Applications, vol. 6, no. 2, pp. 62–70, 2018.
[18] A. N. Onaizah, “A novel data stream clustering algorithm in healthcare IoT,” International Journal of Research in Applied Natural and Social Sciences (IJRANSS), vol. 5, no. 6, pp. 51–60, 2017.
[19] R. Krishnamurthi, A. Kumar, D. Gopinathan, A. Nayyar, and B. Qureshi, “An overview of IoT sensor data processing, fusion, and analysis techniques,” Sensors, MDPI, vol. 20, Article 6076, pp. 1–23, 2020.
[20] F. Alam, R. Mehmood, I. Katib, and A. Albeshri, “Analysis of eight data mining algorithms for smarter Internet of things (IoT),” in Proceedings of the International Workshop on Data Mining in IoT Systems (DaMIS), pp. 437–442, Elsevier, London, UK, September 2016.
[21] A. M. C. Souza and J. R. A. Amazonas, “An outlier detect algorithm using big data processing and Internet of things architecture,” in Proceedings of the International Workshop on Big Data and Data Mining Challenges on IoT and Pervasive Systems (BigD2M), pp. 1010–1015, Elsevier, London, UK, June 2015.
[22] D. Gil, M. Johnsson, H. Mora, and J. Szymanski, “Review of the complexity of managing big data of the Internet of things,” Complexity, vol. 2019, Article ID 4592902, 12 pages, 2019.
[23] M. S. Mahdavinejad, M. Rezvan, M. Barekatain, P. Adibi, P. Barnaghi, and A. P. Sheth, “Machine learning for Internet of things data analysis: a survey,” Digital Communications and Networks, vol. 4, pp. 161–175, 2018.
[24] M. Ruta, F. Scioscia, G. Loseto, A. Pinto, and E. D. Sciascio, “Machine learning in the Internet of things: a semantic-enhanced approach,” Semantic Web, vol. 10, pp. 1–21, 2017.
[25] K. Demertzis, K. Rantos, and G. Drosatos, “A dynamic intelligent policies analysis mechanism for personal data processing in the IoT ecosystem,” Big Data and Cognitive Computing, vol. 4, no. 9, pp. 1–16, 2020.
[26] N. Joseph and B. S. Kumar, “Top-K competitor trust mining and customer behavior investigation using data mining technique,” Journal of Network Communications and Emerging Technologies (JNCET), vol. 8, no. 2, pp. 26–30, 2018.
[27] M. P. Dooshima, E. N. Chidozie, B. J. Ademola, O. O. Sekoni, and I. P. Adebayo, “A predictive model for the risk of mental illness in Nigeria using data mining,” International Journal of Immunology, vol. 6, no. 1, pp. 5–16, 2018.
[28] B. A. Tama and S. Lim, “A comparative performance evaluation of classification algorithms for clinical decision support systems,” Mathematics, MDPI, vol. 8, no. 1814, pp. 1–25, 2020.
[29] C. C. N. Wang, I. S. Chang, P. C. Y. Sheu, and J. J. P. Tsai, “Application of semantic computing in cancer on secondary data analysis,” in Proceedings of the 2nd International Conference on Robotic Computing, IEEE, pp. 407–413, Laguna Hills, CA, USA, February 2018.
[30] K. Ravikumar and A. R. Kannan, “Spatial data mining for prediction of natural events and disaster management based on fuzzy logic using hybrid PSO,” Taga Journal, vol. 14, pp. 858–878, 2018.
[31] B. M. M. Alom and M. Courtney, “Educational data mining: a case study perspectives from primary to university education in Australia,” International Journal of Information Technology and Computer Science, vol. 2, pp. 1–9, 2018.
[32] S. Celik and O. Yilmaz, “Prediction of body weight of Turkish tazi dogs using data mining techniques: classification and regression tree (CART) and multivariate adaptive regression splines (MARS),” Journal of Zoology, vol. 50, no. 2, pp. 575–583, 2018.
[33] W. Ugulino, D. Cardador, K. Vega, E. Velloso, R. Miliditi, and H. Fuks, “Wearable computing: accelerometers data classification of body postures and movements,” in Proceedings of the 21st Brazilian Symposium on Artificial Intelligence, Advances in Artificial Intelligence, Lecture Notes in Computer Science, pp. 52–61, Springer, Stuttgart, Germany, September 2012.
[34] M. B. Chandak, “Role of big-data in classification and novel class detection in data streams,” Journal of Big Data, vol. 23, no. 5, Springer, 2016.
[35] M. Gupta, K. K. Gupta, and P. K. Shukla, “Session key based fast, secure and lightweight image encryption algorithm,” Multimedia Tools and Applications, vol. 80, pp. 10391–10416, 2021.
[36] H. S. Pannu, D. Singh, and A. K. Malhi, “Multi-objective particle swarm optimization-based adaptive neuro-fuzzy inference system for benzene monitoring,” Neural Computing & Applications, vol. 31, pp. 2195–2205, 2019.
[37] D. Singh, J. Singh, and A. Chhabra, “High availability of clouds: failover strategies for cloud computing using integrated check pointing algorithms,” in Proceedings of the 2012 International Conference on Communication Systems and Network Technologies, pp. 698–703, Rajkot, India, May 2012.
[38] D. Singh and V. Kumar, “A comprehensive review of computational dehazing techniques,” Archives of Computational Methods in Engineering, vol. 26, pp. 1395–1413, 2019.
[39] R. Gupta, “Performance analysis of anti-phishing tools and study of classification data mining algorithms for a novel anti-phishing system,” International Journal of Computer Network and Information Security, vol. 12, pp. 70–77, 2015.
[40] R. Bhatt, P. Maheshwary, P. Shukla, P. Shukla, M. Shrivastava, and S. Changlani, “Implementation of fruit fly optimization algorithm (FFOA) to escalate the attacking efficiency of node capture attack in wireless sensor networks (WSN),” Computer Communications, vol. 149, pp. 134–145, 2020, ISSN 0140-3664.
[41] F Distribution Table, 2018, http://www.socr.ucla.edu/applets.dir/f_table.html.
[42] Normal Distribution Table, http://math.arizona.edu/~rsims/ma464/standardnormaltable.pdf.

Copyright

Copyright © 2021 Manish Kumar Ahirwar et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
