Boosted K-nearest neighbor classifiers based on fuzzy granules☆
Introduction
As early as the 1960s, Zadeh, a renowned American expert in cybernetics, proposed fuzzy set theory, and in 1979 he first formulated the problem of fuzzy information granulation [1]. He held that human cognition can be summarized by three main features: granulation, organization and causality [2], [3], [4], [5]. In 1985, Hobbs [6] presented the concept of granularity. Later, Yager and Filev [7] further pointed out that “people have formed a granular view of the world”. From this point of view, human observation, measurement, conceptualization and reasoning are carried out at a granular level. The term granular computing was first proposed by T.Y. Lin [8], [9]. Information granules not only reflect the nature of the data but can also efficiently capture auxiliary domain knowledge conveyed by the user, thereby reflecting the human-centric aspects of an investigation and enhancing the actionability of its results [10].
The concept of information granularity is ubiquitous, and granular computing has promoted the development of many concepts, including the following: graphs [11], [12], [13], information tables [14], knowledge representation [15], association discovery and data mining [16], clustering [17] and rule clustering [18], classification [19], and so on. Granular computing is also widely applied in time series forecasting [20], prediction tasks [21], concept learning [22], perception [23], optimization [24], credit scoring [25], etc. Many scholars have conducted extensive and in-depth research from various angles. Miao discussed the structure of granular computing from the perspective of set theory [26]. Wang analyzed uncertainty measures in granular computing and their application to big data [27], [28]. Yao proposed neighborhood systems and neighborhood granular computing [29], [30]. Hu analyzed neighborhood reduction and classification [31], [32], [33]. Chen studied feature dimension reduction and optimization from the perspective of swarm intelligence [34], [35]. These views suggest that granulation, as one of the important features of human cognition, plays an important role in modeling complex data.
The KNN algorithm was first proposed by Cover and Hart in the 1960s [36]. KNN is a non-parametric statistical method for classification and regression in the field of pattern recognition [37]. KNN uses a vector space model in which cases of the same category have high similarity to one another, so the likely class of an unknown case can be evaluated from its similarity to cases of known categories. It is a simple and effective non-parametric classification method. Its advantages include being well suited to incremental learning without knowing the sample distribution in advance; explicit rules are not required, and its classification accuracy is high, so it is widely used in fields such as clustering, big data, and multi-label learning [38], [39], [40], [41], [42], [43]. However, the classical KNN algorithm has high time and space complexity; assigning equal weight to all K neighbor samples hurts classification accuracy; it is sensitive to noise and performs poorly on unbalanced samples; and the value of K is difficult to determine. Many scholars have proposed improvements in these respects and enhanced its performance [44], [45], [46], [47], [48], [49], [50], [51], [52].
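The basic KNN decision rule summarized above can be sketched as follows. This is a generic illustration (Euclidean distance, unweighted majority vote), not the fuzzy-granule variant proposed in this paper; the function and array names are assumptions:

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k nearest training samples."""
    # Euclidean distance from x to every training sample
    dists = np.linalg.norm(X_train - x, axis=1)
    # Indices of the k smallest distances
    nearest = np.argsort(dists)[:k]
    # Majority vote over the labels of those neighbors
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]
```

Equal weighting of the K neighbors, as noted above, is one of the weaknesses the literature addresses, e.g. by distance-weighted voting.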
It is difficult to construct a single classifier with high accuracy. However, it is possible to construct a strong classifier with high accuracy by integrating several weak classifiers; the weak learning theorem [53] theoretically supports this possibility. How to construct weak classifiers, and how to combine them, is the main subject of ensemble learning research. At present, the most successful ensemble learning algorithm is AdaBoost [37], first proposed by Freund and Schapire in 1995. In 1999, Schapire et al. extended AdaBoost, which produces binary judgments, to continuous AdaBoost with continuous confidence output; it can thus describe classification boundaries more accurately and achieves better classification results [54]. The AdaBoost algorithm is simple and widely applied in many fields, such as face recognition, water quality detection, protein prediction, pedestrian detection, EEG signal analysis, urban rail transit, etc. [55], [56], [57], [58], [59], [60], [61], [62], [63], [64]. It has also attracted a large number of scholars to study and improve its generalization ability [65], [66], [67], [68].
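The AdaBoost scheme referenced above can be sketched in its discrete, binary form, with one-feature threshold stumps as weak learners. This is a minimal textbook illustration under those assumptions, not the BFGKNN model of this paper:

```python
import numpy as np

def adaboost_train(X, y, rounds=10):
    """Minimal discrete AdaBoost with threshold-stump weak learners.
    Labels y must be in {-1, +1}."""
    n = len(y)
    w = np.full(n, 1.0 / n)            # sample weights, uniform at start
    ensemble = []                       # list of (alpha, feature, threshold, sign)
    for _ in range(rounds):
        best = None
        # Exhaustive search for the stump with lowest weighted error
        for f in range(X.shape[1]):
            for t in np.unique(X[:, f]):
                for s in (1, -1):
                    pred = np.where(X[:, f] <= t, s, -s)
                    err = np.sum(w[pred != y])
                    if best is None or err < best[0]:
                        best = (err, f, t, s, pred)
        err, f, t, s, pred = best
        err = max(err, 1e-10)                     # avoid division by zero
        alpha = 0.5 * np.log((1 - err) / err)     # weak learner weight
        ensemble.append((alpha, f, t, s))
        # Re-weight: misclassified samples gain weight for the next round
        w *= np.exp(-alpha * y * pred)
        w /= w.sum()
    return ensemble

def adaboost_predict(ensemble, X):
    """Sign of the alpha-weighted vote of all weak learners."""
    score = np.zeros(len(X))
    for alpha, f, t, s in ensemble:
        score += alpha * np.where(X[:, f] <= t, s, -s)
    return np.sign(score)
```

The re-weighting step is the essential mechanism: each round focuses the next weak learner on the samples the current ensemble still misclassifies.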
In this paper, we define the fuzzy granule vector based on fuzzy information granulation from a new perspective, and design two new classification models: FGKNN and BFGKNN. On the basis of fuzzy granulation of the attributes of a classification system, we define the distance between fuzzy granule vectors, propose the concept of K-nearest fuzzy granule vectors, and transform the classification problem into a K-nearest fuzzy granule vector search problem. On this basis we present the FGKNN classification model, and building on FGKNN we further design the BFGKNN model. We employ 10-fold cross-validation to test the performance of the two algorithms on UCI data sets. Theoretical analysis and experimental results show that FGKNN and BFGKNN achieve better performance under appropriate parameters.
Section snippets
Fuzzy Information granulation
In many cases, the granularity of human reasoning and concept formation is fuzzy rather than precise. Fuzzy information granulation is usually obtained through a fuzzy binary relation, and the granulation is carried out over the entire fuzzy granule space. A series of definitions is given as follows.
Definition 1 Let ⟨U, A, d⟩ be a classification system. Here U represents the sample set, A is the attribute set, and d expresses the label (its values can be discrete or
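The snippet above omits the paper's exact membership and distance formulas, so the following is only a hypothetical sketch of attribute-wise fuzzy granulation: each granule is taken to be the similarity vector of one sample to all samples on one attribute, with membership 1 − |difference| on data normalized to [0, 1], and the granule-vector distance as the mean absolute membership difference. All names and formulas here are assumptions, not the paper's definitions:

```python
import numpy as np

def fuzzy_granule(X, i, a):
    """Assumed fuzzy granule of sample i on attribute a:
    a membership vector over all samples (X normalized to [0, 1])."""
    return 1.0 - np.abs(X[:, a] - X[i, a])

def granule_vector(X, i):
    """Assumed fuzzy granule vector of sample i: one granule per attribute."""
    return np.stack([fuzzy_granule(X, i, a) for a in range(X.shape[1])])

def granule_vector_distance(G1, G2):
    """An assumed distance: mean absolute membership difference."""
    return np.mean(np.abs(G1 - G2))
```

Under this sketch, two samples with similar attribute values yield granule vectors at a small distance, which is the property a K-nearest-granule search would rely on.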
Experimental analysis
In this paper, we use 5 data sets from UCI as the data source for experimental testing. Based on them, we modify 1% of the data to build another 5 data sets with noise (see Table 4). Since the value ranges of the data sets differ, the data need to be normalized. We employ the maximum-minimum method (see Eq. (17)) to ensure that all data are converted to the range [0, 1]. Samples are fuzzy granulated on each atomic attribute to form fuzzy granule vectors. We adopt
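Eq. (17) itself is not shown in this snippet, but the maximum-minimum method described is standard column-wise min-max scaling, which can be sketched as:

```python
import numpy as np

def min_max_normalize(X):
    """Column-wise min-max scaling: maps each attribute into [0, 1]."""
    mn, mx = X.min(axis=0), X.max(axis=0)
    rng = np.where(mx > mn, mx - mn, 1.0)  # guard against constant columns
    return (X - mn) / rng
```

Normalization is needed here because both the granule memberships and the distance computation would otherwise be dominated by attributes with large value ranges.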
Conclusion
The classic classifier performs numerical calculations and does not involve set operations. From the viewpoint of fuzzy granulation of samples, we propose two classification algorithms in fuzzy set form, FGKNN and BFGKNN. First, the fuzzy granulation method is introduced, fuzzy granule vectors and their operation rules are constructed in the classification system, and the similarity between fuzzy granule vectors is defined. Next, the concept of the K-nearest fuzzy granule vector is defined,
CRediT authorship contribution statement
Wei Li: Conceptualization, Methodology, Software, Validation, Formal analysis, Data curation, Writing - original draft, Project administration, Funding acquisition, Supervision, Writing - review & editing, Investigation, Visualization, Resources. Yumin Chen: Visualization, Writing - review & editing. Yuping Song: Resources, Writing - review & editing.
Acknowledgments
This work was supported by the Science and Technology Planning Guidance Project of Xiamen, China (No. 3502Z20179038), the Natural Science Foundation of Fujian Province, China (No. 2015J05015) and the National Natural Science Foundation of China (No. 61573297).
References (68)
Toward a theory of fuzzy information granulation and its centrality in human reasoning and fuzzy logic, Fuzzy Sets and Systems (1997)
An application of fuzzy hypergraphs and hypergraphs in granular computing, Inform. Sci. (2018)
Simple graphs in granular computing, Inform. Sci. (2016)
Granular computing on information tables: Families of subsets and operators, Inform. Sci. (2018)
Knowledge pairing systems in granular computing, Knowl.-Based Syst. (2017)
A rapid fuzzy rule clustering method based on granular computing, Appl. Soft Comput. (2014)
Fast multi-class recognition of piecewise regular objects based on sequential three-way decisions and granular computing, Knowl.-Based Syst. (2016)
A hybrid fuzzy time series forecasting model based on granular computing and bio-inspired optimization approaches, J. Comput. Sci. (2018)
Construction of prediction intervals for gas flow systems in steel industry based on granular computing, Control Eng. Pract. (2018)
Concept learning via granular computing: A cognitive viewpoint, Inform. Sci. (2015)
Perception granular computing in visual haze-free task, Expert Syst. Appl.
Efficient topology optimization using GPU computing with multilevel granularity, Adv. Eng. Softw.
A granular computing-based approach to credit scoring modeling, Neurocomputing
Monotonic uncertainty measures for attribute reduction in probabilistic rough set model, Internat. J. Approx. Reason.
Neighborhood classifiers, Expert Syst. Appl.
Neighborhood rough set based heterogeneous feature subset selection, Inform. Sci.
Neighborhood classifiers, Inform. Sci.
A rough set approach to feature selection based on ant colony optimization, Pattern Recognit. Lett.
Finding rough set reducts with fish swarm algorithm, Knowl.-Based Syst.
A decision-theoretic generalization of on-line learning and an application to boosting, J. Comput. Syst. Sci.
An effective refinement strategy for KNN text classifier, Expert Syst. Appl.
Clustering-based k-nearest neighbor classification for large-scale data with neural codes representation, Pattern Recognit.
KNN-IS: An iterative Spark-based design of the k-nearest neighbors classifier for big data, Knowl.-Based Syst.
ML-KNN: A lazy learning approach to multi-label learning, Pattern Recognit.
Study on density peaks clustering based on k-nearest neighbors and principal component analysis, Knowl.-Based Syst.
Shared-nearest-neighbor-based clustering by fast search and find of density peaks, Inform. Sci.
Granger causality driven AHP for feature weighted KNN, Pattern Recognit.
Gene selection from microarray gene expression data for classification of cancer subgroups employing PSO and adaptive k-nearest neighborhood technique, Expert Syst. Appl.
Automated classification of lung bronchovascular anatomy in CT using AdaBoost, Med. Image Anal.
Shedding light on the asymmetric learning capability of AdaBoost, Pattern Recognit. Lett.
Predicting toxic action mechanisms of phenols using AdaBoost learner, Chemometr. Intell. Lab. Syst.
Modeling of solubility in MEA, DEA, TEA, and MDEA aqueous solutions using AdaBoost-decision tree and artificial neural network, Int. J. Greenh. Gas Control
A self-constructing cascade classifier with AdaBoost and SVM for pedestrian detection, Eng. Appl. Artif. Intell.
Fuzzy sets and information granularity
Cited by (40)
Boosted stochastic fuzzy granular hypersurface classifier, Knowledge-Based Systems (2024)

Gene selection of microarray data using Heatmap Analysis and Graph Neural Network, Applied Soft Computing (2023)

WABL method as a universal defuzzifier in the fuzzy gradient boosting regression model, Expert Systems with Applications (2023). Citation excerpt: “The advantages of boosting methods when training fuzzy classifiers are that the size of the rule base is very small and learning is very fast. Two algorithms are proposed for classification, fuzzy granule k-nearest neighbor (FGKNN) and boosted fuzzy granule k-nearest neighbor (BFGKNN), based on fuzzy granulation, KNN (k-nearest neighbor) and AdaBoost, in the study (Li, Chen et al., 2020). Using granular computing, the problem-solving process is normalized as a structured and hierarchical process.”

Fuzzy granular deep convolutional network with residual structures, Knowledge-Based Systems (2022). Citation excerpt: “In 2020, modular granular neural networks [19] based on particle swarm optimization variants and fuzzy dynamic parameter adaptation were proposed by Daniela Sanchez et al. and applied to human recognition tasks. Li [20] integrated the boosting algorithm with the granular KNN algorithm to further improve the performance of the KNN algorithm. Granular computing is an arithmetic model defined from the perspective of human cognition, which is highly similar to human logic, cognition, and memory, and has been successfully applied in several fields [21–27].”

Reinforced fuzzy clustering-based rule model constructed with the aid of exponentially weighted ℓ2 regularization strategy and augmented random vector functional link network, Fuzzy Sets and Systems (2022). Citation excerpt: “A fuzzy rule-based model exploits rules as a means of knowledge representation to formalize the knowledge existing in the model. Moreover, due to its modular architecture, well-developed design methodologies and practices, as well as its advantages in interpretability, it has been used in a wide spectrum of realms such as fuzzy control, pattern analysis, fuzzy decision, time series prediction, robotics, etc. [2–6]. There are two well-known fuzzy rule-based models in addressing regression problems.”

Design Gaussian information granule based on the principle of justifiable granularity: A multi-dimensional perspective, Expert Systems with Applications (2022)
Wei Li is an associate professor, master supervisor with the School of Computer and Information Engineering at Xiamen University of Technology, Xiamen, China. He is also a member of China Computer Federation (CCF). His research interests include Artificial Intelligence, Computer Graphics, Machine Learning and Granular Computing. He received the Ph.D. degree in Basic Theory of Artificial Intelligence, Xiamen University, China in 2013. He was also a visiting scholar with department of computer science at University of Massachusetts Boston, U.S.A, from June 2018 to June 2019. Contact him at [email protected]
Yumin Chen is a professor at Xiamen University of Technology and a supervisor of Ph.D. students at Fuzhou University. He received his Ph.D. from Tongji University, China, in 2010. He was a post-doctoral researcher at the University of Electronic Science and Technology of China from 2014 to 2017. He is a committee member of Rough Sets and Soft Computing of China, and a member of CCF, CAA and CAAI. His research interests include artificial intelligence, machine learning, pattern recognition and rough sets.
Yuping Song is an associate professor in the School of Mathematical Sciences at Xiamen University. Her research interests include discrete differential geometry, computer graphics, and artificial intelligence. She holds a Ph.D. in Pure Mathematics from Peking University. Contact her at [email protected]
No author associated with this paper has disclosed any potential or pertinent conflicts which may be perceived to have impending conflict with this work. For full disclosure statements refer to https://doi.org/10.1016/j.knosys.2020.105606.