Abstract
The k-nearest-neighbor (kNN) method has been widely used in data mining and machine learning because it is simple to implement and performs well. However, assigning the same k value to all test data, as previous kNN methods do, has been shown to make these methods impractical in real applications. This article proposes learning a correlation matrix that reconstructs test data points from training data so that different k values can be assigned to different test data points; the resulting method is referred to as Correlation Matrix kNN (CM-kNN) classification. Specifically, a least-squares loss function is used to minimize the error of reconstructing each test data point from all training data points. A graph Laplacian regularizer is then added to preserve the local structure of the data during reconstruction. Moreover, an ℓ1-norm regularizer is applied to learn different k values for different test data, and an ℓ2,1-norm regularizer is applied to induce sparsity that removes redundant or noisy features from the reconstruction process. Besides classification, the kNN methods (including the proposed CM-kNN method) are further applied to regression and missing data imputation. We conducted several sets of experiments, and the results showed that the proposed method was more accurate and efficient than existing kNN methods in data-mining applications such as classification, regression, and missing data imputation.
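The idea of reading a per-point k off a sparse reconstruction can be illustrated with a minimal sketch. It is a simplification under stated assumptions, not the authors' implementation: it keeps only the least-squares loss and the ℓ1-norm regularizer (the ℓ2,1-norm and graph Laplacian terms are omitted), it swaps in plain proximal gradient descent (ISTA) as the solver, and it assumes that the training points with nonzero weight act as the neighbors of a test point and vote by simple majority. The names `learn_sparse_weights`, `predict`, `rho`, and `n_iter` are hypothetical.

```python
import numpy as np
from collections import Counter


def soft_threshold(A, t):
    """Proximal operator of t * ||A||_1 (element-wise shrinkage)."""
    return np.sign(A) * np.maximum(np.abs(A) - t, 0.0)


def learn_sparse_weights(X_train, X_test, rho=0.1, n_iter=2000):
    """Approximately minimise 0.5 * ||X_test.T - X_train.T @ W||_F^2 + rho * ||W||_1.

    X_train has shape (n, d) and X_test has shape (m, d); the returned W has
    shape (n, m), so column j holds the weights that reconstruct test point j.
    Solver: ISTA (an assumption of this sketch, not taken from the abstract).
    """
    G = X_train @ X_train.T            # (n, n) Gram matrix of training points
    B = X_train @ X_test.T             # (n, m) train/test inner products
    lipschitz = np.linalg.eigvalsh(G)[-1]   # largest eigenvalue of G
    step = 1.0 / max(lipschitz, 1e-12)
    W = np.zeros((X_train.shape[0], X_test.shape[0]))
    for _ in range(n_iter):
        grad = G @ W - B               # gradient of the least-squares term
        W = soft_threshold(W - step * grad, step * rho)
    return W


def predict(X_train, y_train, X_test, rho=0.1):
    """Classify each test point by majority vote over its nonzero-weight neighbors.

    Returns (predictions, ks), where ks[j] is the number of training points
    used for test point j, i.e. the k chosen for that point.
    """
    y_train = np.asarray(y_train)
    W = learn_sparse_weights(X_train, X_test, rho)
    preds, ks = [], []
    for j in range(W.shape[1]):
        idx = np.flatnonzero(W[:, j])
        if idx.size == 0:
            # Assumed fallback: plain 1-NN when every weight was shrunk to zero.
            dists = np.linalg.norm(X_train - X_test[j], axis=1)
            idx = np.array([np.argmin(dists)])
        ks.append(idx.size)
        preds.append(Counter(y_train[idx].tolist()).most_common(1)[0][0])
    return np.array(preds), np.array(ks)
```

In this sketch a larger `rho` shrinks more weights to zero and therefore tends to give smaller k values; the value of k is never set by hand and can differ from one test point to the next.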
Index Terms
- Learning k for kNN Classification