research-article
Open Access

Learning k for kNN Classification

Published: 12 January 2017

Abstract

The K Nearest Neighbor (kNN) method has been widely used in data mining and machine learning applications because of its simple implementation and strong performance. However, previous kNN methods assign the same k value to all test data, which has been shown to make them impractical in real applications. This article proposes to learn a correlation matrix that reconstructs test data points from training data, so that different test data points are assigned different k values; we refer to this as Correlation Matrix kNN (CM-kNN) classification. Specifically, a least-squares loss function is employed to minimize the error of reconstructing each test data point from all training data points. A graph Laplacian regularizer is then used to preserve the local structure of the data during reconstruction. Moreover, an ℓ1-norm regularizer and an ℓ2,1-norm regularizer are applied, respectively, to learn different k values for different test data and to induce row sparsity that removes redundant/noisy features from the reconstruction process. Beyond classification, kNN methods (including the proposed CM-kNN method) are further applied to regression and missing data imputation. We conducted a series of experiments to evaluate the proposed method, and the results showed that it was more accurate and efficient than existing kNN methods in data-mining applications such as classification, regression, and missing data imputation.
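The abstract names the ingredients of CM-kNN but not the exact objective, matrix layout, or solver, so the following Python code is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes training points are the columns of X (d×n), test points are the columns of Y (d×m), and the correlation matrix W is n×m, and it minimizes ‖XW − Y‖²_F + ρ1‖W‖₁ + ρ2‖W‖₂,₁ + ρ3·tr(WᵀXᵀLXW), where L is the Laplacian of a fully connected Gaussian-kernel graph over the d features (this graph construction is an assumption). The solver is plain proximal gradient descent (ISTA), swapped in because the abstract does not specify an optimizer. The learned k for test point j is taken to be the number of non-zero entries in column j of W; the prediction is a majority vote over those training points (or, for regression, the mean of their targets). All function names (`feature_laplacian`, `prox_l1_l21`, `learn_correlation_matrix`, `cm_knn_predict`), the 1-NN fallback, and the default parameter values are hypothetical.

```python
import numpy as np


def feature_laplacian(X, sigma=1.0):
    """L = D - S for a fully connected Gaussian-kernel graph over the d features.

    X is d x n (features x training points). The graph choice is an assumption.
    """
    diff = X[:, None, :] - X[None, :, :]            # d x d x n pairwise feature differences
    S = np.exp(-np.sum(diff ** 2, axis=2) / (2.0 * sigma ** 2))
    np.fill_diagonal(S, 0.0)                         # no self-loops
    return np.diag(S.sum(axis=1)) - S


def prox_l1_l21(V, t1, t2):
    """Prox of t1*||V||_1 + t2*||V||_{2,1}, with the rows of V as groups."""
    Z = np.sign(V) * np.maximum(np.abs(V) - t1, 0.0)              # element-wise soft-threshold
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    scale = np.maximum(1.0 - t2 / np.maximum(norms, 1e-12), 0.0)
    return scale * Z                                              # row-wise group shrinkage


def learn_correlation_matrix(X, Y, rho1=0.1, rho2=0.1, rho3=0.1, iters=500, sigma=1.0):
    """ISTA for min_W ||XW - Y||_F^2 + rho3 tr(W^T X^T L X W) + rho1 ||W||_1 + rho2 ||W||_{2,1}.

    X: d x n training points (columns); Y: d x m test points (columns); returns W: n x m.
    """
    d, n = X.shape
    L = feature_laplacian(X, sigma)
    A = X.T @ (np.eye(d) + rho3 * L) @ X             # n x n, symmetric PSD
    B = X.T @ Y                                      # n x m
    step = 1.0 / (2.0 * np.linalg.norm(A, 2))        # 1 / Lipschitz constant of the smooth gradient
    W = np.zeros((n, Y.shape[1]))
    for _ in range(iters):
        grad = 2.0 * (A @ W - B)                     # gradient of the two smooth terms
        W = prox_l1_l21(W - step * grad, step * rho1, step * rho2)
    return W


def cm_knn_predict(X, y, Y, task="classify", tol=1e-6, **kw):
    """Predict each column of Y from the training points with non-zero weight in W."""
    W = learn_correlation_matrix(X, Y, **kw)
    preds, ks = [], []
    for j in range(Y.shape[1]):
        nbrs = np.flatnonzero(np.abs(W[:, j]) > tol)   # learned neighbour set of test point j
        ks.append(len(nbrs))                            # learned k for test point j
        if len(nbrs) == 0:                              # fallback (assumption): plain 1-NN
            nbrs = np.array([np.argmin(np.linalg.norm(X - Y[:, [j]], axis=0))])
        vals = y[nbrs]
        if task == "classify":
            labels, counts = np.unique(vals, return_counts=True)
            preds.append(labels[np.argmax(counts)])     # majority vote, ties -> smallest label
        else:                                           # "regress": mean of neighbour targets
            preds.append(vals.mean())
    return np.array(preds), np.array(ks)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two toy classes living on disjoint feature blocks (features 0-2 vs 3-5).
    X0 = np.vstack([rng.uniform(1, 2, (3, 5)), np.zeros((3, 5))])
    X1 = np.vstack([np.zeros((3, 5)), rng.uniform(1, 2, (3, 5))])
    X, y = np.hstack([X0, X1]), np.array([0] * 5 + [1] * 5)
    Y = np.hstack([1.1 * X0[:, :2], 0.9 * X1[:, :2]])
    preds, ks = cm_knn_predict(X, y, Y)
    print("predictions:", preds, "learned k per test point:", ks)
```

In this layout the ℓ2,1 penalty zeroes whole rows of W, which discards a training point for every test point at once; how the paper ties row sparsity to removing noisy features cannot be recovered from the abstract. The proximal step for ρ1‖·‖₁ + ρ2‖·‖₂,₁ has a closed form (element-wise soft-thresholding followed by row-wise shrinkage), so each iteration needs no inner solver.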



      Published in

        ACM Transactions on Intelligent Systems and Technology, Volume 8, Issue 3
        Special Issue: Mobile Social Multimedia Analytics in the Big Data Era and Regular Papers
        May 2017
        320 pages
        ISSN: 2157-6904
        EISSN: 2157-6912
        DOI: 10.1145/3040485
        Editor: Yu Zheng

        Copyright © 2017 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 12 January 2017
        • Revised: 1 August 2016
        • Accepted: 1 August 2016
        • Received: 1 May 2016
        Published in TIST Volume 8, Issue 3


        Qualifiers

        • research-article
        • Research
        • Refereed
