Neurocomputing

Volume 71, Issues 13–15, August 2008, Pages 2772–2781

Selecting valuable training samples for SVMs via data structure analysis

https://doi.org/10.1016/j.neucom.2007.09.008

Abstract

In spite of their salient properties and wide acceptance, support vector machines (SVMs) still face difficulties in scalability, because solving the quadratic programming (QP) problems in SVM training is especially costly when dealing with large sets of training data. This paper presents a new algorithm named sample reduction by data structure analysis (SR-DSA) for SVMs to improve their scalability. SR-DSA utilizes data structure information to select the data points that are valuable for learning the separating plane. Because this method is performed entirely before SVM training, it avoids the problem suffered by most sample reduction methods, which select samples by repeatedly training SVMs. Experiments on both synthetic and real world datasets show that SR-DSA is capable of reducing the number of samples as well as the time for SVM training while maintaining high testing accuracy.

Introduction

Founded on Vapnik's statistical learning theory, support vector machines (SVMs) [18], [1] have played an important role in many areas including (but not confined to) pattern recognition, regression, image processing, and bioinformatics, owing to their salient properties such as margin maximization and kernel substitution for classifying data in a high dimensional feature space. Compared with neural networks, SVMs have fewer tunable parameters and can find the global solution. However, as the number of data points increases, the limitation of SVMs in scalability becomes significant. Specifically, the training time increases dramatically (at least quadratically) and memory consumption may blow up, because the standard quadratic programming (QP) algorithms are too time-consuming when the number of samples is large.

An intuitive way to accelerate SVM training is to decompose the QP problem into a number of sub-problems so that the overall training time can be reduced [8], [2], [10]. Although the time complexity can be reduced from O(N^3) to O(N^2) when the number of data points N is very large, this complexity still cannot meet practical requirements and needs further improvement.

Active learning can be applied to SVMs to sample a small number of training data out of the whole dataset [17], [15]. In active learning, data close to the boundary are iteratively selected, because they are supposed to have higher chances of being support vectors (SVs) in the next round; the process stops when no data point is nearer to the boundary than the current SVs. However, this approach has shortcomings: (1) the whole dataset may be scanned many times; (2) convergence can be slow; and (3) too many data points may be selected if the criterion for “near” is chosen improperly.
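
For concreteness, the margin-based selection loop used by such active learning schemes can be sketched in a few lines. This is a generic illustration only, not the scheme of [17] or [15]; the function name active_select, the use of scikit-learn's SVC as the trainer, and all parameter values are assumptions made purely for the example.

```python
import numpy as np
from sklearn.svm import SVC

def active_select(X, y, n_init=50, n_per_round=20, n_rounds=10, seed=0):
    """Iteratively grow a training pool with the unlabeled points closest to the boundary."""
    rng = np.random.default_rng(seed)
    selected = []
    for cls in np.unique(y):                       # seed the pool with a few points per class
        idx = np.flatnonzero(y == cls)
        selected.extend(rng.choice(idx, size=n_init // 2, replace=False).tolist())
    for _ in range(n_rounds):
        clf = SVC(kernel="linear", C=1.0).fit(X[selected], y[selected])
        remaining = np.setdiff1d(np.arange(len(X)), selected)
        if remaining.size == 0:
            break
        # |decision_function| grows with the distance to the current hyperplane,
        # so the smallest values mark the points nearest the boundary
        margin = np.abs(clf.decision_function(X[remaining]))
        selected.extend(remaining[np.argsort(margin)[:n_per_round]].tolist())
    return np.array(selected)
```

The repeated calls to fit inside the loop are exactly the cost criticized above: every round retrains an SVM and rescans the unselected data.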

Random sampling methods (e.g., [9]) can also reduce the time complexity of SVM training. However, since the randomly selected data points may not represent the probability distribution of the whole dataset well, the performance of SVMs can degrade considerably. Other random sampling methods (e.g., [20]) follow an idea similar to active learning, giving data near the boundary higher probabilities of being selected, but they still suffer from the same difficulties as active learning.

Clustering-based SVM (CB-SVM) [21] is a recently developed method that exploits clustering information to reduce the training samples for SVMs. Based on BIRCH [22], a hierarchical micro-clustering algorithm, CB-SVM constructs two clustering feature (CF) trees before training. However, an inappropriate setting of the node width may force rescanning of the whole dataset. Also, for nonlinear CB-SVM, the clustering is performed in the input space while the distance is measured in the kernel space, which introduces errors into the sample reduction. In addition, CB-SVM requires many rounds of SVM training, just as active learning does.
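
The sketch below illustrates only the first stage of this clustering-then-training idea, using scikit-learn's Birch implementation of the CF tree as a stand-in; the decluster-and-retrain loop of the actual CB-SVM is omitted, and the threshold value is an assumed placeholder rather than a recommended setting.

```python
import numpy as np
from sklearn.cluster import Birch
from sklearn.svm import SVC

def cbsvm_like_initial_fit(X, y, threshold=0.5):
    """Build one CF tree per class and train an initial SVM on the subcluster centroids."""
    centers, labels = [], []
    for cls in np.unique(y):
        tree = Birch(threshold=threshold, n_clusters=None).fit(X[y == cls])
        centers.append(tree.subcluster_centers_)
        labels.append(np.full(len(tree.subcluster_centers_), cls))
    # training on centroids instead of all points yields the coarse initial boundary
    return SVC(kernel="linear", C=1.0).fit(np.vstack(centers), np.concatenate(labels))
```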

Almost all of the existing methods proposed to improve the scalability of SVMs need to train SVMs and/or scan the whole dataset many times to arrive at the current selection of data, so their efficiency is still limited by the training speed of SVMs and the scale of the dataset. Instead of using SVM training to select valuable data points, this paper proposes a new approach: first find the inherent structure of the dataset by clustering, and then infer the data points that are potential SVs directly from the detected structure.
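
A much simplified reading of this "cluster first, then keep boundary-prone points" idea can be sketched as follows. This is not the SR-DSA algorithm itself (which is defined in Section 3); the use of k-means in place of the paper's structure analysis, the Mahalanobis ranking of "exterior" points, and the keep_ratio parameter are all assumptions made purely for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

def select_exterior_points(X, y, n_clusters=5, keep_ratio=0.3, seed=0):
    """Cluster each class and keep only the points lying on the outskirts of each cluster."""
    keep = []
    for cls in np.unique(y):
        idx = np.flatnonzero(y == cls)
        km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(X[idx])
        for c in range(n_clusters):
            members = idx[km.labels_ == c]
            if len(members) < 2:                     # degenerate cluster: keep everything
                keep.extend(members.tolist())
                continue
            Xc = X[members]
            mu = Xc.mean(axis=0)
            cov = np.cov(Xc, rowvar=False) + 1e-6 * np.eye(X.shape[1])
            inv = np.linalg.inv(cov)
            # squared Mahalanobis distance of each member to its cluster mean
            d = np.einsum("ij,jk,ik->i", Xc - mu, inv, Xc - mu)
            n_keep = max(1, int(np.ceil(keep_ratio * len(members))))
            keep.extend(members[np.argsort(d)[-n_keep:]].tolist())
    return np.array(sorted(keep))
```

The retained subset can then be handed to any off-the-shelf SVM trainer; the selection itself requires no SVM training and only one pass of clustering per class.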

For clarity, Table 1 lists the notation used in this paper. Bold typeface denotes vectors or matrices, and normal typeface denotes scalars or vector components.

The remainder of this paper is organized as follows. We first provide a brief review of SVMs in Section 2. In Section 3, we describe how to analyze the structure of the given data and how to calculate the Mahalanobis distance in both the input and kernel spaces; we then introduce our algorithm, SR-DSA, which removes the data points that will not affect the decision of the separating plane. Section 4 reports experimental results on both toy and real world datasets, which demonstrate the feasibility and merits of our algorithm. In Section 5, we provide some discussion and conclude the paper.

Section snippets

Support vector machines

Given $m$ training pairs $(\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_m, y_m)$, where $\mathbf{x}_i \in \mathbb{R}^d$ is an input vector labeled by $y_i \in \{+1, -1\}$ for $i = 1, \ldots, m$, SVMs [18] search for the separating hyper-plane with the largest margin, called the optimal hyper-plane $\mathbf{w}^T\mathbf{x} + b = 0$. This hyper-plane classifies an input pattern according to the function
$$f(\mathbf{x}) = \operatorname{sgn}(\mathbf{w}^T\mathbf{x} + b), \qquad \text{where } \operatorname{sgn}(k) = \begin{cases} +1 & \text{if } k \geq 0, \\ -1 & \text{if } k < 0. \end{cases}$$

In order to maximize the margin in the linearly separable case, we need to solve the following quadratic programming problem:
$$\min_{\mathbf{w},\,b}\ \tfrac{1}{2}\|\mathbf{w}\|^2 \quad \text{subject to}\quad y_i(\mathbf{w}^T\mathbf{x}_i + b) \geq 1,\quad i = 1,\ldots,m.$$
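
As a quick sanity check of the decision rule above, the snippet below fits a linear SVM with scikit-learn on synthetic two-class data (the data, parameters, and the choice of SVC are assumptions made only for illustration) and verifies that sgn(w^T x + b) reproduces the classifier's own predictions.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# two well-separated Gaussian blobs labeled -1 and +1
X = np.vstack([rng.normal(-2.0, 1.0, size=(100, 2)),
               rng.normal(+2.0, 1.0, size=(100, 2))])
y = np.repeat([-1, 1], 100)

clf = SVC(kernel="linear", C=1.0).fit(X, y)
w, b = clf.coef_[0], clf.intercept_[0]

manual = np.where(X @ w + b >= 0, 1, -1)   # f(x) = sgn(w^T x + b)
assert np.array_equal(manual, clf.predict(X))
```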

Sample reduction for SVMs

In this section, we first illustrate how to find the structure information in the given dataset, and then elaborate the details of the Mahalanobis distance calculation. Our algorithm, SR-DSA, is presented afterwards, followed by an analysis of its complexity.
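
For reference, the input-space Mahalanobis distance of a point $\mathbf{x}$ from a group of points with mean $\boldsymbol{\mu}$ and covariance matrix $\boldsymbol{\Sigma}$ takes the standard form below; the kernel-space counterpart mentioned here is detailed in the paper's Section 3.
$$d_M(\mathbf{x}) = \sqrt{(\mathbf{x} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu})}.$$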

Experiments and results

To demonstrate the performance of the proposed SR-DSA, we report experimental results on both synthetic and real world datasets. We use SVMlight for the SVM implementation. All programs were executed on a Pentium IV 1.2 GHz machine with 256 MB memory.

Discussion and conclusion

The data reduction method for support vector machines (SVMs) presented in this paper, sample reduction by data structure analysis (SR-DSA), can be easily applied to other large margin learning models such as M4 [6]. It also has the potential to reduce the samples in semi-supervised learning. In this paper, we only discuss binary classification for simplicity; for multi-class learning problems, SR-DSA is also useful for speeding up the training of each pairwise SVM in the decision…

References (22)

  • C. Burges, A tutorial on support vector machines for pattern recognition, Data Min. Knowl. Discovery (1998)
  • R. Collobert et al., SVMTorch: support vector machines for large-scale regression problems, J. Mach. Learn. Res. (2001)
  • D. Eppstein, Fast hierarchical clustering and other applications of dynamic closest pairs
  • S. Everitt et al., Cluster Analysis (2001)
  • C.W. Hsu et al., A comparison of methods for multiclass support vector machines, IEEE Trans. Neural Networks (2002)
  • K. Huang et al., Learning large margin classifiers locally and globally
  • A.K. Jain et al., Algorithms for Clustering Data (1988)
  • T. Joachims, Making large-scale SVM learning practical
  • Y.J. Lee et al., RSVM: reduced support vector machine
  • J. Platt, Fast training of support vector machines using sequential minimal optimization
  • J. Platt et al., Large margin DAGs for multiclass classification, Adv. Neural Inform. Process. Syst. (2000)

    Defeng Wang received his Ph.D. degree in computing at the Hong Kong Polytechnic University in 2006 and M.Eng. in Computer Application from Xidian University in 2003. Currently, he is a postdoctoral research fellow in the Department of Computer Science and Engineering of the Chinese University of Hong Kong. His recent research is focused on kernel methods, large margin learning, and their applications in computational life science.

    Lin Shi received her B.Eng. degree in computer science at the Xidian University in Xi’an, China, in 2001. She is currently a Ph.D. student in the Department of Computer Science and Engineering at the Chinese University of Hong Kong. Her research area is medical image analysis.
