ABSTRACT
In many applications, data sets contain a huge number of instances as well as features. Linear support vector machines (SVM) are among the most popular tools for handling such large-scale sparse data. This paper presents a novel dual coordinate descent method for linear SVM with L1- and L2-loss functions. The proposed method is simple and reaches an ε-accurate solution in O(log(1/ε)) iterations. Experiments indicate that our method is much faster than state-of-the-art solvers such as Pegasos, TRON, SVMperf, and a recent primal coordinate descent implementation.
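The abstract's efficiency claim rests on the structure of the dual problem: min_α ½ αᵀ Q̄ α − eᵀα subject to 0 ≤ α_i ≤ C (L1-loss), with Q̄_ij = y_i y_j x_iᵀ x_j. Because the algorithm maintains w = Σ_i y_i α_i x_i, each coordinate update costs only O(nnz(x_i)). Below is a minimal dense-NumPy sketch of this style of update for the L1-loss case; it is illustrative only (the function name, random-permutation schedule, and simplified projected-gradient stopping rule are assumptions, not the paper's exact implementation).

```python
import numpy as np

def dual_cd_l1svm(X, y, C=1.0, max_epochs=50, tol=1e-6, seed=0):
    """Sketch of dual coordinate descent for L1-loss linear SVM.

    Minimizes 0.5 * a^T Qbar a - e^T a subject to 0 <= a_i <= C,
    where Qbar_ij = y_i y_j x_i^T x_j, while maintaining
    w = sum_i y_i a_i x_i so each update touches only one instance.
    """
    n, d = X.shape
    alpha = np.zeros(n)
    w = np.zeros(d)
    Qii = np.einsum('ij,ij->i', X, X)   # diagonal of Qbar for L1-loss
    rng = np.random.default_rng(seed)
    for epoch in range(max_epochs):
        max_viol = 0.0
        for i in rng.permutation(n):     # random order each epoch
            if Qii[i] == 0.0:
                continue
            G = y[i] * (w @ X[i]) - 1.0  # gradient along coordinate i
            # Projected gradient: zero out directions blocked by a bound.
            if alpha[i] == 0.0:
                PG = min(G, 0.0)
            elif alpha[i] == C:
                PG = max(G, 0.0)
            else:
                PG = G
            max_viol = max(max_viol, abs(PG))
            if PG != 0.0:
                a_old = alpha[i]
                alpha[i] = min(max(a_old - G / Qii[i], 0.0), C)
                w += (alpha[i] - a_old) * y[i] * X[i]  # keep w in sync
        if max_viol < tol:               # simplified stopping condition
            break
    return w, alpha
```

For the L2-loss variant, the same loop applies with the diagonal term x_iᵀx_i + 1/(2C) in place of Q̄_ii and no upper bound on α_i. A sparse-matrix version would replace the dense dot products with operations over the nonzeros of x_i, which is where the claimed speed on large sparse data comes from.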
REFERENCES
- Bordes, A., Bottou, L., Gallinari, P., & Weston, J. (2007). Solving multiclass support vector machines with LaRank. ICML.
- Boser, B. E., Guyon, I., & Vapnik, V. (1992). A training algorithm for optimal margin classifiers. COLT.
- Bottou, L. (2007). Stochastic gradient descent examples. http://leon.bottou.org/projects/sgd
- Chang, C.-C., & Lin, C.-J. (2001). LIBSVM: a library for support vector machines. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm
- Chang, K.-W., Hsieh, C.-J., & Lin, C.-J. (2007). Coordinate descent method for large-scale L2-loss linear SVM (Technical Report). http://www.csie.ntu.edu.tw/~cjlin/papers/cdl2.pdf
- Collins, M., Globerson, A., Koo, T., Carreras, X., & Bartlett, P. (2008). Exponentiated gradient algorithms for conditional random fields and max-margin Markov networks. JMLR. To appear.
- Crammer, K., & Singer, Y. (2003). Ultraconservative online algorithms for multiclass problems. JMLR, 3, 951--991.
- Friess, T.-T., Cristianini, N., & Campbell, C. (1998). The kernel adatron algorithm: a fast and simple learning procedure for support vector machines. ICML.
- Joachims, T. (1998). Making large-scale SVM learning practical. Advances in Kernel Methods - Support Vector Learning. Cambridge, MA: MIT Press.
- Joachims, T. (2006). Training linear SVMs in linear time. ACM KDD.
- Kao, W.-C., Chung, K.-M., Sun, C.-L., & Lin, C.-J. (2004). Decomposition methods for linear support vector machines. Neural Comput., 16, 1689--1704.
- Keerthi, S. S., & DeCoste, D. (2005). A modified finite Newton method for fast solution of large scale linear SVMs. JMLR, 6, 341--361.
- Keerthi, S. S., Shevade, S. K., Bhattacharyya, C., & Murthy, K. R. K. (2001). Improvements to Platt's SMO algorithm for SVM classifier design. Neural Comput., 13, 637--649.
- Langford, J., Li, L., & Strehl, A. (2007). Vowpal Wabbit. http://hunch.net/~vw
- Lin, C.-J., Weng, R. C., & Keerthi, S. S. (2008). Trust region Newton method for large-scale logistic regression. JMLR, 9, 623--646.
- Luo, Z.-Q., & Tseng, P. (1992). On the convergence of coordinate descent method for convex differentiable minimization. J. Optim. Theory Appl., 72, 7--35.
- Mangasarian, O. L., & Musicant, D. R. (1999). Successive overrelaxation for support vector machines. IEEE Trans. Neural Networks, 10, 1032--1037.
- Osuna, E., Freund, R., & Girosi, F. (1997). Training support vector machines: An application to face detection. CVPR.
- Platt, J. C. (1998). Fast training of support vector machines using sequential minimal optimization. Advances in Kernel Methods - Support Vector Learning. Cambridge, MA: MIT Press.
- Shalev-Shwartz, S., Singer, Y., & Srebro, N. (2007). Pegasos: primal estimated sub-gradient solver for SVM. ICML.
- Smola, A. J., Vishwanathan, S. V. N., & Le, Q. (2008). Bundle methods for machine learning. NIPS.
- Zhang, T. (2004). Solving large scale linear prediction problems using stochastic gradient descent algorithms. ICML.