research-article
DOI: 10.1145/1390156.1390208

A dual coordinate descent method for large-scale linear SVM

Published: 05 July 2008

ABSTRACT

In many applications, data appear with a huge number of instances as well as features. The linear Support Vector Machine (SVM) is one of the most popular tools for dealing with such large-scale sparse data. This paper presents a novel dual coordinate descent method for linear SVM with L1- and L2-loss functions. The proposed method is simple and reaches an ε-accurate solution in O(log(1/ε)) iterations. Experiments indicate that our method is much faster than state-of-the-art solvers such as Pegasos, TRON, SVMperf, and a recent primal coordinate descent implementation.
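The core one-variable update behind the method is easy to sketch. The following is a minimal illustration under our own assumptions, not the paper's implementation (it omits shrinking and a stopping criterion, and the function and variable names such as `dcd_l1_svm` are ours): it solves the L1-loss dual, min over α of ½αᵀQα − eᵀα subject to 0 ≤ α_i ≤ C with Q_ij = y_i y_j x_iᵀx_j, by cycling through one-variable subproblems while maintaining w = Σ_i y_i α_i x_i so that each update only touches one instance.

```python
import numpy as np

def dcd_l1_svm(X, y, C=1.0, n_epochs=50, seed=0):
    """One-variable dual coordinate descent for the L1-loss linear SVM.

    Solves  min_a 0.5 * a^T Q a - e^T a,  0 <= a_i <= C,
    where Q_ij = y_i y_j x_i . x_j, maintaining w = sum_i y_i a_i x_i.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    alpha = np.zeros(n)
    w = np.zeros(d)
    q_diag = (X ** 2).sum(axis=1)        # Q_ii = ||x_i||^2  (y_i^2 = 1)
    for _ in range(n_epochs):
        for i in rng.permutation(n):     # random order each epoch
            if q_diag[i] == 0.0:
                continue
            g = y[i] * w.dot(X[i]) - 1.0          # dual gradient wrt alpha_i
            # closed-form single-variable minimizer, projected onto [0, C]
            a_new = min(max(alpha[i] - g / q_diag[i], 0.0), C)
            if a_new != alpha[i]:
                w += (a_new - alpha[i]) * y[i] * X[i]  # keep w in sync
                alpha[i] = a_new
    return w

# Toy usage: two Gaussian blobs that are linearly separable through the origin.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(2.0, 0.5, (50, 2)), rng.normal(-2.0, 0.5, (50, 2))])
y = np.hstack([np.ones(50), -np.ones(50)])
w = dcd_l1_svm(X, y)
acc = float(np.mean(np.sign(X.dot(w)) == y))
```

Because w is updated incrementally, each coordinate step costs only the number of nonzeros in x_i, which is what makes the approach attractive for large sparse data.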

References

  1. Bordes, A., Bottou, L., Gallinari, P., & Weston, J. (2007). Solving multiclass support vector machines with LaRank. ICML.
  2. Boser, B. E., Guyon, I., & Vapnik, V. (1992). A training algorithm for optimal margin classifiers. COLT.
  3. Bottou, L. (2007). Stochastic gradient descent examples. http://leon.bottou.org/projects/sgd.
  4. Chang, C.-C., & Lin, C.-J. (2001). LIBSVM: a library for support vector machines. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
  5. Chang, K.-W., Hsieh, C.-J., & Lin, C.-J. (2007). Coordinate descent method for large-scale L2-loss linear SVM (Technical Report). http://www.csie.ntu.edu.tw/~cjlin/papers/cdl2.pdf.
  6. Collins, M., Globerson, A., Koo, T., Carreras, X., & Bartlett, P. (2008). Exponentiated gradient algorithms for conditional random fields and max-margin Markov networks. JMLR. To appear.
  7. Crammer, K., & Singer, Y. (2003). Ultraconservative online algorithms for multiclass problems. JMLR, 3, 951--991.
  8. Friess, T.-T., Cristianini, N., & Campbell, C. (1998). The kernel adatron algorithm: a fast and simple learning procedure for support vector machines. ICML.
  9. Joachims, T. (1998). Making large-scale SVM learning practical. Advances in Kernel Methods - Support Vector Learning. Cambridge, MA: MIT Press.
  10. Joachims, T. (2006). Training linear SVMs in linear time. ACM KDD.
  11. Kao, W.-C., Chung, K.-M., Sun, C.-L., & Lin, C.-J. (2004). Decomposition methods for linear support vector machines. Neural Comput., 16, 1689--1704.
  12. Keerthi, S. S., & DeCoste, D. (2005). A modified finite Newton method for fast solution of large scale linear SVMs. JMLR, 6, 341--361.
  13. Keerthi, S. S., Shevade, S. K., Bhattacharyya, C., & Murthy, K. R. K. (2001). Improvements to Platt's SMO algorithm for SVM classifier design. Neural Comput., 13, 637--649.
  14. Langford, J., Li, L., & Strehl, A. (2007). Vowpal Wabbit. http://hunch.net/~vw.
  15. Lin, C.-J., Weng, R. C., & Keerthi, S. S. (2008). Trust region Newton method for large-scale logistic regression. JMLR, 9, 623--646.
  16. Luo, Z.-Q., & Tseng, P. (1992). On the convergence of the coordinate descent method for convex differentiable minimization. J. Optim. Theory Appl., 72, 7--35.
  17. Mangasarian, O. L., & Musicant, D. R. (1999). Successive overrelaxation for support vector machines. IEEE Trans. Neural Networks, 10, 1032--1037.
  18. Osuna, E., Freund, R., & Girosi, F. (1997). Training support vector machines: An application to face detection. CVPR.
  19. Platt, J. C. (1998). Fast training of support vector machines using sequential minimal optimization. Advances in Kernel Methods - Support Vector Learning. Cambridge, MA: MIT Press.
  20. Shalev-Shwartz, S., Singer, Y., & Srebro, N. (2007). Pegasos: primal estimated sub-gradient solver for SVM. ICML.
  21. Smola, A. J., Vishwanathan, S. V. N., & Le, Q. (2008). Bundle methods for machine learning. NIPS.
  22. Zhang, T. (2004). Solving large scale linear prediction problems using stochastic gradient descent algorithms. ICML.

Published in

ICML '08: Proceedings of the 25th International Conference on Machine Learning, July 2008, 1310 pages
ISBN: 9781605582054
DOI: 10.1145/1390156

              Copyright © 2008 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance rate: 140 of 548 submissions, 26%
