research-article
DOI: 10.1145/1390156.1390208

A dual coordinate descent method for large-scale linear SVM

Published: 05 July 2008

ABSTRACT

In many applications, data appear with a huge number of instances as well as features. The linear Support Vector Machine (SVM) is one of the most popular tools for dealing with such large-scale sparse data. This paper presents a novel dual coordinate descent method for linear SVM with L1- and L2-loss functions. The proposed method is simple and reaches an ε-accurate solution in O(log(1/ε)) iterations. Experiments indicate that our method is much faster than state-of-the-art solvers such as Pegasos, TRON, SVMperf, and a recent primal coordinate descent implementation.
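The core one-variable update behind the method is easy to sketch. The following is a minimal illustration under our own assumptions, not the paper's implementation (it omits shrinking and a stopping criterion, and the function and variable names such as `dcd_l1_svm` are ours): it solves the L1-loss dual, min over α of ½αᵀQα − eᵀα subject to 0 ≤ α_i ≤ C with Q_ij = y_i y_j x_iᵀx_j, by cycling through one-variable subproblems while maintaining w = Σ_i y_i α_i x_i so that each update only touches one instance.

```python
import numpy as np

def dcd_l1_svm(X, y, C=1.0, n_epochs=50, seed=0):
    """One-variable dual coordinate descent for the L1-loss linear SVM.

    Solves  min_a 0.5 * a^T Q a - e^T a,  0 <= a_i <= C,
    where Q_ij = y_i y_j x_i . x_j, maintaining w = sum_i y_i a_i x_i.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    alpha = np.zeros(n)
    w = np.zeros(d)
    q_diag = (X ** 2).sum(axis=1)        # Q_ii = ||x_i||^2  (y_i^2 = 1)
    for _ in range(n_epochs):
        for i in rng.permutation(n):     # random order each epoch
            if q_diag[i] == 0.0:
                continue
            g = y[i] * w.dot(X[i]) - 1.0          # dual gradient wrt alpha_i
            # closed-form single-variable minimizer, projected onto [0, C]
            a_new = min(max(alpha[i] - g / q_diag[i], 0.0), C)
            if a_new != alpha[i]:
                w += (a_new - alpha[i]) * y[i] * X[i]  # keep w in sync
                alpha[i] = a_new
    return w

# Toy usage: two Gaussian blobs that are linearly separable through the origin.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(2.0, 0.5, (50, 2)), rng.normal(-2.0, 0.5, (50, 2))])
y = np.hstack([np.ones(50), -np.ones(50)])
w = dcd_l1_svm(X, y)
acc = float(np.mean(np.sign(X.dot(w)) == y))
```

Because w is updated incrementally, each coordinate step costs only the number of nonzeros in x_i, which is what makes the approach attractive for large sparse data.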

References

  1. Bordes, A., Bottou, L., Gallinari, P., & Weston, J. (2007). Solving multiclass support vector machines with LaRank. ICML.
  2. Boser, B. E., Guyon, I., & Vapnik, V. (1992). A training algorithm for optimal margin classifiers. COLT.
  3. Bottou, L. (2007). Stochastic gradient descent examples. http://leon.bottou.org/projects/sgd.
  4. Chang, C.-C., & Lin, C.-J. (2001). LIBSVM: a library for support vector machines. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
  5. Chang, K.-W., Hsieh, C.-J., & Lin, C.-J. (2007). Coordinate descent method for large-scale L2-loss linear SVM (Technical Report). http://www.csie.ntu.edu.tw/~cjlin/papers/cdl2.pdf.
  6. Collins, M., Globerson, A., Koo, T., Carreras, X., & Bartlett, P. (2008). Exponentiated gradient algorithms for conditional random fields and max-margin Markov networks. JMLR. To appear.
  7. Crammer, K., & Singer, Y. (2003). Ultraconservative online algorithms for multiclass problems. JMLR, 3, 951--991.
  8. Friess, T.-T., Cristianini, N., & Campbell, C. (1998). The kernel adatron algorithm: a fast and simple learning procedure for support vector machines. ICML.
  9. Joachims, T. (1998). Making large-scale SVM learning practical. Advances in Kernel Methods - Support Vector Learning. Cambridge, MA: MIT Press.
  10. Joachims, T. (2006). Training linear SVMs in linear time. ACM KDD.
  11. Kao, W.-C., Chung, K.-M., Sun, C.-L., & Lin, C.-J. (2004). Decomposition methods for linear support vector machines. Neural Comput., 16, 1689--1704.
  12. Keerthi, S. S., & DeCoste, D. (2005). A modified finite Newton method for fast solution of large scale linear SVMs. JMLR, 6, 341--361.
  13. Keerthi, S. S., Shevade, S. K., Bhattacharyya, C., & Murthy, K. R. K. (2001). Improvements to Platt's SMO algorithm for SVM classifier design. Neural Comput., 13, 637--649.
  14. Langford, J., Li, L., & Strehl, A. (2007). Vowpal Wabbit. http://hunch.net/~vw.
  15. Lin, C.-J., Weng, R. C., & Keerthi, S. S. (2008). Trust region Newton method for large-scale logistic regression. JMLR, 9, 623--646.
  16. Luo, Z.-Q., & Tseng, P. (1992). On the convergence of the coordinate descent method for convex differentiable minimization. J. Optim. Theory Appl., 72, 7--35.
  17. Mangasarian, O. L., & Musicant, D. R. (1999). Successive overrelaxation for support vector machines. IEEE Trans. Neural Networks, 10, 1032--1037.
  18. Osuna, E., Freund, R., & Girosi, F. (1997). Training support vector machines: An application to face detection. CVPR.
  19. Platt, J. C. (1998). Fast training of support vector machines using sequential minimal optimization. Advances in Kernel Methods - Support Vector Learning. Cambridge, MA: MIT Press.
  20. Shalev-Shwartz, S., Singer, Y., & Srebro, N. (2007). Pegasos: primal estimated sub-gradient solver for SVM. ICML.
  21. Smola, A. J., Vishwanathan, S. V. N., & Le, Q. (2008). Bundle methods for machine learning. NIPS.
  22. Zhang, T. (2004). Solving large scale linear prediction problems using stochastic gradient descent algorithms. ICML.

Published in

ICML '08: Proceedings of the 25th International Conference on Machine Learning, July 2008, 1310 pages
ISBN: 9781605582054
DOI: 10.1145/1390156

              Copyright © 2008 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance rate: 140 of 548 submissions, 26%
