DOI: 10.1145/1281192.1281270

A scalable modular convex solver for regularized risk minimization

Published: 12 August 2007

ABSTRACT

A wide variety of machine learning problems can be described as minimizing a regularized risk functional, with different algorithms using different notions of risk and different regularizers. Examples include linear Support Vector Machines (SVMs), Logistic Regression, Conditional Random Fields (CRFs), and Lasso amongst others. This paper describes the theory and implementation of a highly scalable and modular convex solver which solves all these estimation problems. It can be parallelized on a cluster of workstations, allows for data-locality, and can deal with regularizers such as l1 and l2 penalties. At present, our solver implements 20 different estimation problems, can be easily extended, scales to millions of observations, and is up to 10 times faster than specialized solvers for many applications. The open source code is freely available as part of the ELEFANT toolbox.
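The modular setup the abstract describes can be sketched as minimizing J(w) = λΩ(w) + (1/m) Σᵢ loss(⟨w, xᵢ⟩, yᵢ), where the loss and the regularizer are interchangeable components. The following is a hypothetical illustration, not the ELEFANT code itself: a linear SVM instance (hinge loss plus an l2 penalty) trained by plain subgradient descent on toy data, with both components passed in as plug-in functions.

```python
# Hypothetical sketch (not the paper's ELEFANT implementation):
# minimize J(w) = lam * Omega(w) + (1/m) * sum_i loss(<w, x_i>, y_i)
# with pluggable loss/regularizer modules, solved by subgradient descent.

def hinge_subgrad(margin):
    """Subgradient of max(0, 1 - margin) with respect to the margin."""
    return -1.0 if margin < 1.0 else 0.0

def l2_grad(w):
    """Gradient of Omega(w) = (1/2) * ||w||^2."""
    return list(w)

def train(X, y, loss_subgrad=hinge_subgrad, reg_grad=l2_grad,
          lam=0.1, lr=0.1, epochs=200):
    d, m = len(X[0]), len(X)
    w = [0.0] * d
    for _ in range(epochs):
        # regularizer contribution to the subgradient
        g = [lam * gj for gj in reg_grad(w)]
        # empirical risk contribution, averaged over the data
        for xi, yi in zip(X, y):
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            c = loss_subgrad(margin) * yi / m
            for j in range(d):
                g[j] += c * xi[j]
        w = [wj - lr * gj for wj, gj in zip(w, g)]
    return w

# Toy linearly separable data
X = [[1.0, 2.0], [2.0, 3.0], [-1.0, -1.5], [-2.0, -1.0]]
y = [1, 1, -1, -1]
w = train(X, y)
preds = [1 if sum(wj * xj for wj, xj in zip(w, xi)) > 0 else -1 for xi in X]
```

Swapping `loss_subgrad` for a logistic-loss derivative, or `reg_grad` for an l1 subgradient, changes the estimator (logistic regression, Lasso-style penalties) without touching the training loop; this separation is the kind of modularity the paper's solver is built around.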


Published in

KDD '07: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
August 2007, 1080 pages
ISBN: 9781595936097
DOI: 10.1145/1281192
Copyright © 2007 ACM


Publisher

Association for Computing Machinery, New York, NY, United States


            Acceptance Rates

KDD '07 paper acceptance rate: 111 of 573 submissions (19%). Overall acceptance rate: 1,133 of 8,635 submissions (13%).
