ABSTRACT
A wide variety of machine learning problems can be described as minimizing a regularized risk functional, with different algorithms using different notions of risk and different regularizers. Examples include linear Support Vector Machines (SVMs), Logistic Regression, Conditional Random Fields (CRFs), and Lasso, among others. This paper describes the theory and implementation of a highly scalable and modular convex solver that handles all of these estimation problems. It can be parallelized on a cluster of workstations, allows for data locality, and can deal with regularizers such as ℓ1 and ℓ2 penalties. At present, our solver implements 20 different estimation problems, can be easily extended, scales to millions of observations, and is up to 10 times faster than specialized solvers for many applications. The open source code is freely available as part of the ELEFANT toolbox.
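To make the modular design concrete, here is a minimal sketch of the decomposition the abstract describes: a regularized risk J(w) = λΩ(w) + R_emp(w) whose loss and regularizer are interchangeable components. All names below (hinge_loss, l1_reg, solve, ...) are illustrative assumptions, not ELEFANT's actual API, and plain subgradient descent stands in for the paper's solver. Swapping the hinge loss for the logistic loss, or the ℓ2 penalty for ℓ1, recovers a linear SVM, logistic regression, or a Lasso-style sparse estimator from the same loop.

```python
import numpy as np

def hinge_loss(w, X, y):
    """Average hinge loss and a subgradient (linear SVM risk)."""
    margins = 1.0 - y * (X @ w)
    active = margins > 0
    loss = np.maximum(margins, 0.0).mean()
    grad = -(X[active] * y[active, None]).sum(axis=0) / len(y)
    return loss, grad

def logistic_loss(w, X, y):
    """Average logistic loss and gradient (logistic regression risk)."""
    z = -y * (X @ w)                      # z_i = -y_i <w, x_i>
    loss = np.logaddexp(0.0, z).mean()    # log(1 + exp(z_i)), numerically stable
    p = 1.0 / (1.0 + np.exp(-z))          # sigmoid(z_i)
    grad = -(X * (y * p)[:, None]).mean(axis=0)
    return loss, grad

def l2_reg(w):
    """Omega(w) = 0.5 ||w||^2 and its gradient."""
    return 0.5 * (w @ w), w

def l1_reg(w):
    """Omega(w) = ||w||_1 and a subgradient."""
    return np.abs(w).sum(), np.sign(w)

def solve(X, y, loss, reg, lam=0.1, steps=500):
    """Subgradient descent on J(w) = lam * Omega(w) + R_emp(w).
    A stand-in for the paper's actual solver; any convex loss/regularizer
    pair exposing (value, subgradient) plugs in unchanged."""
    w = np.zeros(X.shape[1])
    for t in range(1, steps + 1):
        _, r_grad = loss(w, X, y)
        _, o_grad = reg(w)
        w -= (1.0 / (lam * t)) * (lam * o_grad + r_grad)
    return w

# Example: the same solver yields an SVM or a sparse logistic model.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    y = np.sign(X @ np.array([1.0, -2.0, 0.0, 0.0, 0.5]) + 0.1 * rng.normal(size=200))
    w_svm = solve(X, y, hinge_loss, l2_reg)        # linear SVM
    w_sparse = solve(X, y, logistic_loss, l1_reg)  # l1-regularized logistic regression
    print(w_svm, w_sparse)
```

The point of the design is that `solve` never inspects which loss or regularizer it was given; each estimation problem is just a new (loss, regularizer) pair, which is how a single solver can cover SVMs, CRFs, Lasso, and the rest.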