Abstract
The problem of tolerant data fitting by a nonlinear surface, induced by a kernel-based support vector machine, is formulated as a linear program with fewer variables than other linear programming formulations. A generalization of the linear programming chunking algorithm for arbitrary kernels is implemented for solving problems with very large datasets, wherein chunking is performed on both data points and problem variables. The proposed approach tolerates a small, parametrically adjustable error while fitting the given data. This leads to improved fitting of noisy data over ordinary least-error solutions, as demonstrated computationally. Comparative numerical results indicate an average time reduction as high as 26.0% over other formulations, with a maximal time reduction of 79.7%. Additionally, linear programs with as many as 16,000 data points and more than a billion nonzero matrix elements are solved.
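The core idea of tolerant fitting can be illustrated with a small self-contained sketch: an epsilon-insensitive kernel regression posed as a linear program, where residuals within a tolerance eps incur no cost and larger deviations are penalized linearly. This is a minimal illustration under stated assumptions, not the paper's actual formulation or its chunking algorithm; the function name `fit_tolerant_lp`, the Gaussian kernel choice, the use of `scipy.optimize.linprog`, and all parameter values are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linprog

def gaussian_kernel(A, B, gamma):
    # Pairwise squared distances, then Gaussian (RBF) kernel values.
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq)

def fit_tolerant_lp(X, y, eps=0.1, C=10.0, gamma=5.0):
    """Sketch of an epsilon-insensitive kernel regression LP:
         min  sum|alpha| + C * sum(xi)
         s.t. |K alpha + b - y| <= eps + xi,  xi >= 0,
       with each signed variable split into nonnegative parts,
       giving the LP variable vector [a+, a-, b+, b-, xi]."""
    m = X.shape[0]
    K = gaussian_kernel(X, X, gamma)
    I, e = np.eye(m), np.ones((m, 1))
    # Two one-sided inequality constraints per data point.
    A_ub = np.vstack([
        np.hstack([ K, -K,  e, -e, -I]),   #  (K a + b - y) <= eps + xi
        np.hstack([-K,  K, -e,  e, -I]),   # -(K a + b - y) <= eps + xi
    ])
    b_ub = np.concatenate([y + eps, -y + eps])
    # Objective: 1-norm of alpha plus C times the slack total; b is free.
    c = np.concatenate([np.ones(2 * m), [0.0, 0.0], C * np.ones(m)])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(0, None)] * (3 * m + 2), method="highs")
    alpha = res.x[:m] - res.x[m:2 * m]
    b = res.x[2 * m] - res.x[2 * m + 1]
    return alpha, b, K

# Usage: fit a noisy-free sine within tolerance eps.
X = np.linspace(-1, 1, 15).reshape(-1, 1)
y = np.sin(np.pi * X).ravel()
alpha, b, K = fit_tolerant_lp(X, y)
pred = K @ alpha + b
```

Because deviations up to eps are free, the LP does not chase every data point exactly; with noisy data this is precisely the tolerance that the abstract credits with improved fits over ordinary least-error solutions. The paper's contribution is a formulation with fewer variables than LPs of this general shape, plus row-and-column chunking to handle very large kernel matrices.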
Mangasarian, O., Musicant, D.R. Large Scale Kernel Regression via Linear Programming. Machine Learning 46, 255–269 (2002). https://doi.org/10.1023/A:1012422931930