Abstract
We examine methods for constructing regression ensembles based on a linear program (LP). The ensemble regression function is a linear combination of base hypotheses generated by a boosting-type base learning algorithm. Unlike in the classification case, in regression the set of hypotheses that the base learning algorithm can produce may be infinite. We explicitly tackle the issue of how to define and solve ensemble regression when the hypothesis space is infinite. Our approach is based on a semi-infinite linear program with an infinite number of constraints and a finite number of variables. We show that the regression problem is well posed for infinite hypothesis spaces in both the primal and the dual. Most importantly, we prove that there exists an optimal solution to the infinite hypothesis space problem consisting of a finite number of hypotheses. We propose two algorithms for solving the finite and infinite hypothesis space problems: one uses column generation within a simplex-type method, the other an exponential barrier approach. Furthermore, we give sufficient conditions on the base learning algorithm and the hypothesis set for their use in infinite regression ensembles. Computational results show that these methods are extremely promising.
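The column generation scheme summarized above can be sketched concretely. The following is a minimal illustration, not the paper's exact formulation: it uses an L1 loss with an L1 penalty on the ensemble weights (the paper's ν-LP with an ε-insensitive loss has the same primal–dual structure), and a hypothetical base learner that exhaustively searches one-dimensional decision stumps. The restricted master LP is solved over the columns generated so far; its equality duals u price new hypotheses, and the base learner seeks a hypothesis maximizing the edge Σᵢ uᵢ h(xᵢ). If no hypothesis violates the dual bound, the current finite ensemble is optimal over the whole class. The names `lp_ensemble`, `solve_master`, and `best_stump` are illustrative, and SciPy ≥ 1.7 is assumed for dual values via HiGHS.

```python
# A minimal sketch (not the authors' exact formulation) of an LP-based
# regression ensemble built by column generation.
import numpy as np
from scipy.optimize import linprog


def solve_master(H, y, D):
    """Restricted master LP over the current hypothesis columns H (m x T):

        minimize   sum_t a_t + D * sum_i (xi_plus_i + xi_minus_i)
        subject to H a + xi_plus - xi_minus = y,  a, xi_plus, xi_minus >= 0.

    Returns the ensemble weights a and the duals u of the equality rows;
    dual feasibility for a hypothesis h is exactly sum_i u_i h(x_i) <= 1.
    """
    m, T = H.shape
    c = np.concatenate([np.ones(T), D * np.ones(2 * m)])
    A_eq = np.hstack([H, np.eye(m), -np.eye(m)])
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    return res.x[:T], res.eqlin.marginals


def best_stump(x, u):
    """Exact base learner over stumps h(x) = s if x >= theta else -s,
    s in {+1, -1}: returns the maximal edge sum_i u_i h(x_i) and the
    stump's predictions on the training points."""
    xs = np.sort(x)
    thresholds = np.concatenate([[xs[0] - 1.0], (xs[:-1] + xs[1:]) / 2.0])
    best_edge, best_h = -np.inf, None
    for theta in thresholds:
        h = np.where(x >= theta, 1.0, -1.0)
        for s in (1.0, -1.0):
            edge = s * (u @ h)
            if edge > best_edge:
                best_edge, best_h = edge, s * h
    return best_edge, best_h


def lp_ensemble(x, y, D=1.0, max_iters=100, tol=1e-6):
    """Column generation: alternate the master LP and the base learner
    until no hypothesis in the class violates the dual constraint."""
    m = len(y)
    H = np.empty((m, 0))  # predictions of the hypotheses generated so far
    for _ in range(max_iters):
        a, u = solve_master(H, y, D)
        edge, h = best_stump(x, u)
        if edge <= 1.0 + tol:  # every stump prices out: ensemble is optimal
            break
        H = np.hstack([H, h[:, None]])
    else:
        a, u = solve_master(H, y, D)  # weight the last column added
    return H, a


# Toy usage: fit a noisy step function and check that the LP yields a
# sparse ensemble, i.e. only a few hypotheses receive nonzero weight.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 60)
y = np.sign(x) + 0.1 * rng.normal(size=60)
H, a = lp_ensemble(x, y, D=5.0)
print("columns generated:", H.shape[1], "| nonzero weights:", int(np.sum(a > 1e-8)))
print("training L1 error:", np.abs(H @ a - y).mean())
```

Because an optimal basic solution of the master LP has at most as many nonzero variables as constraints, only a handful of hypotheses typically receive nonzero weight, which mirrors the paper's result that the infinite hypothesis space problem admits an optimal solution with finitely many hypotheses.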
Cite this article
Rätsch, G., Demiriz, A. & Bennett, K.P. Sparse Regression Ensembles in Infinite and Finite Hypothesis Spaces. Machine Learning 48, 189–218 (2002). https://doi.org/10.1023/A:1013907905629