Abstract
Bagging (Breiman, 1994a) is a technique that attempts to improve a learning algorithm's performance by training it on bootstrap replicates of the training set (Efron, 1979; Efron & Tibshirani, 1993). The computational cost of estimating the resulting generalization error on a test set by means of cross-validation is often prohibitive: for leave-one-out cross-validation one must train the underlying algorithm on the order of mν times, where m is the size of the training set and ν is the number of bootstrap replicates. This paper presents several techniques for estimating the generalization error of a bagged learning algorithm without any training of the underlying algorithm beyond that required by the bagging itself, in contrast to cross-validation-based estimation. These techniques all exploit the bias-variance decomposition (Geman, Bienenstock & Doursat, 1992; Wolpert, 1996). The best of our estimators also exploits stacking (Wolpert, 1992). In the experiments reported here, it was more accurate than both the cross-validation-based estimator of the bagged algorithm's error and the cross-validation-based estimator of the underlying algorithm's error; the improvement was particularly pronounced for small test sets. This suggests a novel justification for bagging: it permits more accurate estimation of the generalization error than is possible without it.
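To make the setting concrete, the following is a minimal sketch of bagging with an out-of-bag-style error estimate (in the spirit of Breiman, 1996): each of the ν replicates is scored only on the training points its bootstrap sample omitted, so no training runs are needed beyond the bagging itself. The 1-nearest-neighbour base learner, the data, and all variable names here are illustrative assumptions, not the paper's estimator.

```python
import random
import statistics

random.seed(0)

# Toy 1-D regression data (illustrative assumption, not from the paper)
m = 100
xs = [random.uniform(-1, 1) for _ in range(m)]
ys = [3 * x + random.gauss(0, 0.2) for x in xs]

def train_1nn(tx, ty):
    """Stand-in base learner: 1-nearest-neighbour regression."""
    def predict(x):
        i = min(range(len(tx)), key=lambda j: abs(tx[j] - x))
        return ty[i]
    return predict

nu = 25                          # number of bootstrap replicates
preds = [[] for _ in range(m)]   # preds[j]: out-of-bag predictions for point j

for _ in range(nu):
    idx = [random.randrange(m) for _ in range(m)]  # bootstrap sample, with replacement
    in_bag = set(idx)
    model = train_1nn([xs[i] for i in idx], [ys[i] for i in idx])
    for j in range(m):
        if j not in in_bag:                        # point j was unseen by this replicate
            preds[j].append(model(xs[j]))

# Out-of-bag error estimate: average each point's predictions from the
# replicates that never trained on it, then measure squared error --
# no training runs beyond the nu used for bagging itself.
errs = [(ys[j] - statistics.mean(p)) ** 2 for j, p in enumerate(preds) if p]
oob_mse = statistics.mean(errs)
print(f"OOB MSE estimate: {oob_mse:.4f}")
```

Contrast this with leave-one-out cross-validation of the bagged predictor, which would retrain the base learner roughly mν times; the out-of-bag bookkeeping above reuses the ν replicates already trained.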
References
Breiman, L. (1994a). Bagging predictors. University of California, Dept. of Statistics, TR 421.
Breiman, L. (1994b). Heuristics of instability and stabilization in model selection. University of California, Dept. of Statistics, TR 416.
Breiman, L. (1996). Out-of-bag estimation. University of California, Dept. of Statistics.
Efron, B. (1979). Computers and the theory of statistics: thinking the unthinkable. SIAM Review, 21: 460.
Efron, B. & Tibshirani, R. (1993). An introduction to the bootstrap. Chapman and Hall.
Geman, S., Bienenstock, E., & Doursat, R. (1992). Neural networks and the bias/variance dilemma. Neural Computation, 4: 1–58.
Tibshirani, R. (1996). Bias, variance and prediction error for classification rules. University of Toronto Statistics Department Technical Report.
Wolpert, D. H. (1992). Stacked generalization. Neural Networks, 5: 241–249.
Wolpert, D. H. (1996). The bootstrap is inconsistent with probability theory. In Maximum Entropy and Bayesian Methods, K. Hanson & R. Silver (Eds.), pages 69–76.
Wolpert, D. H. (1996). On bias plus variance. Neural Computation, in press.
Wolpert, D. H. & Macready, W. G. (1996). Combining stacking with bagging to improve a learning algorithm. Submitted.
Cite this article
Wolpert, D.H., Macready, W.G. An Efficient Method To Estimate Bagging's Generalization Error. Machine Learning 35, 41–55 (1999). https://doi.org/10.1023/A:1007519102914