- 1. A. R. Barron. Complexity regularization with application to artificial neural networks. In G. Roussas, editor, Nonparametric Functional Estimation and Related Topics, pages 561-576. Kluwer Academic Publishers, 1991.
- 2. A. R. Barron and T. M. Cover. Minimum complexity density estimation. IEEE Transactions on Information Theory, 37:1034-1054, 1991.
- 3. Wray Buntine. Learning classification trees. Statistics and Computing, 2:63-73, 1992.
- 4. David P. Helmbold and Robert E. Schapire. Predicting nearly as well as the best pruning of a decision tree. Machine Learning, 27(1):51-68, 1997.
- 5. Michael Kearns and Yishay Mansour. A fast, bottom-up decision tree pruning algorithm with near-optimal generalization. In Proceedings of the 15th International Conference on Machine Learning. Morgan Kaufmann, 1998.
- 6. Michael Kearns, Yishay Mansour, Andrew Ng, and Dana Ron. An experimental and theoretical comparison of model selection methods. In Proceedings of the Eighth ACM Conference on Computational Learning Theory, pages 21-30. ACM Press, 1995.
- 7. G. Lugosi and K. Zeger. Concept learning using complexity regularization. IEEE Transactions on Information Theory, 42:48-54, 1996.
- 8. David McAllester. Some PAC-Bayesian theorems. In Proceedings of the Eleventh Annual Conference on Computational Learning Theory, pages 230-234, 1998.
- 9. Jonathan J. Oliver and David Hand. On pruning and averaging decision trees. In Proceedings of the Twelfth International Conference on Machine Learning, 1995.
- 10. F. C. Pereira and Y. Singer. An efficient extension to mixture techniques for prediction and decision trees. In Proceedings of the Tenth Annual Conference on Computational Learning Theory, pages 114-121, 1997. Long version to appear in Machine Learning.
- 11. J. Ross Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, 1993.
- 12. John Shawe-Taylor and Robert Williamson. A PAC analysis of a Bayesian estimator. In Proceedings of the Tenth Annual Conference on Computational Learning Theory. ACM Press, 1997.