Abstract
Three general methods for obtaining exact bounds on the probability of overfitting are proposed within statistical learning theory: a method of generating and destroying sets, a recurrent method, and a blockwise method. Six particular cases are considered to illustrate the application of these methods. These are the following model sets of predictors: a pair of predictors, a layer of a Boolean cube, an interval of a Boolean cube, a monotonic chain, a unimodal chain, and a unit neighborhood of the best predictor. For the interval and the unimodal chain, the results of numerical experiments are presented that demonstrate the effects of splitting and similarity on the probability of overfitting.
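The quantity bounded in the paper, the probability of overfitting, is defined combinatorially: predictors are identified with binary error vectors on a fixed full sample, the sample is split uniformly at random into train and test parts, empirical risk minimization picks the predictor with the fewest train errors, and one asks how often its test error rate exceeds its train error rate by at least ε. The following is not the paper's derivation but a minimal Monte Carlo sketch of that quantity for the simplest model set mentioned in the abstract, a pair of predictors; the function name, the toy error vectors, and the split parameters are illustrative assumptions.

```python
import random

def overfitting_probability(error_vectors, n_train, eps, n_splits=20000, seed=0):
    """Monte Carlo estimate of the probability of overfitting Q_eps:
    the chance, over uniform train/test splits of the full sample,
    that the predictor chosen by empirical risk minimization (ERM)
    has a test error rate exceeding its train error rate by >= eps."""
    rng = random.Random(seed)
    L = len(error_vectors[0])
    n_test = L - n_train
    idx = list(range(L))
    hits = 0
    for _ in range(n_splits):
        rng.shuffle(idx)
        train, test = idx[:n_train], idx[n_train:]
        # ERM: pick the predictor with the fewest errors on the train part
        best = min(error_vectors, key=lambda v: sum(v[i] for i in train))
        nu_train = sum(best[i] for i in train) / n_train
        nu_test = sum(best[i] for i in test) / n_test
        if nu_test - nu_train >= eps:
            hits += 1
    return hits / n_splits

# A "pair of predictors": two binary error vectors on a sample of 10 objects,
# each making 3 errors, on disjoint subsets of the sample.
a = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
b = [0, 0, 0, 0, 1, 1, 1, 0, 0, 0]
q = overfitting_probability([a, b], n_train=5, eps=0.2)
```

For a pair of predictors this probability can also be written in closed form via the hypergeometric distribution, which is what makes exact (rather than asymptotic) bounds attainable for the model sets listed in the abstract.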
References
P. V. Botov, “Exact Bounds for the Probability of Overfitting for Monotone and Unimodal Sets of Predictors,” in Proceedings of the 14th Russian Conference on Mathematical Methods of Pattern Recognition (MAKS Press, Moscow, 2009), pp. 7–10.
K. V. Vorontsov, “Combinatorial Approach to Estimating the Quality of Learning Algorithms,” in Mathematical Problems of Cybernetics, Ed. by O. B. Lupanov (Fizmatlit, Moscow, 2004), Vol. 13, pp. 5–36.
D. A. Kochedykov, “Similarity Structures in Sets of Classifiers and Generalization Bounds,” in Proceedings of the 14th Russian Conference on Mathematical Methods of Pattern Recognition (MAKS Press, Moscow, 2009), pp. 45–48.
A. I. Frey, “Exact Bounds for the Probability of Overfitting for Symmetric Sets of Predictors,” in Proceedings of the 14th Russian Conference on Mathematical Methods of Pattern Recognition (MAKS Press, Moscow, 2009), pp. 66–69.
E. T. Bax, “Similar Predictors and VC Error Bounds,” Tech. Rep. CalTech-CS-TR97-14 (1997).
S. Boucheron, O. Bousquet, and G. Lugosi, “Theory of Classification: A Survey of Some Recent Advances,” ESAIM: Probab. Stat., No. 9, 323–375 (2005).
R. Herbrich and R. Williamson, “Algorithmic Luckiness,” J. Machine Learning Res., No. 3, 175–212 (2002).
V. Koltchinskii, “Rademacher Penalties and Structural Risk Minimization,” IEEE Trans. Inf. Theory 47(5), 1902–1914 (2001).
V. Koltchinskii and D. Panchenko, “Rademacher Processes and Bounding the Risk of Function Learning,” in High Dimensional Probability II, Ed. by E. Giné and J. Wellner (Birkhäuser, 1999), pp. 443–457.
J. Langford, “Quantitatively Tight Sample Complexity Bounds,” Ph.D. Thesis (Carnegie Mellon University, 2002).
J. Langford and D. McAllester, “Computable Shell Decomposition Bounds,” in Proceedings of the 13th Annual Conference on Computational Learning Theory (Morgan Kaufmann, San Francisco, CA, 2000), pp. 25–34.
J. Langford and J. Shawe-Taylor, “PAC-Bayes and Margins,” in Advances in Neural Information Processing Systems 15 (MIT Press, 2002), pp. 439–446.
D. McAllester, “PAC-Bayesian Model Averaging,” in COLT: Proceedings of the Workshop on Computational Learning Theory (Morgan Kaufmann, San Francisco, CA, 1999).
P. Philips, “Data-Dependent Analysis of Learning Systems,” Ph.D. Thesis (The Australian National University, Canberra, 2005).
J. Sill, “Monotonicity and Connectedness in Learning Systems,” Ph.D. Thesis (California Inst. Technol., 1998).
V. Vapnik, Estimation of Dependencies Based on Empirical Data (Springer, New York, 1982).
V. Vapnik, Statistical Learning Theory (Wiley, New York, 1998).
V. Vapnik and A. Chervonenkis, “On the Uniform Convergence of Relative Frequencies of Events to Their Probabilities,” Theory Probab. Appl. 16(2), 264–280 (1971).
N. Vayatis and R. Azencott, “Distribution-Dependent Vapnik–Chervonenkis Bounds,” Lecture Notes in Computer Science 1572, 230–240 (1999).
K. V. Vorontsov, “Combinatorial Probability and the Tightness of Generalization Bounds,” Pattern Recognit. Image Anal. 18(2), 243–259 (2008).
K. V. Vorontsov, “On the Influence of Similarity of Classifiers on the Probability of Overfitting,” in Pattern Recognition and Image Analysis: New Information Technologies (PRIA-9) (Nizhni Novgorod, 2008), Vol. 2, pp. 303–306.
K. V. Vorontsov, “Splitting and Similarity Phenomena in the Sets of Classifiers and Their Effect on the Probability of Overfitting,” Pattern Recognit. Image Anal. 19(3), 412–420 (2009).
K. V. Vorontsov, “Tight Bounds for the Probability of Overfitting,” Dokl. Math. 80(3), 793–796 (2009).
Author information
Konstantin Vorontsov. Born 1971. Graduated from the Faculty of Applied Mathematics and Control, Moscow Institute of Physics and Technology, in 1994. Received candidate’s degree in 1999 and doctoral degree in 2010. Currently is with the Dorodnicyn Computing Centre, Russian Academy of Sciences. Scientific interests: statistical learning theory, machine learning, data mining, probability theory, and combinatorics. Author of 75 papers.
Cite this article
Vorontsov, K.V. Exact combinatorial bounds on the probability of overfitting for empirical risk minimization. Pattern Recognit. Image Anal. 20, 269–285 (2010). https://doi.org/10.1134/S105466181003003X