
Exact combinatorial bounds on the probability of overfitting for empirical risk minimization

  • Mathematical Methods in Pattern Recognition
  • Published in: Pattern Recognition and Image Analysis

Abstract

Three general methods for obtaining exact bounds on the probability of overfitting are proposed within statistical learning theory: a method of generating and destroying sets, a recurrent method, and a blockwise method. Six particular cases are considered to illustrate the application of these methods. These are the following model sets of predictors: a pair of predictors, a layer of a Boolean cube, an interval of a Boolean cube, a monotonic chain, a unimodal chain, and a unit neighborhood of the best predictor. For the interval and the unimodal chain, the results of numerical experiments are presented that demonstrate the effects of splitting and similarity on the probability of overfitting.
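The simplest of the model sets above, a pair of predictors, already illustrates the combinatorial setting: the probability of overfitting is taken over all equiprobable splits of a finite sample into training and test parts, with empirical risk minimization choosing the predictor with the fewest training errors. The sketch below computes this probability exactly by enumerating splits; the sample size, error vectors, and the threshold `eps` are illustrative assumptions, not the paper's notation.

```python
import itertools
from fractions import Fraction

# Illustrative sketch (assumed values, not from the paper):
# each predictor is identified with its binary error vector on a fixed
# sample of L objects; v[i] == 1 means the predictor errs on object i.
L = 10      # full sample size
ell = 5     # training subsample size; k = L - ell objects are held out
eps = 0.2   # overfitting threshold

a = (0, 0, 0, 1, 1, 0, 0, 0, 1, 0)
b = (0, 1, 0, 0, 0, 1, 0, 1, 0, 0)

def overfit_probability(vectors, L, ell, eps):
    """Exact probability, over all C(L, ell) equiprobable train/test
    splits, that the empirical-risk-minimizing predictor's test error
    exceeds its training error by more than eps."""
    total = bad = 0
    for train in itertools.combinations(range(L), ell):
        train_set = set(train)
        # ERM: pick the predictor with the fewest errors on the training part
        # (ties broken by list order in this sketch).
        best = min(vectors, key=lambda v: sum(v[i] for i in train_set))
        nu_train = Fraction(sum(best[i] for i in train_set), ell)
        nu_test = Fraction(sum(best[i] for i in range(L)
                               if i not in train_set), L - ell)
        total += 1
        bad += (nu_test - nu_train > eps)
    return Fraction(bad, total)

print(overfit_probability([a, b], L, ell, eps))
```

The same enumeration extends to the other model sets (a layer or interval of the Boolean cube, monotonic and unimodal chains) by changing the list of error vectors, which is how the splitting and similarity effects mentioned above can be observed numerically.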



Author information


Correspondence to K. V. Vorontsov.

Additional information

Konstantin Vorontsov. Born 1971. Graduated from the Faculty of Applied Mathematics and Control, Moscow Institute of Physics and Technology, in 1994. Received his candidate's degree in 1999 and his doctoral degree in 2010. He is currently with the Dorodnicyn Computing Centre, Russian Academy of Sciences. Scientific interests: statistical learning theory, machine learning, data mining, probability theory, and combinatorics. Author of 75 papers.


About this article

Cite this article

Vorontsov, K.V. Exact combinatorial bounds on the probability of overfitting for empirical risk minimization. Pattern Recognit. Image Anal. 20, 269–285 (2010). https://doi.org/10.1134/S105466181003003X

