Abstract
We study the Minimum Description Length (MDL) principle for online sequence estimation/prediction in a proper learning setup. If the underlying model class is discrete, then the total expected square loss is a particularly interesting performance measure: (a) this quantity is finitely bounded, implying convergence with probability one, and (b) it additionally specifies the convergence speed. For MDL, in general one can only prove loss bounds that are finite but exponentially larger than those for Bayes mixtures. We show that this is the case even if the model class contains only Bernoulli distributions. We derive a new upper bound on the prediction error for countable Bernoulli classes. This implies a small bound (comparable to the one for Bayes mixtures) for certain important model classes. We discuss applications to machine learning tasks such as classification and hypothesis testing, and the generalization to countable classes of i.i.d. models.
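To make the two predictors being compared concrete, here is a minimal sketch (not from the paper): a countable Bernoulli class with illustrative parameters theta_k = 1/(k+2) and prior weights w_k proportional to 2^(-k) — both choices are assumptions for the demo, not the paper's. The Bayes mixture predicts with the posterior-weighted average of the models, while two-part MDL predicts with the single model of minimal code length, i.e. maximal w_k times likelihood. The quantity accumulated below, the cumulative square prediction error, is the performance measure whose bounds the abstract discusses.

```python
import math, random

# Illustrative countable Bernoulli class (assumption: theta_k = 1/(k+2),
# prior w_k ~ 2^{-k}; these are demo choices, not taken from the paper).
K = 20                                     # truncate the countable class for simulation
thetas = [1.0 / (k + 2) for k in range(K)]
logw = [-(k + 1) * math.log(2.0) for k in range(K)]   # log prior weights

def log_scores(ones, n):
    """log(w_k) + log-likelihood of model k after n bits containing `ones` ones."""
    return [logw[k] + ones * math.log(thetas[k])
            + (n - ones) * math.log(1.0 - thetas[k]) for k in range(K)]

def bayes_predict(ones, n):
    """Bayes mixture: posterior-weighted average prediction of P(next bit = 1)."""
    s = log_scores(ones, n)
    m = max(s)
    post = [math.exp(v - m) for v in s]    # unnormalized posterior, rescaled for stability
    return sum(p * t for p, t in zip(post, thetas)) / sum(post)

def mdl_predict(ones, n):
    """Two-part MDL: predict with the single model of minimal code length."""
    s = log_scores(ones, n)
    return thetas[s.index(max(s))]         # argmax of log w_k + log-likelihood

random.seed(0)
mu = thetas[5]                             # generating distribution lies in the class
ones, sq_bayes, sq_mdl = 0, 0.0, 0.0
for n in range(1000):
    sq_bayes += (bayes_predict(ones, n) - mu) ** 2   # instantaneous square loss
    sq_mdl += (mdl_predict(ones, n) - mu) ** 2
    ones += random.random() < mu           # sample the next bit from mu
print(f"cumulative square loss after 1000 bits: Bayes {sq_bayes:.3f}, MDL {sq_mdl:.3f}")
```

In this proper-learning setting both cumulative losses remain bounded as the horizon grows (convergence with probability one); the paper's concern is how much larger the MDL bound can be than the Bayes mixture one.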
References
Barron A. R. and Cover T. M. 1991. Minimum complexity density estimation. IEEE Trans. on Information Theory 37(4): 1034–1054.
Barron A. R., Rissanen J. J., and Yu B. 1998. The minimum description length principle in coding and modeling. IEEE Trans. on Information Theory 44(6): 2743–2760.
Clarke B. S. and Barron A. R. 1990. Information-theoretic asymptotics of Bayes methods. IEEE Trans. on Information Theory 36: 453–471.
Gács P. 1983. On the relation between descriptional complexity and algorithmic probability. Theoretical Computer Science 22: 71–93.
Grünwald P. and Langford J. 2004. Suboptimal behaviour of Bayes and MDL in classification under misspecification. In 17th Annual Conference on Learning Theory (COLT), pp. 331–347.
Hutter M. 2001. Convergence and error bounds for universal prediction of nonbinary sequences. In Proc. 12th European Conference on Machine Learning (ECML-2001), pp. 239–250.
Hutter M. 2003a. Convergence and loss bounds for Bayesian sequence prediction. IEEE Trans. on Information Theory 49(8): 2061–2067.
Hutter M. 2003b. Optimality of universal Bayesian prediction for general loss and alphabet. Journal of Machine Learning Research 4: 971–1000.
Hutter M. 2003c. Sequence prediction based on monotone complexity. In Proc. 16th Annual Conference on Learning Theory (COLT-2003), Lecture Notes in Artificial Intelligence, Berlin, Springer, pp. 506–521.
Hutter M. 2005. Sequential predictions based on algorithmic complexity. Journal of Computer and System Sciences 72(1): 95–117.
Levin L. A. 1973. On the notion of a random sequence. Soviet Math. Dokl. 14(5): 1413–1416.
Li J. Q. 1999. Estimation of Mixture Models. PhD thesis, Dept. of Statistics, Yale University.
Li M. and Vitányi P. M. B. 1997. An introduction to Kolmogorov complexity and its applications. Springer, 2nd edition.
Poland J. and Hutter M. 2004a. Convergence of discrete MDL for sequential prediction. In 17th Annual Conference on Learning Theory (COLT), pp. 300–314.
Poland J. and Hutter M. 2004b. On the convergence speed of MDL predictions for Bernoulli sequences. In International Conference on Algorithmic Learning Theory (ALT), pp. 294–308.
Poland J. and Hutter M. 2005. Strong asymptotic assertions for discrete MDL in regression and classification. In Benelearn 2005 (Ann. Machine Learning Conf. of Belgium and the Netherlands).
Rissanen J. J. 1996. Fisher information and stochastic complexity. IEEE Trans. on Information Theory 42(1): 40–47.
Rissanen J. J. 1999. Hypothesis selection and testing by the MDL principle. The Computer Journal 42(4): 260–269.
Solomonoff R. J. 1978. Complexity-based induction systems: comparisons and convergence theorems. IEEE Trans. on Information Theory IT-24: 422–432.
Vitányi P. M. and Li M. 2000. Minimum description length induction, Bayesianism, and Kolmogorov complexity. IEEE Trans. on Information Theory 46(2): 446–464.
Vovk V. G. 1997. Learning about the parameter of the Bernoulli model. Journal of Computer and System Sciences 55: 96–104.
Zhang T. 2004. On the convergence of MDL density estimation. In Proc. 17th Annual Conference on Learning Theory (COLT), pp. 315–330.
Zvonkin A. K. and Levin L. A. 1970. The complexity of finite objects and the development of the concepts of information and randomness by means of the theory of algorithms. Russian Mathematical Surveys 25(6): 83–124.
Additional information
A shorter version of this paper (Poland and Hutter, 2004b) appeared in ALT 2004.
This work was supported by SNF grant 2100-67712.02.
This research was done while the first author was with IDSIA, Switzerland.
Cite this article
Poland, J., Hutter, M. MDL convergence speed for Bernoulli sequences. Stat Comput 16, 161–175 (2006). https://doi.org/10.1007/s11222-006-6746-3