Summary
In this paper, the relationship between code length and the selection of the number of bins for a histogram density is considered for a sequence of iid observations on [0,1]. First, we use a shortest code length criterion to select the number of bins for a histogram. A uniform almost sure asymptotic expansion for the code length is given and it is used to prove the asymptotic optimality of the selection rule. In addition, the selection rule is consistent if the true density is uniform [0,1]. Secondly, we deal with the problem: what is the “best” achievable average code length with underlying density function f? Minimax lower bounds are derived for the average code length over certain smooth classes of underlying densities f. For the smooth class with bounded first derivatives, the rate in the lower bound is shown to be achieved by a code based on a sequence of histograms whose number of bins is changed predictively. Moreover, this best code can be modified to ensure that the almost sure version of the code length has asymptotically the same behavior as its expected value, i.e., the average code length.
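To make the shortest code length idea concrete, the following is a minimal sketch, not the paper's own construction. It assumes a predictive histogram code in which each observation is encoded with the density of an m-bin histogram fitted to the earlier observations, smoothed by an add-one (Laplace) rule so that no bin has zero probability. The function names `predictive_code_length` and `select_bins`, the add-one smoothing, and the search range `max_bins` are all illustrative assumptions; the constant term for discretization precision is dropped because it does not depend on m.

```python
import math


def predictive_code_length(xs, m):
    """Code length in nats of xs (values in [0, 1]) under a predictive
    m-bin histogram with add-one smoothing (an assumed estimator)."""
    counts = [0] * m
    total = 0.0
    for t, x in enumerate(xs):
        # Index of the bin containing x; x == 1.0 goes to the last bin.
        j = min(int(x * m), m - 1)
        # Predictive density at x based on the first t observations:
        # smoothed bin probability divided by the bin width 1/m.
        density = m * (counts[j] + 1) / (t + m)
        total -= math.log(density)
        counts[j] += 1
    return total


def select_bins(xs, max_bins):
    """Return the number of bins in 1..max_bins with the shortest code length."""
    return min(range(1, max_bins + 1),
               key=lambda m: predictive_code_length(xs, m))
```

With a single bin the predictive density is identically 1, so the code length is 0 and any larger m is selected only when it compresses the data better than the uniform density does. For a sample that really is uniform one would therefore hope to see small values of m selected, in the spirit of the consistency result stated above, although this sketch makes no such guarantee for finite samples.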
Additional information
Research supported in part by NSF grant DMS-8701426
Research supported in part by NSF grant DMS-8802378