Minimal Penalties for Gaussian Model Selection

Birgé, Lucien; Massart, Pascal

doi:10.1007/s00440-006-0011-8


Published: 04 July 2006

Volume 138, pages 33–73 (2007)

Probability Theory and Related Fields




Abstract

This paper is mainly devoted to a precise analysis of which penalties should be used to perform model selection via the minimization of a penalized least-squares type criterion within a general Gaussian framework that includes the classical ones. Compared with our previous paper on this topic (Birgé and Massart in J. Eur. Math. Soc. 3, 203–268 (2001)), we give more elaborate forms of the penalties and show that they are, in some sense, optimal. Specifically, we provide more precise upper bounds for the risk of the penalized estimators and lower bounds for the penalty terms, showing that the use of smaller penalties may lead to disastrous results. These lower bounds may also be used to design a practical strategy for estimating the penalty from the data when the amount of noise is unknown. We illustrate the method on the problem of estimating a piecewise constant signal in Gaussian noise when neither the number nor the locations of the change points are known.
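To make the idea of a penalized least-squares criterion concrete for the change-point illustration, here is a minimal Python sketch. It is not the authors' procedure: it assumes the noise variance `sigma2` is known, whereas the paper's practical strategy estimates the penalty from the data. The function names, the penalty shape `sigma2 * D * (c1 * log(n / D) + c2)` and the default constants `c1` and `c2` are assumptions chosen for illustration only. The sketch computes, by dynamic programming, the smallest residual sum of squares over all piecewise constant fits with D segments, then picks the D that minimizes residual sum of squares plus penalty.

```python
import math


def best_rss_per_dimension(y, max_segments):
    """Return a list whose entry D-1 is the smallest residual sum of squares
    over all piecewise constant fits of y with exactly D segments."""
    n = len(y)
    # Prefix sums of y and y^2 give the RSS of any segment in O(1).
    s1, s2 = [0.0], [0.0]
    for v in y:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def cost(i, j):
        # RSS of fitting the empirical mean on y[i:j], with j > i.
        seg_sum = s1[j] - s1[i]
        return (s2[j] - s2[i]) - seg_sum * seg_sum / (j - i)

    inf = float("inf")
    # dp[d][j] = best RSS for the first j points split into d + 1 segments.
    dp = [[inf] * (n + 1) for _ in range(max_segments)]
    for j in range(1, n + 1):
        dp[0][j] = cost(0, j)
    for d in range(1, max_segments):
        for j in range(d + 1, n + 1):
            dp[d][j] = min(dp[d - 1][i] + cost(i, j) for i in range(d, j))
    return [dp[d][n] for d in range(max_segments)]


def select_dimension(rss, n, sigma2, c1=2.0, c2=5.0):
    """Pick the number of segments D minimizing RSS(D) + pen(D).

    The penalty shape and the constants c1, c2 are illustrative assumptions,
    and sigma2 is assumed known.
    """
    crit = []
    for k, r in enumerate(rss):
        dim = k + 1
        pen = sigma2 * dim * (c1 * math.log(n / dim) + c2)
        crit.append(r + pen)
    d_hat = 1 + min(range(len(crit)), key=crit.__getitem__)
    return d_hat, crit
```

A typical call would be `rss = best_rss_per_dimension(y, 10)` followed by `d_hat, crit = select_dimension(rss, len(y), sigma2)`. Shrinking `c1` and `c2` toward zero makes the criterion favor the largest number of segments allowed, which is the kind of failure the abstract attributes to penalties that are too small.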

References

Abramovich F., Benjamini Y., Donoho D.L., Johnstone I.M. (2006). Adapting to unknown sparsity by controlling the false discovery rate. Ann. Statist. 34
Akaike H. (1969). Statistical predictor identification. Ann. Inst. Statist. Math. 22:203–217
Akaike H. (1973). Information theory and an extension of the maximum likelihood principle. In: Petrov P.N., Csaki F. (eds) Proceedings 2nd International Symposium on Information Theory. Akademia Kiado, Budapest, pp. 267–281
Akaike H. (1974). A new look at the statistical model identification. IEEE Trans. Autom. Control 19:716–723
Akaike H. (1978). A Bayesian analysis of the minimum AIC procedure. Ann. Inst. Statist. Math. 30, Part A:9–14
Amemiya T. (1985). Advanced Econometrics. Basil Blackwell, Oxford
Barron A.R., Birgé L., Massart P. (1999). Risk bounds for model selection via penalization. Probab. Theory Relat. Fields 113:301–415
Barron A.R., Cover T.M. (1991). Minimum complexity density estimation. IEEE Trans. Inf. Theory 37:1034–1054
Birgé L. (2001). An alternative point of view on Lepski’s method. In: de Gunst M.C.M., Klaassen C.A.J., van der Vaart A.W. (eds) State of the Art in Probability and Statistics, Festschrift for Willem R. van Zwet. Institute of Mathematical Statistics, Lecture Notes–Monograph Series, Vol. 36, pp. 113–133
Birgé L., Massart P. (1998). Minimum contrast estimators on sieves: exponential bounds and rates of convergence. Bernoulli 4:329–375
Birgé L., Massart P. (2001). Gaussian model selection. J. Eur. Math. Soc. 3:203–268
Birgé L., Massart P. (2001). A generalized C_p criterion for Gaussian model selection. Technical Report No 647. Laboratoire de Probabilités, Université Paris VI. http://www.proba.jussieu.fr/mathdoc/preprints/index.html#2001
Daniel C., Wood F.S. (1971). Fitting Equations to Data. Wiley, New York
Draper N.R., Smith H. (1981). Applied Regression Analysis, 2nd edn. Wiley, New York
Efron B., Hastie T., Johnstone I.M., Tibshirani R. (2004). Least angle regression. Ann. Statist. 32:407–499
Feller W. (1968). An Introduction to Probability Theory and its Applications, Vol I (3rd edn). Wiley, New York
George E.I., Foster D.P. (2000). Calibration and empirical Bayes variable selection. Biometrika 87:731–747
Gey S., Nédélec E. (2005). Model selection for CART regression trees. IEEE Trans. Inf. Theory 51:658–670
Guyon X., Yao J.F. (1999). On the underfitting and overfitting sets of models chosen by order selection criteria. Jour. Multivar. Anal. 70:221–249
Hannan E.J., Quinn B.G. (1979). The determination of the order of an autoregression. J.R.S.S., B 41:190–195
Hoeffding W. (1963). Probability inequalities for sums of bounded random variables. J.A.S.A. 58:13–30
Hurvich C.M., Tsai C.-L. (1989). Regression and time series model selection in small samples. Biometrika 76:297–307
Johnstone I. (2001). Chi-square oracle inequalities. In: de Gunst M.C.M., Klaassen C.A.J., van der Vaart A.W. (eds) State of the Art in Probability and Statistics, Festschrift for Willem R. van Zwet. Institute of Mathematical Statistics, Lecture Notes–Monograph Series, Vol. 36, pp. 399–418
Kneip A. (1994). Ordered linear smoothers. Ann. Statist. 22:835–866
Lavielle M., Moulines E. (2000). Least Squares estimation of an unknown number of shifts in a time series. J. Time Series Anal. 21:33–59
Lebarbier E. (2005). Detecting multiple change-points in the mean of a Gaussian process by model selection. Signal Proces. 85:717–736
Li K.C. (1987). Asymptotic optimality for C_p, C_L, cross-validation, and generalized cross-validation: Discrete index set. Ann. Statist. 15:958–975
Loubes J.-M., Massart P. (2004). Discussion of “Least angle regression” by Efron B., Hastie T., Johnstone I., Tibshirani R. Ann. Statist. 32:460–465
Mallows C.L. (1973). Some comments on C_p. Technometrics 15:661–675
Massart P. (1990). The tight constant in the D.K.W. inequality. Ann. Probab. 18:1269–1283
McQuarrie A.D.R., Tsai C.-L. (1998). Regression and Time Series Model Selection. World Scientific, Singapore
Mitchell T.J., Beauchamp J.J. (1988). Bayesian variable selection in linear regression. J.A.S.A. 83:1023–1032
Polyak B.T., Tsybakov A.B. (1990). Asymptotic optimality of the C_p-test for the orthogonal series estimation of regression. Theory Probab. Appl. 35:293–306
Rissanen J. (1978). Modeling by shortest data description. Automatica 14:465–471
Schwarz G. (1978). Estimating the dimension of a model. Ann. Statist. 6:461–464
Shen X., Ye J. (2002). Adaptive model selection. J.A.S.A. 97:210–221
Shibata R. (1981). An optimal selection of regression variables. Biometrika 68:45–54
Wallace D.L. (1959). Bounds on normal approximations to Student’s and the chi-square distributions. Ann. Math. Stat. 30:1121–1130
Whittaker E.T., Watson G.N. (1927). A Course of Modern Analysis. Cambridge University Press, London
Yang Y. (2005). Can the strengths of AIC and BIC be shared? A conflict between model identification and regression estimation. Biometrika 92:937–950
Yao Y.C. (1988). Estimating the number of change points via Schwarz criterion. Stat. Probab. Lett. 6:181–189


Author information

Authors and Affiliations

UMR 7599 “Probabilités et modèles aléatoires”, Laboratoire de Probabilités, boîte 188, Université Paris VI, 4 Place Jussieu, 75252, Paris Cedex 05, France
Lucien Birgé
UMR 8628 “Laboratoire de Mathématiques”, Bât. 425, Université Paris Sud, Campus d’Orsay, 91405, Orsay Cedex, France
Pascal Massart


Corresponding author

Correspondence to Pascal Massart.


About this article

Cite this article

Birgé, L., Massart, P. Minimal Penalties for Gaussian Model Selection. Probab. Theory Relat. Fields 138, 33–73 (2007). https://doi.org/10.1007/s00440-006-0011-8


Received: 11 July 2004
Revised: 24 March 2006
Published: 04 July 2006
Issue Date: May 2007
DOI: https://doi.org/10.1007/s00440-006-0011-8


