Abstract
Kernel density estimation has become a common tool for empirical studies across research areas, and this kind of estimator is now provided by many software packages. At the same time, the discussion on bandwidth selection has been going on for about three decades. Although a good part of that discussion concerns nonparametric regression, the choice of this parameter is no less problematic for density estimation, as becomes obvious when reading empirical studies in which practitioners have used kernel densities. New contributions typically provide simulations only to show that their own selector outperforms some of the existing methods. We review existing methods and compare them on a set of designs that exhibit few bumps and exponentially falling tails. We concentrate on small and moderate sample sizes because for large ones the differences between consistent methods are often negligible, at least for practitioners. As a byproduct we find that a mixture of simple plug-in and cross-validation methods produces bandwidths with quite stable performance.
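To make the closing remark concrete, the following is a minimal sketch of what mixing a plug-in-type bandwidth with a cross-validation bandwidth could look like. It is not the procedure evaluated in the article: the abstract does not specify the mixture, so the equal-weight average used here is an assumption, the normal-reference rule of thumb merely stands in for a simple plug-in selector, and least-squares cross-validation is searched over a coarse grid. All function names are hypothetical and only the Python standard library is used.

```python
import math
import random


def gaussian_kde(data, h):
    """Return the Gaussian-kernel density estimate x -> f_hat(x) for bandwidth h."""
    n = len(data)
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))

    def f_hat(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)

    return f_hat


def rule_of_thumb_bandwidth(data):
    """Normal-reference bandwidth 1.06 * sd * n^(-1/5) (stand-in for a plug-in selector)."""
    n = len(data)
    mean = sum(data) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    return 1.06 * sd * n ** (-0.2)


def lscv_score(data, h):
    """Least-squares cross-validation criterion for a Gaussian kernel.

    First term: integral of f_hat^2, using the fact that the convolution of two
    standard Gaussian kernels is the N(0, 2) density.
    Second term: twice the average leave-one-out density at the observations.
    """
    n = len(data)
    sum_conv = 0.0  # all pairs (i, j), including i == j
    sum_loo = 0.0   # pairs with i != j only
    for i in range(n):
        for j in range(n):
            u = (data[i] - data[j]) / h
            sum_conv += math.exp(-0.25 * u * u) / (2.0 * math.sqrt(math.pi))
            if i != j:
                sum_loo += math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
    return sum_conv / (n * n * h) - 2.0 * sum_loo / (n * (n - 1) * h)


def lscv_bandwidth(data, h_ref, grid_size=40):
    """Minimise the LSCV score over a log-spaced grid around h_ref (assumed range)."""
    lo, hi = 0.1 * h_ref, 3.0 * h_ref
    grid = [lo * (hi / lo) ** (k / (grid_size - 1)) for k in range(grid_size)]
    return min(grid, key=lambda h: lscv_score(data, h))


# Illustrative data only: 100 draws from a standard normal distribution.
random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(100)]

h_rot = rule_of_thumb_bandwidth(sample)
h_cv = lscv_bandwidth(sample, h_rot)
h_mix = 0.5 * (h_rot + h_cv)  # assumed equal-weight mixture of the two selectors
density = gaussian_kde(sample, h_mix)
```

Averaging is only one way to combine selectors; its appeal in a sketch like this is that the mixed bandwidth always lies between the two inputs, which dampens the occasional very small bandwidth that cross-validation can produce on small samples.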
Notes
We are grateful for the comments and suggestions of one of the anonymous referees.
Additional information
The authors thank Maria Dolores Martinez-Miranda, Lijian Yang, two anonymous referees and Göran Kauermann for helpful discussion and comments.
About this article
Cite this article
Heidenreich, NB., Schindler, A. & Sperlich, S. Bandwidth selection for kernel density estimation: a review of fully automatic selectors. AStA Adv Stat Anal 97, 403–433 (2013). https://doi.org/10.1007/s10182-013-0216-y
DOI: https://doi.org/10.1007/s10182-013-0216-y