Abstract
We consider simultaneous Bayesian variable selection and estimation for linear regression models with global-local shrinkage priors on the regression coefficients. We propose a variable selection procedure that selects a variable if the ratio of the posterior mean of its regression coefficient to the corresponding ordinary least squares estimate is greater than one half. The regression coefficient is then estimated by its posterior mean if the variable is selected and by zero otherwise. Under the assumption of orthogonal designs, we prove that if the local parameters have polynomial-tailed priors, the proposed method enjoys the oracle property, in the sense that it achieves variable selection consistency and the optimal estimation rate at the same time. If an exponential-tailed prior is used for the local parameters instead, the proposed method retains variable selection consistency but does not attain the optimal estimation rate. We show via simulation and real data examples that the proposed selection mechanism also works for nonorthogonal designs.
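The selection and estimation rule described in the abstract can be illustrated with a minimal sketch. It assumes the posterior means under some global-local shrinkage prior have already been computed elsewhere (for example by an MCMC sampler) and are passed in alongside the ordinary least squares estimates. The function name `select_and_estimate`, the argument names, and the treatment of a zero least squares estimate as "not selected" are assumptions made for illustration and do not come from the paper.

```python
def select_and_estimate(post_means, ols_estimates, threshold=0.5):
    """Apply the ratio rule: select variable j when
    post_means[j] / ols_estimates[j] > threshold, then estimate its
    coefficient by the posterior mean if selected and by zero otherwise.

    Both arguments are equal-length sequences of floats. How the
    posterior means are obtained is outside the scope of this sketch.
    """
    if len(post_means) != len(ols_estimates):
        raise ValueError("inputs must have the same length")

    selected = []
    estimates = []
    for pm, ols in zip(post_means, ols_estimates):
        # Assumption: a zero OLS estimate leaves the ratio undefined,
        # so the variable is treated as not selected.
        keep = ols != 0 and (pm / ols) > threshold
        selected.append(keep)
        estimates.append(pm if keep else 0.0)
    return selected, estimates


if __name__ == "__main__":
    # Hypothetical numbers: two coefficients are shrunk only slightly,
    # one is shrunk heavily toward zero, one has a zero OLS estimate.
    post = [1.8, 0.05, -2.7, 0.0]
    ols = [2.0, 0.4, -3.0, 0.0]
    print(select_and_estimate(post, ols))
```

The ratio is at most one half exactly when the prior has pulled the posterior mean at least halfway from the least squares estimate toward zero, which is why heavily shrunk coefficients are dropped and lightly shrunk ones are kept.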
Acknowledgements
We would like to thank the editor, the associate editor, and two anonymous referees for their thoughtful comments and helpful suggestions.
Cite this article
Tang, X., Xu, X., Ghosh, M. et al. Bayesian Variable Selection and Estimation Based on Global-Local Shrinkage Priors. Sankhya A 80, 215–246 (2018). https://doi.org/10.1007/s13171-017-0118-2
DOI: https://doi.org/10.1007/s13171-017-0118-2