Abstract
The horseshoe\(+\) estimator for Gaussian linear regression models is a novel extension of the horseshoe estimator that enjoys many favourable theoretical properties. We develop the first efficient Gibbs sampling algorithm for the horseshoe\(+\) estimator for linear and logistic regression models. Importantly, our sampling algorithm incorporates robust data models that naturally handle non-Gaussian data and are less sensitive to outliers. The resulting software implementation provides a powerful, flexible and robust tool for building prediction and classification models from potentially high-dimensional data and represents the state-of-the-art in Bayesian machine learning techniques.
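The auxiliary-variable approach behind the sampler can be illustrated in the simplest setting. Below is a minimal sketch (our own illustration, not the paper's released code) of a horseshoe Gibbs sampler for the normal-means special case \(y_j = \beta_j + \epsilon_j\), using the inverse-gamma auxiliary decomposition of the half-Cauchy scales from Makalic and Schmidt (2016); the function name, the Jeffreys prior on \(\sigma^2\), and the normal-means simplification are our own assumptions.

```python
import numpy as np

def horseshoe_gibbs(y, n_iter=3000, burn_in=1000, seed=0):
    """Gibbs sampler for the normal-means model y_j = beta_j + e_j,
    e_j ~ N(0, sigma^2), with horseshoe prior beta_j ~ N(0, lam_j^2 tau^2 sigma^2).
    The half-Cauchy scales are handled with the inverse-gamma auxiliary
    variables of Makalic and Schmidt (2016), so every conditional is conjugate.
    Inverse-gamma IG(a, b) draws are taken as b / Gamma(a, 1)."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    p = len(y)
    sigma2, tau2, xi = 1.0, 1.0, 1.0          # noise variance, global scale, auxiliary
    lam2, nu = np.ones(p), np.ones(p)         # local scales and their auxiliaries
    draws = np.zeros((n_iter - burn_in, p))

    for it in range(n_iter):
        # beta_j | rest: conjugate normal update with shrinkage factor kappa_j
        kappa = lam2 * tau2 / (1.0 + lam2 * tau2)
        beta = rng.normal(kappa * y, np.sqrt(kappa * sigma2))
        # local scales lam_j^2 ~ IG(1, 1/nu_j + beta_j^2/(2 tau^2 sigma^2))
        lam2 = (1.0 / nu + beta**2 / (2.0 * tau2 * sigma2)) / rng.gamma(1.0, 1.0, size=p)
        # auxiliaries nu_j ~ IG(1, 1 + 1/lam_j^2)
        nu = (1.0 + 1.0 / lam2) / rng.gamma(1.0, 1.0, size=p)
        # global scale tau^2 ~ IG((p+1)/2, 1/xi + sum(beta^2/lam^2)/(2 sigma^2))
        tau2 = (1.0 / xi + np.sum(beta**2 / lam2) / (2.0 * sigma2)) / rng.gamma((p + 1) / 2.0, 1.0)
        xi = (1.0 + 1.0 / tau2) / rng.gamma(1.0, 1.0)
        # noise variance sigma^2 ~ IG(p, rss/2) under a Jeffreys prior
        rss = np.sum((y - beta)**2) + np.sum(beta**2 / (lam2 * tau2))
        sigma2 = (rss / 2.0) / rng.gamma(p, 1.0)
        if it >= burn_in:
            draws[it - burn_in] = beta
    return draws.mean(axis=0)

# illustrative run: 5 strong signals among 50 noisy observations
rng = np.random.default_rng(123)
truth = np.zeros(50)
truth[:5] = 8.0
y = truth + rng.normal(size=50)
est = horseshoe_gibbs(y)
```

Note the characteristic horseshoe behaviour: large signals are left nearly unshrunk while the null coordinates are pulled strongly towards zero. The horseshoe\(+\) sampler developed in the paper adds a further layer of local half-Cauchy scales to this hierarchy.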
References
Andrews, D.F., Mallows, C.L.: Scale mixtures of normal distributions. J. R. Stat. Soc. (Ser. B) 36(1), 99–102 (1974)
Bhadra, A., Datta, J., Polson, N.G., Willard, B.: The horseshoe+ estimator of ultra-sparse signals (2015). arXiv:1502.00560
Geman, S., Geman, D.: Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. Pattern Anal. Mach. Intell. 6, 721–741 (1984)
Makalic, E., Schmidt, D.F.: A simple sampler for the horseshoe estimator. IEEE Signal Process. Lett. 23(1), 179–182 (2016)
Carvalho, C.M., Polson, N.G., Scott, J.G.: The horseshoe estimator for sparse signals. Biometrika 97(2), 465–480 (2010)
Polson, N.G., Scott, J.G.: Shrink globally, act locally: sparse Bayesian regularization and prediction. In: Bayesian Statistics, vol. 9 (2010)
Wand, M.P., Ormerod, J.T., Padoan, S.A., Frühwirth, R.: Mean field variational Bayes for elaborate distributions. Bayesian Anal. 6(4), 847–900 (2011)
Lindley, D.V., Smith, A.F.M.: Bayes estimates for the linear model. J. R. Stat. Soc. (Ser. B) 34(1), 1–41 (1972)
Rue, H.: Fast sampling of Gaussian Markov random fields. J. R. Stat. Soc. (Ser. B) 63(2), 325–338 (2001)
Cong, Y., Chen, B., Zhou, M.: Fast simulation of hyperplane-truncated multivariate normal distributions (2016)
Bhattacharya, A., Pati, D., Pillai, N.S., Dunson, D.B.: Dirichlet-Laplace priors for optimal shrinkage. J. Am. Stat. Assoc. 110, 1479–1490 (2015)
Polson, N.G., Scott, J.G., Windle, J.: Bayesian inference for logistic models using Pólya-Gamma latent variables. J. Am. Stat. Assoc. 108(504), 1339–1349 (2013)
Windle, J., Polson, N.G., Scott, J.G.: Sampling Pólya-Gamma random variates: alternate and approximate techniques (2014)
Ekholm, A., Palmgren, J.: Correction for misclassification using doubly sampled data. J. Official Stat. 3(4), 419–429 (1987)
Copas, J.B.: Binary regression models for contaminated data. J. R. Stat. Soc. Ser. B (Methodol.) 50(2), 225–265 (1988)
Carroll, R.J., Pederson, S.: On robustness in the logistic regression model. J. R. Stat. Soc. Ser. B (Methodol.) 55(3), 693–706 (1993)
Lichman, M.: UCI machine learning repository (2013)
A Appendix
A.1 Inverse Gamma Distribution
The inverse gamma probability density function is given by
\[ p(x \mid \alpha, \beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, x^{-\alpha - 1} \exp\left(-\frac{\beta}{x}\right), \quad x > 0, \]
with shape parameter \((\alpha >0)\) and scale parameter \((\beta > 0)\). The first two moments are
\[ \mathbb{E}[x] = \frac{\beta}{\alpha - 1}, \qquad \mathrm{Var}(x) = \frac{\beta^2}{(\alpha - 1)^2 (\alpha - 2)}, \]
where the mean and variance only exist for \((\alpha >1)\) and \((\alpha >2)\) respectively.
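As a quick sanity check (our own illustrative snippet, not part of the paper), the moment formulas can be verified by Monte Carlo using the standard fact that the reciprocal of a gamma variate is inverse-gamma distributed — the same trick the Gibbs sampler exploits; the parameter values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(42)
alpha, beta = 5.0, 3.0  # illustrative shape and scale

# If G ~ Gamma(alpha, rate=beta) then 1/G ~ InvGamma(alpha, scale=beta);
# numpy parameterises gamma by scale, so rate=beta corresponds to scale=1/beta.
x = 1.0 / rng.gamma(alpha, 1.0 / beta, size=1_000_000)

mc_mean, exact_mean = x.mean(), beta / (alpha - 1)                      # both ~ 0.75
mc_var, exact_var = x.var(), beta**2 / ((alpha - 1)**2 * (alpha - 2))   # both ~ 0.1875
```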
A.2 Inverse Gaussian Distribution
The inverse Gaussian probability density function is given by
\[ p(x \mid \mu, \lambda) = \sqrt{\frac{\lambda}{2\pi x^3}} \exp\left(-\frac{\lambda (x - \mu)^2}{2 \mu^2 x}\right) \]
for \((x > 0)\), where \((\mu > 0)\) is the mean and \((\lambda > 0)\) is the shape parameter. The first two moments are
\[ \mathbb{E}[x] = \mu, \qquad \mathrm{Var}(x) = \frac{\mu^3}{\lambda}. \]
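The moments can likewise be checked numerically; NumPy exposes the inverse Gaussian under the name `wald` (the snippet and parameter values are our own illustration).

```python
import numpy as np

rng = np.random.default_rng(1)
mu, lam = 2.0, 4.0  # illustrative mean and shape parameters
x = rng.wald(mu, lam, size=1_000_000)  # numpy's Wald sampler is the inverse Gaussian

mc_mean, mc_var = x.mean(), x.var()
# exact values: E[x] = mu = 2.0 and Var(x) = mu**3 / lam = 2.0
```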
A.3 Student-t Distribution
The Student-t distribution probability density function is given by
\[ p(x \mid \mu, \sigma^2, \nu) = \frac{\Gamma\left(\frac{\nu + 1}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right) \sqrt{\pi \nu \sigma^2}} \left(1 + \frac{(x - \mu)^2}{\nu \sigma^2}\right)^{-\frac{\nu + 1}{2}}, \]
where \((x \in \mathbb {R})\), \((\mu \in \mathbb {R})\), \((\sigma ^2 > 0)\) and the degrees of freedom \((\nu > 0)\). The first two moments are
\[ \mathbb{E}[x] = \mu, \qquad \mathrm{Var}(x) = \frac{\nu \sigma^2}{\nu - 2}, \]
where the mean and variance only exist for \((\nu > 1)\) and \((\nu > 2)\) respectively.
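The Student-t density arises as an inverse-gamma scale mixture of normals [1], which is precisely the representation that makes the robust data models amenable to Gibbs sampling. A short numerical check of the mixture against the moment formulas (our own illustrative snippet; parameter values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(7)
mu, sigma2, nu = 1.0, 0.5, 6.0
n = 1_000_000

# Scale-mixture representation: w ~ InvGamma(nu/2, nu/2), x | w ~ N(mu, sigma2 * w)
# marginally gives x ~ Student-t with location mu, scale sigma2, df nu.
w = (nu / 2.0) / rng.gamma(nu / 2.0, 1.0, size=n)
x = rng.normal(mu, np.sqrt(sigma2 * w))

mc_mean, mc_var = x.mean(), x.var()
# exact values: E[x] = mu = 1.0, Var(x) = nu * sigma2 / (nu - 2) = 0.75
```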
A.4 Laplace Distribution
The probability density function of the Laplace distribution is
\[ p(x \mid \mu, b) = \frac{1}{2b} \exp\left(-\frac{|x - \mu|}{b}\right), \]
where \((x \in \mathbb {R})\), \((\mu \in \mathbb {R})\) is the location parameter and \((b > 0)\) is the scale parameter. The first two moments are
\[ \mathbb{E}[x] = \mu, \qquad \mathrm{Var}(x) = 2 b^2. \]
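The Laplace distribution is an exponential scale mixture of normals [1], another mixture exploited in conditionally Gaussian samplers. The representation can be checked against the moment formulas (our own illustrative snippet; parameter values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
mu, b = 0.5, 1.5
n = 1_000_000

# Andrews-Mallows mixture: v ~ Exponential(mean 2*b**2), x | v ~ N(mu, v)
# marginally gives x ~ Laplace(mu, b).
v = rng.exponential(2.0 * b**2, size=n)
x = rng.normal(mu, np.sqrt(v))

mc_mean, mc_var = x.mean(), x.var()
# exact values: E[x] = mu = 0.5, Var(x) = 2 * b**2 = 4.5
```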
A.5 Pólya-Gamma Distribution
A random variable \(x\) follows a Pólya-Gamma distribution [12], \(x \sim \mathrm{PG}(b, c)\), if
\[ x \mathop{=}\limits^{D} \frac{1}{2\pi^2} \sum_{k=1}^{\infty} \frac{g_k}{(k - 1/2)^2 + c^2 / (4\pi^2)}, \]
where \(g_k \sim \mathrm{Ga}(b,1)\) are independent gamma random variables, \((b > 0)\) and \((c \in \mathbb {R})\) are the parameters and \(\mathop {=}\limits ^{D}\) denotes equality in distribution. The first two moments of \(x\) are
\[ \mathbb{E}[x] = \frac{b}{2c} \tanh\left(\frac{c}{2}\right), \qquad \mathrm{Var}(x) = \frac{b}{4c^3} \cdot \frac{\sinh(c) - c}{\cosh^2(c/2)}. \]
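Truncating the infinite sum gives a simple (if inefficient) way to simulate approximate PG draws and check the mean formula numerically; the truncation point and sample size below are our own choices, and exact PG samplers [13] are preferable in practice.

```python
import numpy as np

rng = np.random.default_rng(11)
b, c = 1.0, 2.0
n, K = 50_000, 100  # number of draws and truncation point of the infinite sum

k = np.arange(1, K + 1)
g = rng.gamma(b, 1.0, size=(n, K))  # independent Gamma(b, 1) variates g_k
x = np.sum(g / ((k - 0.5)**2 + c**2 / (4.0 * np.pi**2)), axis=1) / (2.0 * np.pi**2)

mc_mean = x.mean()
exact_mean = b / (2.0 * c) * np.tanh(c / 2.0)  # ~ 0.1904
```

The truncation bias is \(O(1/K)\), since the \(k\)-th term of the sum decays like \(k^{-2}\).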
Copyright information
© 2016 Springer International Publishing AG
Cite this paper
Makalic, E., Schmidt, D.F., Hopper, J.L. (2016). Bayesian Robust Regression with the Horseshoe+ Estimator. In: Kang, B.H., Bai, Q. (eds) AI 2016: Advances in Artificial Intelligence. AI 2016. Lecture Notes in Computer Science, vol 9992. Springer, Cham. https://doi.org/10.1007/978-3-319-50127-7_37
Print ISBN: 978-3-319-50126-0
Online ISBN: 978-3-319-50127-7