Bayesian Robust Regression with the Horseshoe+ Estimator

Conference paper. In: AI 2016: Advances in Artificial Intelligence (AI 2016).

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9992).

Abstract

The horseshoe\(+\) estimator for Gaussian linear regression models is a novel extension of the horseshoe estimator that enjoys many favourable theoretical properties. We develop the first efficient Gibbs sampling algorithm for the horseshoe\(+\) estimator for linear and logistic regression models. Importantly, our sampling algorithm incorporates robust data models that naturally handle non-Gaussian data and are less sensitive to outliers. The resulting software implementation provides a powerful, flexible and robust tool for building prediction and classification models from potentially high-dimensional data and represents the state-of-the-art in Bayesian machine learning techniques.
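
For readers who want a concrete picture of the sampling scheme the abstract refers to, the sketch below implements the inverse-gamma auxiliary-variable Gibbs sampler of Makalic and Schmidt [4] for the plain horseshoe; the horseshoe+ extends this with a second layer of local scale parameters. Everything here (the function name, the Jeffreys prior on \(\sigma^2\), the absence of an intercept) is our illustrative choice, not the authors' implementation:

```python
import numpy as np

def horseshoe_gibbs(X, y, n_iter=1000, seed=0):
    """Sketch of a Gibbs sampler for horseshoe linear regression, following
    the inverse-gamma auxiliary-variable scheme of Makalic & Schmidt [4].
    Plain horseshoe only (horseshoe+ adds a second layer of local scales),
    Jeffreys prior on sigma^2, no intercept or standardisation."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta, sigma2, tau2, xi = np.zeros(p), 1.0, 1.0, 1.0
    lam2, nu = np.ones(p), np.ones(p)
    XtX, Xty = X.T @ X, X.T @ y
    samples = np.empty((n_iter, p))

    def inv_gamma(shape, scale):
        # If G ~ Gamma(shape, rate) then 1/G ~ InvGamma(shape, scale=rate).
        return 1.0 / rng.gamma(shape, 1.0 / scale)

    for t in range(n_iter):
        # beta | rest ~ N(A^-1 X'y, sigma2 A^-1), A = X'X + diag(1/(tau2 lam2))
        A = XtX + np.diag(1.0 / (tau2 * lam2))
        L = np.linalg.cholesky(A)
        beta = (np.linalg.solve(A, Xty)
                + np.sqrt(sigma2) * np.linalg.solve(L.T, rng.standard_normal(p)))
        # sigma2 | rest, under the Jeffreys prior p(sigma2) ~ 1/sigma2
        resid = y - X @ beta
        sigma2 = inv_gamma(0.5 * (n + p),
                           0.5 * (resid @ resid + beta @ (beta / (tau2 * lam2))))
        # local scales lam2_j and their inverse-gamma auxiliaries nu_j
        lam2 = inv_gamma(1.0, 1.0 / nu + beta**2 / (2.0 * tau2 * sigma2))
        nu = inv_gamma(1.0, 1.0 + 1.0 / lam2)
        # global scale tau2 and its auxiliary xi
        tau2 = inv_gamma(0.5 * (p + 1),
                         1.0 / xi + np.sum(beta**2 / lam2) / (2.0 * sigma2))
        xi = inv_gamma(1.0, 1.0 + 1.0 / tau2)
        samples[t] = beta
    return samples
```

All conditionals above are conjugate inverse-gamma or Gaussian draws, which is exactly why the auxiliary-variable representation makes the half-Cauchy hierarchy convenient to sample.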

Notes

  1. www.emakalic.org/blog and www.dschmidt.org.

References

  1. Andrews, D.F., Mallows, C.L.: Scale mixtures of normal distributions. J. R. Stat. Soc. (Ser. B) 36(1), 99–102 (1974)

  2. Bhadra, A., Datta, J., Polson, N.G., Willard, B.: The horseshoe+ estimator of ultra-sparse signals (2015). arXiv:1502.00560

  3. Geman, S., Geman, D.: Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. Pattern Anal. Mach. Intell. 6, 721–741 (1984)

  4. Makalic, E., Schmidt, D.F.: A simple sampler for the horseshoe estimator. IEEE Signal Process. Lett. 23(1), 179–182 (2016)

  5. Carvalho, C.M., Polson, N.G., Scott, J.G.: The horseshoe estimator for sparse signals. Biometrika 97(2), 465–480 (2010)

  6. Polson, N.G., Scott, J.G.: Shrink globally, act locally: sparse Bayesian regularization and prediction. In: Bayesian Statistics, vol. 9 (2010)

  7. Wand, M.P., Ormerod, J.T., Padoan, S.A., Frühwirth, R.: Mean field variational Bayes for elaborate distributions. Bayesian Anal. 6(4), 847–900 (2011)

  8. Lindley, D.V., Smith, A.F.M.: Bayes estimates for the linear model. J. R. Stat. Soc. (Ser. B) 34(1), 1–41 (1972)

  9. Rue, H.: Fast sampling of Gaussian Markov random fields. J. R. Stat. Soc. (Ser. B) 63(2), 325–338 (2001)

  10. Cong, Y., Chen, B., Zhou, M.: Fast simulation of hyperplane-truncated multivariate normal distributions (2016)

  11. Bhattacharya, A., Pati, D., Pillai, N.S., Dunson, D.B.: Dirichlet-Laplace priors for optimal shrinkage. J. Am. Stat. Assoc. 110, 1479–1490 (2015)

  12. Polson, N.G., Scott, J.G., Windle, J.: Bayesian inference for logistic models using Pólya-Gamma latent variables. J. Am. Stat. Assoc. 108(504), 1339–1349 (2013)

  13. Windle, J., Polson, N.G., Scott, J.G.: Sampling Pólya-Gamma random variates: alternate and approximate techniques (2014)

  14. Ekholm, A., Palmgren, J.: Correction for misclassification using doubly sampled data. J. Official Stat. 3(4), 419–429 (1987)

  15. Copas, J.B.: Binary regression models for contaminated data. J. R. Stat. Soc. Ser. B (Methodol.) 50(2), 225–265 (1988)

  16. Carroll, R.J., Pederson, S.: On robustness in the logistic regression model. J. R. Stat. Soc. Ser. B (Methodol.) 55(3), 693–706 (1993)

  17. Lichman, M.: UCI machine learning repository (2013)

Author information

Correspondence to Enes Makalic.

A Appendix

A.1 Inverse Gamma Distribution

The inverse gamma probability density function is given by

$$\begin{aligned} p(x | \alpha , \beta ) = \frac{\beta ^\alpha }{\varGamma (\alpha )} x^{-\alpha - 1} \exp \left( - \frac{\beta }{x} \right) , \quad (x > 0), \end{aligned}$$

with shape parameter \((\alpha >0)\) and scale parameter \((\beta > 0)\). The first two moments are

$$\begin{aligned} \mathrm{E} \left( x \right) = \frac{\beta }{\alpha - 1}, \quad \mathrm{Var} \left( x \right) = \frac{\beta ^2}{(\alpha -1)^2 (\alpha - 2)}, \end{aligned}$$

where the mean and variance only exist for \((\alpha >1)\) and \((\alpha >2)\) respectively.
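
As a quick numerical check of these formulas (our own illustration; scipy.stats.invgamma uses the same shape and scale parameterization as above):

```python
from scipy.stats import invgamma

alpha, beta = 3.0, 2.0  # shape > 2 so that both moments exist
dist = invgamma(alpha, scale=beta)
print(dist.mean(), beta / (alpha - 1))                        # both 1.0
print(dist.var(), beta**2 / ((alpha - 1)**2 * (alpha - 2)))   # both 1.0
```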

A.2 Inverse Gaussian Distribution

The inverse Gaussian probability density function is given by

$$\begin{aligned} p(x | \mu , \lambda ) = \left( \frac{\lambda }{2 \pi x^3} \right) ^{\frac{1}{2}} \exp \left( -\frac{\lambda (x - \mu )^2}{2 \mu ^2 x}\right) \!\!, \end{aligned}$$

for \((x > 0)\), where \((\mu > 0)\) is the mean and \((\lambda > 0)\) is the shape parameter. The first two moments are

$$\begin{aligned} \mathrm{E} \left( x \right) = \mu , \quad \mathrm{Var} \left( x \right) = \frac{\mu ^3}{\lambda }. \end{aligned}$$
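
NumPy exposes the inverse Gaussian as the Wald distribution, with the same mean and shape parameterization, so the moments can be checked by simulation (illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, lam = 2.0, 5.0
x = rng.wald(mu, lam, size=1_000_000)   # inverse Gaussian draws
print(x.mean(), mu)                     # ~2.0
print(x.var(), mu**3 / lam)             # ~1.6
```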

A.3 Student-t Distribution

The Student-t distribution probability density function is given by

$$\begin{aligned} p(x | \mu , \sigma ^2, \nu ) = \frac{\varGamma \left( \frac{\nu + 1}{2} \right) }{\varGamma \left( \frac{\nu }{2} \right) \sqrt{\pi \nu \sigma ^2}} \left( 1 + \frac{1}{\nu }\frac{(x - \mu )^2}{\sigma ^2} \right) ^{-\frac{\nu +1}{2}}, \end{aligned}$$

where \((x \in \mathbb {R})\), \((\mu \in \mathbb {R})\), \((\sigma ^2 > 0)\) and the degrees of freedom \((\nu > 0)\). The first two moments are

$$\begin{aligned} \mathrm{E} \left( x \right) = \mu , \quad \mathrm{Var} \left( x \right) = \sigma ^2 \left( \frac{\nu }{\nu - 2}\right) , \end{aligned}$$

where the mean and variance only exist for \((\nu > 1)\) and \((\nu > 2)\) respectively.
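
A quick numerical check of these moment formulas (our own illustration; note that scipy.stats.t takes the scale \(\sigma\), not the variance \(\sigma^2\)):

```python
from scipy.stats import t

mu, sigma2, nu = 1.0, 4.0, 5.0
dist = t(df=nu, loc=mu, scale=sigma2**0.5)
print(dist.mean(), mu)                       # both 1.0
print(dist.var(), sigma2 * nu / (nu - 2))    # both ~6.667
```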

A.4 Laplace Distribution

The probability density function of the Laplace distribution is

$$\begin{aligned} p(x | \mu , b) = \frac{1}{2b} \exp \left( - \frac{|x - \mu |}{b} \right) , \end{aligned}$$

where \((x \in \mathbb {R})\), \((\mu \in \mathbb {R})\) is the location parameter and \((b > 0)\) is the scale parameter. The first two moments are

$$\begin{aligned} \mathrm{E} \left( x \right) = \mu , \quad \mathrm{Var} \left( x \right) = 2 b^2. \end{aligned}$$
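
The same kind of Monte Carlo check for the Laplace moments (illustration only, using NumPy's built-in sampler):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, b = 0.5, 1.5
x = rng.laplace(mu, b, size=1_000_000)
print(x.mean(), mu)            # ~0.5
print(x.var(), 2 * b**2)       # ~4.5
```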

A.5 Pólya-Gamma Distribution

A random variable \(x\) follows a Pólya-Gamma distribution [12], \(x \sim \mathrm{PG}(b, c)\), if

$$\begin{aligned} x \mathop {=}\limits ^{D} \frac{1}{2\pi ^2} \sum _{k=1}^\infty \frac{g_k}{(k-1/2)^2 + c^2/(4\pi ^2)}, \end{aligned}$$

where \(g_k \sim \mathrm{Ga}(b,1)\) are independent gamma random variables, \((b > 0)\) and \((c \in \mathbb {R})\) are the parameters and \(\mathop {=}\limits ^{D}\) denotes equality in distribution. The first two moments of x are

$$\begin{aligned} \mathrm{E} \left( x \right) = \frac{b}{2c} \mathrm{tanh} \left( \frac{c}{2}\right) \!\!, \quad \mathrm{Var} \left( x \right) = \frac{b}{4c^3} \left( \mathrm{sinh}(c) - c\right) \mathrm{sech}^2 \left( \frac{c}{2} \right) \!\!. \end{aligned}$$
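
Exact samplers for this distribution are developed in [12, 13]. Purely as an illustration of the series definition above (the truncation level K is our own choice, and truncation slightly underestimates the tail), one can draw approximate variates and check the mean formula:

```python
import numpy as np

def pg_approx(b, c, size, K=500, rng=None):
    """Approximate PG(b, c) draws by truncating the infinite gamma sum at K
    terms. Illustrative only; exact samplers are given in [12, 13]."""
    rng = rng or np.random.default_rng()
    k = np.arange(1, K + 1)
    g = rng.gamma(b, 1.0, size=(size, K))              # g_k ~ Ga(b, 1)
    denom = (k - 0.5) ** 2 + c**2 / (4 * np.pi**2)
    return (g / denom).sum(axis=1) / (2 * np.pi**2)

b, c = 1.0, 2.0
x = pg_approx(b, c, size=200_000)
print(x.mean(), b / (2 * c) * np.tanh(c / 2))          # both ~0.190
```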

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Makalic, E., Schmidt, D.F., Hopper, J.L. (2016). Bayesian Robust Regression with the Horseshoe+ Estimator. In: Kang, B.H., Bai, Q. (eds) AI 2016: Advances in Artificial Intelligence. AI 2016. Lecture Notes in Computer Science (LNAI), vol. 9992. Springer, Cham. https://doi.org/10.1007/978-3-319-50127-7_37

  • Print ISBN: 978-3-319-50126-0

  • Online ISBN: 978-3-319-50127-7
