Robust Lasso Regression with Student-t Residuals

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10400)

Abstract

The lasso, introduced by Robert Tibshirani in 1996, has become one of the most popular techniques for estimating Gaussian linear regression models. An important reason for this popularity is that the lasso can simultaneously estimate the regression parameters and select important variables, yielding accurate regression models that are highly interpretable. This paper derives an efficient procedure for fitting robust linear regression models with the lasso in the case where the residuals are distributed according to a Student-t distribution. In contrast to Gaussian lasso regression, the proposed Student-t lasso regression procedure can be applied to data sets that contain large outlying observations. We demonstrate the utility of our Student-t lasso regression by analysing the Boston housing data set.
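
The full fitting procedure is derived in the paper itself; as a rough illustration of the general approach, the sketch below fits a lasso-penalised regression with Student-t residuals using the classical scale-mixture-of-normals representation of the t distribution (West [11]; Lange et al. [4]) inside an EM-style loop: each E-step downweights observations with large residuals, and each M-step solves a weighted lasso by coordinate descent. The function names (`t_lasso_em`, `soft_threshold`), the fixed degrees of freedom `nu`, and the penalty parameterisation `lam` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def soft_threshold(z, g):
    # Soft-thresholding operator: the proximal map of the l1 penalty.
    return np.sign(z) * np.maximum(np.abs(z) - g, 0.0)

def t_lasso_em(X, y, lam, nu=4.0, em_iters=50, cd_iters=20):
    """Sketch of lasso regression with Student-t residuals via EM,
    using the scale-mixture-of-normals representation of the t
    distribution. Not the authors' exact algorithm."""
    n, p = X.shape
    beta = np.zeros(p)
    sigma2 = np.var(y)
    for _ in range(em_iters):
        # E-step: expected latent precision of each residual under the
        # t model; observations with large residuals get small weights.
        r = y - X @ beta
        w = (nu + 1.0) / (nu + r ** 2 / sigma2)
        # M-step (beta): coordinate descent on the weighted lasso
        #   0.5 * sum_i w_i (y_i - x_i' beta)^2 + lam * ||beta||_1
        for _ in range(cd_iters):
            for j in range(p):
                # Partial residual excluding coordinate j.
                r_partial = y - X @ beta + X[:, j] * beta[j]
                num = np.sum(w * X[:, j] * r_partial)
                beta[j] = soft_threshold(num, lam) / np.sum(w * X[:, j] ** 2)
        # M-step (sigma2): weighted residual variance.
        r = y - X @ beta
        sigma2 = np.sum(w * r ** 2) / n
    return beta, sigma2
```

As the degrees of freedom \(\nu \rightarrow \infty\) the weights \(w_i\) tend to one and the loop reduces to an ordinary Gaussian lasso fit, matching the intuition that the Student-t model generalises the Gaussian one.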

References

  1. Akaike, H.: A new look at the statistical model identification. IEEE Trans. Autom. Control 19(6), 716–723 (1974)

  2. Finegold, M., Drton, M.: Robust graphical modelling with t-distributions. In: 25th Conference on Uncertainty in Artificial Intelligence (UAI 2009) (2009)

  3. Lambert-Lacroix, S.: Robust regression through the Huber’s criterion and adaptive lasso penalty. Electron. J. Stat. 5, 1015–1053 (2011)

  4. Lange, K.L., Little, R.J.A., Taylor, J.M.G.: Robust statistical modeling using the \(t\) distribution. J. Am. Stat. Assoc. 84(408), 881–896 (1989)

  5. Li, Y., Zhu, J.: \(l_1\)-norm quantile regression. J. Comput. Graph. Stat. 17(1), 1–23 (2008)

  6. Osborne, M.R., Presnell, B., Turlach, B.A.: On the LASSO and its dual. J. Comput. Graph. Stat. 9(2), 319–337 (2000)

  7. Park, T., Casella, G.: The Bayesian lasso. J. Am. Stat. Assoc. 103(482), 681–686 (2008)

  8. Polson, N.G., Scott, J.G.: Data augmentation for non-Gaussian regression models using variance-mean mixtures. Biometrika 100(2), 459–471 (2013)

  9. Schwarz, G.: Estimating the dimension of a model. Ann. Stat. 6(2), 461–464 (1978)

  10. Tibshirani, R.: Regression shrinkage and selection via the Lasso. J. Roy. Stat. Soc. (Ser. B) 58(1), 267–288 (1996)

  11. West, M.: On scale mixtures of normal distributions. Biometrika 74(3), 646–648 (1987)

  12. Yi, N., Xu, S.: Bayesian LASSO for quantitative trait loci mapping. Genetics 179(2), 1045–1055 (2008)

Author information

Correspondence to Daniel F. Schmidt.

Appendix A

To find an appropriate maximum value \(\tau _\mathrm{max}\) of \(\tau \) for producing a regularisation path, we use the following heuristic procedure: let \(\hat{\varvec{\beta }}_\mathrm{ML}\) and \(\hat{\sigma }^2_\mathrm{ML}\) denote the maximum likelihood estimates of \(\varvec{\beta }\) and \(\sigma ^2\). The negative log-prior probability of the maximum likelihood estimates, under the Laplace prior (3), is given by

$$ p \log (\tau ) + \left( \frac{\sqrt{2}}{\tau \hat{\sigma }_\mathrm{ML}} \right) || \hat{\varvec{\beta }}_\mathrm{ML} ||_1 + \mathrm{const}, $$

where \(\mathrm{const}\) denotes terms that do not depend on either \(\tau \) or \(\hat{\varvec{\beta }}_\mathrm{ML}\). The value of \(\tau \) that minimises this expression, and hence maximises the prior probability of \(\hat{\varvec{\beta }}_\mathrm{ML}\), is given by

$$ \tilde{\tau } = \frac{\sqrt{2} \, ||\hat{\varvec{\beta }}_\mathrm{ML}||_1}{p \hat{\sigma }_\mathrm{ML}}. $$

We then choose \(\tau _\mathrm{max} = c \tilde{\tau }\), where \(c>1\) is a constant that controls how far the final point on the regularisation path lies from the maximum likelihood estimates.
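
For concreteness, a direct transcription of this heuristic, assuming the maximum likelihood estimates are available as NumPy arrays; the default `c = 10` is an illustrative choice, as the paper does not fix a value:

```python
import numpy as np

def tau_max(beta_ml, sigma_ml, c=10.0):
    # tau-tilde maximises the Laplace prior probability of the ML
    # estimates; scaling by c > 1 extends the regularisation path
    # past that point. The default c = 10 is illustrative only.
    p = beta_ml.size
    tau_tilde = np.sqrt(2.0) * np.sum(np.abs(beta_ml)) / (p * sigma_ml)
    return c * tau_tilde
```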

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Schmidt, D.F., Makalic, E. (2017). Robust Lasso Regression with Student-t Residuals. In: Peng, W., Alahakoon, D., Li, X. (eds) AI 2017: Advances in Artificial Intelligence. AI 2017. Lecture Notes in Computer Science (LNAI), vol. 10400. Springer, Cham. https://doi.org/10.1007/978-3-319-63004-5_29

  • DOI: https://doi.org/10.1007/978-3-319-63004-5_29

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-63003-8

  • Online ISBN: 978-3-319-63004-5
