Open Access
April 2020
Robust machine learning by median-of-means: Theory and practice
Guillaume Lecué, Matthieu Lerasle
Ann. Statist. 48(2): 906-931 (April 2020). DOI: 10.1214/19-AOS1828

Abstract

Median-of-means (MOM) based procedures have recently been introduced in learning theory (Lugosi and Mendelson (2019); Lecué and Lerasle (2017)). These estimators outperform classical least-squares estimators when data are heavy-tailed and/or corrupted. However, none of these procedures can be implemented in practice, which is the major drawback of current MOM procedures (Ann. Statist. 47 (2019) 783–794).
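For readers unfamiliar with the basic building block, the following is a minimal sketch of the classical median-of-means estimator of a mean: split the sample into blocks, average each block, and take the median of the block averages. It is not one of the learning procedures discussed in the abstract; the function name `median_of_means`, the random shuffling step, and the toy data are assumptions made for illustration.

```python
import random
import statistics


def median_of_means(values, num_blocks, seed=0):
    """Median-of-means estimate of the mean of `values`.

    The sample is shuffled, split into `num_blocks` blocks of nearly equal
    size, and the median of the block averages is returned.
    """
    if not 1 <= num_blocks <= len(values):
        raise ValueError("num_blocks must be between 1 and len(values)")
    shuffled = list(values)
    random.Random(seed).shuffle(shuffled)
    # Block k takes every num_blocks-th point, so block sizes differ by at most one.
    blocks = [shuffled[k::num_blocks] for k in range(num_blocks)]
    return statistics.median(statistics.fmean(block) for block in blocks)


# Hypothetical data: 97 inliers equal to 1.0 and 3 gross outliers.
sample = [1.0] * 97 + [1000.0] * 3
empirical_mean = statistics.fmean(sample)        # pulled far away from 1.0
mom_estimate = median_of_means(sample, num_blocks=11)
print(empirical_mean, mom_estimate)
```

With 3 outliers and 11 blocks, at most 3 block averages can be contaminated, so the median is taken over a majority of clean blocks. This is the usual reason for choosing the number of blocks to be more than twice the number of suspected outliers.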

In this paper, we introduce minmax MOM estimators and show that they achieve the same sub-Gaussian deviation bounds as the alternatives (Lugosi and Mendelson (2019); Lecué and Lerasle (2017)), in both low- and high-dimensional settings. In particular, these estimators are efficient under moment assumptions on data that may have been corrupted by a few outliers.

Besides these theoretical guarantees, the definition of minmax MOM estimators suggests simple and systematic modifications of standard algorithms used to approximate least-squares estimators and their regularized versions. As a proof of concept, we perform an extensive simulation study of these algorithms for robust versions of the LASSO.
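The abstract does not spell out what these modifications look like. The sketch below is one plausible reading, offered as an assumption rather than as the authors' algorithm: at each iteration of gradient descent, the data are randomly split into blocks, the block whose empirical loss is the median is selected, and the gradient step is taken on that block only. It is a descent-only simplification for unregularized least squares with a single slope parameter, not the minmax scheme or the LASSO variants studied in the paper. All names (`block_loss`, `block_gradient`, `mom_gradient_descent`), the step size, and the synthetic data are hypothetical.

```python
import random
import statistics


def block_loss(slope, block):
    """Mean squared error of the model y = slope * x on one block of (x, y) pairs."""
    return statistics.fmean((y - slope * x) ** 2 for x, y in block)


def block_gradient(slope, block):
    """Derivative of the block mean squared error with respect to the slope."""
    return statistics.fmean(-2.0 * x * (y - slope * x) for x, y in block)


def mom_gradient_descent(data, num_blocks, step=0.1, iterations=300, seed=0):
    """Gradient descent in which each step uses only the median-loss block."""
    rng = random.Random(seed)
    points = list(data)
    slope = 0.0
    for _ in range(iterations):
        # Fresh random partition into num_blocks blocks at every iteration.
        rng.shuffle(points)
        blocks = [points[k::num_blocks] for k in range(num_blocks)]
        # Rank blocks by their loss at the current iterate and keep the middle one.
        blocks.sort(key=lambda block: block_loss(slope, block))
        median_block = blocks[num_blocks // 2]
        slope -= step * block_gradient(slope, median_block)
    return slope


# Synthetic data (assumption): true slope 2.0, small Gaussian noise, 5 outliers.
data_rng = random.Random(1)
data = []
for _ in range(195):
    x = data_rng.uniform(-1.0, 1.0)
    data.append((x, 2.0 * x + data_rng.gauss(0.0, 0.1)))
data += [(1.0, 1000.0)] * 5

# Ordinary least-squares slope through the origin, for comparison.
ols_slope = sum(x * y for x, y in data) / sum(x * x for x, y in data)
mom_slope = mom_gradient_descent(data, num_blocks=21)
print(ols_slope, mom_slope)
```

In this toy setting the five outliers can contaminate at most five of the 21 blocks, and those blocks have by far the largest losses, so the median-loss block is outlier-free at every step. The ordinary least-squares slope, by contrast, is dominated by the outliers.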

Citation

Guillaume Lecué, Matthieu Lerasle. "Robust machine learning by median-of-means: Theory and practice." Ann. Statist. 48 (2) 906–931, April 2020. https://doi.org/10.1214/19-AOS1828

Information

Received: 1 July 2018; Revised: 1 February 2019; Published: April 2020
First available in Project Euclid: 26 May 2020

zbMATH: 07241574
MathSciNet: MR4102681
Digital Object Identifier: 10.1214/19-AOS1828

Subjects:
Primary: 60K35, 62G08
Secondary: 62C20, 62G05, 62G20

Keywords: Empirical processes, High-dimensional statistics

Rights: Copyright © 2020 Institute of Mathematical Statistics

Vol. 48 • No. 2 • April 2020