

Data-Driven Confidence Intervals with Optimal Rates for the Mean of Heavy-Tailed Distributions

Ambrus Tamás · Szabolcs Szentpéteri · Balázs Csanád Csáji

MR1 & MR2 - Number 156
Sat 4 May 6 a.m. PDT — 8:30 a.m. PDT


Estimating the expected value is one of the key problems of statistics, and it serves as a backbone for countless methods in machine learning. In this paper we propose a new algorithm to build non-asymptotically exact confidence intervals for the mean of a symmetric distribution based on an independent, identically distributed sample. The method combines resampling with median-of-means estimates to ensure optimal subgaussian bounds for the sizes of the confidence intervals under mild, heavy-tailed moment conditions. The scheme is completely data-driven: the construction does not need any information about the moments, yet it manages to build exact confidence regions which shrink at the optimal rate. We also show how to generalize the approach to higher dimensions and prove dimension-free, subgaussian PAC bounds for the exclusion probabilities of false candidates. Finally, we illustrate the method and its properties for heavy-tailed distributions with numerical experiments.
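To make the combination of resampling and median-of-means more concrete, the following is a minimal sketch of one way such a test could look. It is not the authors' algorithm: the function names `median_of_means` and `accepts`, the parameters `k`, `m`, `q`, and the use of random sign flips as the resampling step are all assumptions made for illustration. A candidate mean `theta` is kept in the confidence set when the median-of-means statistic of the centered sample does not rank among the `q` largest of `m` statistics, the other `m - 1` being computed from sign-flipped copies of the centered sample. Under symmetry about the true mean, and ignoring ties, the true mean would be accepted with probability `1 - q/m` in this sketch.

```python
import numpy as np


def median_of_means(x, k):
    """Median of the means of k roughly equal consecutive blocks of x."""
    blocks = np.array_split(np.asarray(x, dtype=float), k)
    return float(np.median([b.mean() for b in blocks]))


def accepts(x, theta, k=5, m=20, q=1, rng=None):
    """Hypothetical rank test: is theta in the confidence set?

    Assumes the data are i.i.d. and symmetric about their mean.
    Ties between statistics are ignored in this sketch.
    """
    rng = np.random.default_rng(rng)
    r = np.asarray(x, dtype=float) - theta  # sample centered at the candidate

    # Reference statistic from the unperturbed centered sample.
    ref = abs(median_of_means(r, k))

    # m - 1 perturbed statistics from independent random sign flips.
    perturbed = [
        abs(median_of_means(rng.choice([-1.0, 1.0], size=r.size) * r, k))
        for _ in range(m - 1)
    ]

    # Rank of the reference among all m statistics (1 = smallest).
    rank = 1 + sum(s < ref for s in perturbed)

    # Reject theta only if the reference is among the q largest.
    return rank <= m - q


# Usage: scan a grid of candidates on a heavy-tailed sample (Student t, 3 d.o.f.).
sample = np.random.default_rng(0).standard_t(df=3, size=200)
grid = np.linspace(-1.0, 1.0, 41)
kept = [t for t in grid if accepts(sample, t, rng=1)]
```

The candidates in `kept` approximate the confidence set on the grid. Note that nothing in the test uses moment information about the distribution, which is the sense in which such a construction is data-driven.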
