

“Plus/minus the learning rate”: Easy and Scalable Statistical Inference with SGD

Jerry Chee · Hwanwoo Kim · Panos Toulis

Auditorium 1 Foyer 89


In this paper, we develop a statistical inference procedure using stochastic gradient descent (SGD)-based confidence intervals. These intervals are of the simplest possible form: θ_{N,j} ± 2√(γ/N), where θ_N is the SGD estimate of model parameters θ over N data points, and γ is the learning rate. This construction relies only on a proper selection of the learning rate to ensure the standard SGD conditions for O(1/n) convergence. The procedure performs well in all our empirical evaluations, achieving near-nominal coverage while scaling to problems with up to 20× as many parameters as other SGD-based inference methods. We demonstrate our method’s practical significance by modeling adverse events in emergency general surgery patients using a novel dataset from the Hospital of the University of Pennsylvania. Our code is available on GitHub.
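As an illustration of the interval form stated above, the sketch below runs plain single-pass SGD on a synthetic linear-regression problem and forms per-coordinate intervals θ_{N,j} ± 2√(γ/N). This is not the authors' implementation: the model, data, and the learning-rate value γ are hypothetical choices, and actual coverage depends on the learning-rate conditions the paper specifies.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: linear regression y = X @ theta_true + noise.
N, d = 50_000, 5
theta_true = rng.normal(size=d)
X = rng.normal(size=(N, d))
y = X @ theta_true + rng.normal(size=N)

gamma = 0.01          # learning rate (assumed value; the paper prescribes how to select it)
theta = np.zeros(d)   # SGD iterate theta_N

# One pass of SGD over the N data points (squared-error loss).
for i in range(N):
    grad = (X[i] @ theta - y[i]) * X[i]
    theta -= gamma * grad

# Intervals of the stated form: theta_{N,j} +/- 2*sqrt(gamma/N) for each coordinate j.
half_width = 2.0 * np.sqrt(gamma / N)
lower, upper = theta - half_width, theta + half_width
print(f"half-width = {half_width:.5f}")
print("intervals:", list(zip(lower.round(3), upper.round(3))))
```

Note that the half-width is a single scalar shared by all coordinates, which is what makes the construction so cheap: no covariance estimation or resampling is needed beyond the SGD run itself.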
