

Poster

Revisiting LocalSGD and SCAFFOLD: Improved Rates and Missing Analysis

Danqi Liao


Abstract:

LocalSGD and SCAFFOLD are widely used methods in distributed stochastic optimization, with numerous applications in machine learning, large-scale data processing, and federated learning. However, rigorously establishing their theoretical advantages over simpler methods, such as minibatch SGD (MbSGD), has proven challenging, as existing analyses often rely on strong assumptions, unrealistic premises, or overly restrictive scenarios. In this work, we revisit the convergence properties of LocalSGD and SCAFFOLD under a variety of existing or weaker conditions, including gradient similarity, Hessian similarity, weak convexity, and Lipschitz continuity of the Hessian. Our analysis shows that (i) LocalSGD achieves faster convergence compared to MbSGD for weakly convex functions without requiring stronger gradient similarity assumptions; (ii) LocalSGD benefits significantly from higher-order similarity and smoothness; and (iii) SCAFFOLD demonstrates faster convergence than MbSGD for a broader class of non-quadratic functions. These theoretical insights provide a clearer understanding of the conditions under which LocalSGD and SCAFFOLD outperform MbSGD.
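To make the comparison concrete, below is a minimal numerical sketch of the three methods named in the abstract: LocalSGD (each client runs several local SGD steps between averaging rounds), SCAFFOLD (local steps corrected by client and server control variates), and MbSGD (one averaged minibatch step per communication round). The quadratic client objectives, step sizes, noise level, and helper names (`grad`, `local_sgd`, `scaffold`, `minibatch_sgd`) are illustrative assumptions, not the paper's theoretical setting or experiments.

```python
# Sketch of LocalSGD, SCAFFOLD, and minibatch SGD on synthetic quadratic
# client objectives. All quantities below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
M, d = 8, 10                                           # clients, dimension
A = rng.normal(size=(M, d, d))
A = np.einsum("mij,mkj->mik", A, A) / d + np.eye(d)    # PSD client Hessians
b = rng.normal(size=(M, d))                            # client linear terms

def grad(m, x, noise=0.1):
    """Stochastic gradient of client m's quadratic f_m(x) = 0.5 x'A_m x - b_m'x."""
    return A[m] @ x - b[m] + noise * rng.normal(size=d)

def local_sgd(x0, rounds=50, K=10, lr=0.05):
    """Each round: every client takes K local SGD steps, then the server averages."""
    x = x0.copy()
    for _ in range(rounds):
        locals_ = []
        for m in range(M):
            xm = x.copy()
            for _ in range(K):
                xm -= lr * grad(m, xm)
            locals_.append(xm)
        x = np.mean(locals_, axis=0)       # communication / averaging step
    return x

def scaffold(x0, rounds=50, K=10, lr=0.05):
    """SCAFFOLD sketch with full participation and the (x - x_m)/(K*lr)
    control-variate refresh; local steps are corrected by (c - c_m)."""
    x = x0.copy()
    c = np.zeros(d)                        # server control variate
    ci = np.zeros((M, d))                  # client control variates
    for _ in range(rounds):
        locals_, new_ci = [], []
        for m in range(M):
            xm = x.copy()
            for _ in range(K):
                xm -= lr * (grad(m, xm) - ci[m] + c)
            new_ci.append(ci[m] - c + (x - xm) / (K * lr))
            locals_.append(xm)
        new_ci = np.array(new_ci)
        c = c + np.mean(new_ci - ci, axis=0)
        ci = new_ci
        x = np.mean(locals_, axis=0)
    return x

def minibatch_sgd(x0, rounds=50, K=10, lr=0.05):
    """MbSGD with the same communication budget: one step per round using a
    minibatch of K stochastic gradients per client, averaged over clients."""
    x = x0.copy()
    for _ in range(rounds):
        g = np.mean([np.mean([grad(m, x) for _ in range(K)], axis=0)
                     for m in range(M)], axis=0)
        x -= lr * g
    return x

def global_loss(x):
    return np.mean([0.5 * x @ A[m] @ x - b[m] @ x for m in range(M)])

x0 = np.zeros(d)
print("LocalSGD loss:", global_loss(local_sgd(x0)))
print("SCAFFOLD loss:", global_loss(scaffold(x0)))
print("MbSGD    loss:", global_loss(minibatch_sgd(x0)))
```

All three routines use the same communication budget (one averaging step per round and K stochastic gradients per client per round), which is the regime in which the abstract compares their convergence.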
