AISTATS Poster Linear Convergence of Gradient Descent For Overparametrized Finite Width Two-Layer Linear Networks With General Initialization

Ziqing Xu · Hancheng Min · Salma Tarmoun · Enrique Mallada · Rene Vidal

Auditorium 1 Foyer 22

Abstract:

Recent theoretical analyses of the convergence of gradient descent (GD) to a global minimum for over-parametrized neural networks make strong assumptions on the step size (infinitesimal), the hidden-layer width (infinite), or the initialization (spectral, balanced). In this work, we relax these assumptions and derive a linear convergence rate for two-layer linear networks trained using GD on the squared loss in the case of finite step size, finite width and general initialization. Despite the generality of our analysis, our rate estimates are significantly tighter than those of prior work. Moreover, we provide a time-varying step size rule that monotonically improves the convergence rate as the loss function decreases to zero. Numerical experiments validate our findings.
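
The setting described in the abstract can be illustrated with a minimal NumPy sketch: plain GD with a finite, constant step size on the squared loss of a finite-width two-layer linear network, starting from unstructured Gaussian weights. This is not the authors' code. The function name `gd_two_layer_linear`, the synthetic data, the dimensions, the initialization scale, and the constant `step` are all illustrative assumptions, and the time-varying step size rule from the paper is not reproduced here.

```python
import numpy as np


def gd_two_layer_linear(n=20, d=5, m=3, width=16, step=0.005, iters=5000,
                        init_scale=0.5, seed=0):
    """Run plain GD on 0.5 * ||X W1 W2 - Y||_F^2 and return the loss history.

    All defaults are hypothetical values chosen for illustration only.
    """
    rng = np.random.default_rng(seed)

    # Synthetic, realizable regression data (an assumption of this sketch).
    X = rng.normal(size=(n, d)) / np.sqrt(n)
    Y = X @ rng.normal(size=(d, m))

    # Unstructured Gaussian initialization: neither balanced nor spectral.
    # The hidden width is finite and larger than min(d, m).
    W1 = init_scale * rng.normal(size=(d, width))
    W2 = init_scale * rng.normal(size=(width, m))

    losses = []
    for _ in range(iters):
        R = X @ W1 @ W2 - Y                      # residual, shape (n, m)
        losses.append(0.5 * float(np.sum(R ** 2)))
        G1 = X.T @ R @ W2.T                      # gradient w.r.t. W1
        G2 = W1.T @ X.T @ R                      # gradient w.r.t. W2
        # Simultaneous update of both layers with a constant step size.
        W1 = W1 - step * G1
        W2 = W2 - step * G2
    return losses


if __name__ == "__main__":
    history = gd_two_layer_linear()
    print("initial loss:", history[0])
    print("final loss:  ", history[-1])
```

If convergence is linear, plotting the logarithm of the returned losses against the iteration index should look roughly like a straight line with negative slope. The slope observed in this toy run depends on the assumed data and initialization and says nothing about the rate bounds derived in the paper.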
