AISTATS Poster Leveraging Continuous Time to Understand Momentum When Training Diagonal Linear Networks

Hristo Papazov · Scott Pesme · Nicolas Flammarion

MR1 & MR2 - Number 103

Abstract: In this work, we investigate the effect of momentum on the optimisation trajectory of gradient descent. We leverage a continuous-time approach in the analysis of momentum gradient descent with step size $\gamma$ and momentum parameter $\beta$ that allows us to identify an intrinsic quantity $\lambda = \frac{\gamma}{(1 - \beta)^2}$ which uniquely defines the optimisation path and provides a simple acceleration rule. When training a $2$-layer diagonal linear network in an overparametrised regression setting, we characterise the recovered solution through an implicit regularisation problem. We then prove that small values of $\lambda$ help to recover sparse solutions. Finally, we give similar but weaker results for stochastic momentum gradient descent. We provide numerical experiments which support our claims.
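
As a rough illustration of the quantity $\lambda = \frac{\gamma}{(1 - \beta)^2}$ defined in the abstract, the Python sketch below computes $\lambda$ from a pair $(\gamma, \beta)$ and inverts the definition to find the step size that matches a target $\lambda$ for a given $\beta$. The function names, the heavy-ball form of the update, and the one-dimensional quadratic objective are illustrative assumptions, not taken from the paper. The sketch does not reproduce the diagonal linear network experiments.

```python
import math


def intrinsic_lambda(gamma, beta):
    """Return lambda = gamma / (1 - beta)^2, as defined in the abstract."""
    if not 0.0 <= beta < 1.0:
        raise ValueError("beta must lie in [0, 1)")
    return gamma / (1.0 - beta) ** 2


def step_size_for(lam, beta):
    """Invert the definition: the step size gamma that yields a target lambda."""
    if not 0.0 <= beta < 1.0:
        raise ValueError("beta must lie in [0, 1)")
    return lam * (1.0 - beta) ** 2


def momentum_gd(grad, w0, gamma, beta, num_steps):
    """Momentum gradient descent in the heavy-ball form (an assumed form):
    w_{k+1} = w_k - gamma * grad(w_k) + beta * (w_k - w_{k-1})."""
    w_prev, w = w0, w0
    for _ in range(num_steps):
        w_next = w - gamma * grad(w) + beta * (w - w_prev)
        w_prev, w = w, w_next
    return w


# Two (gamma, beta) pairs that share the same lambda (hypothetical values).
lam = 0.04
pairs = [(step_size_for(lam, beta), beta) for beta in (0.0, 0.5)]

# Toy objective f(w) = 0.5 * (w - 3)^2, whose gradient is w - 3.
finals = [momentum_gd(lambda w: w - 3.0, 0.0, g, b, 2000) for g, b in pairs]

for (g, b), w in zip(pairs, finals):
    print(f"gamma={g:.4f} beta={b:.2f} lambda={intrinsic_lambda(g, b):.4f} w={w:.6f}")
```

On this toy problem both parameter pairs reach the same minimiser. The sketch only shows how the two hyperparameters combine into a single number; it makes no claim about the sparsity results, which concern the diagonal linear network setting.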
