

Implicit Bias in Noisy-SGD: With Applications to Differentially Private Training

Tom Sander · Maxime Sylvestre · Alain Durmus

MR1 & MR2 - Number 71
Sat 4 May 6 a.m. PDT — 8:30 a.m. PDT


Training Deep Neural Networks (DNNs) with small batches using Stochastic Gradient Descent (SGD) often results in superior test performance compared to larger batches. This implicit bias is attributed to the specific noise structure inherent to SGD. When ensuring Differential Privacy (DP) in DNNs' training, DP-SGD adds Gaussian noise to the clipped gradients. However, large-batch training still leads to a significant performance decrease, posing a challenge as strong DP guarantees necessitate the use of massive batches. Our study first demonstrates that this phenomenon extends to Noisy-SGD (DP-SGD without clipping), suggesting that the stochasticity, not the clipping, is responsible for this implicit bias, even with additional isotropic Gaussian noise. We then theoretically analyze the solutions obtained with continuous versions of Noisy-SGD for the Linear Least Square and Diagonal Linear Network settings. Our analysis reveals that the additional noise indeed amplifies the implicit bias. It suggests that the performance issues of private training stem from the same underlying principles as SGD, offering hope for improvements in large batch training strategies.
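
As an illustration of the Noisy-SGD update described above (mini-batch gradient plus isotropic Gaussian noise, with no clipping), here is a minimal sketch for the linear least squares setting. The function name, the zero initialization, and the choice to scale the noise by 1/batch_size are assumptions made for this example; they are not taken from the paper.

```python
import numpy as np

def noisy_sgd_least_squares(X, y, batch_size, lr, sigma, steps, seed=0):
    """Illustrative Noisy-SGD on 0.5 * mean((X w - y)^2); not the authors' code."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)  # assumed initialization
    for _ in range(steps):
        # Sample a mini-batch without replacement (source of SGD stochasticity).
        idx = rng.choice(n, size=batch_size, replace=False)
        residual = X[idx] @ w - y[idx]
        grad = X[idx].T @ residual / batch_size
        # Isotropic Gaussian noise; dividing by batch_size mimics noise added
        # to a summed gradient before averaging (an assumed convention).
        noise = rng.normal(0.0, sigma, size=d) / batch_size
        w = w - lr * (grad + noise)
    return w
```

Setting sigma to 0 recovers plain mini-batch SGD, so the same function can be used to compare the two regimes across batch sizes.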
