Regularization Matters: A Nonparametric Perspective on Overparametrized Neural Network

Tianyang Hu · Wenjia Wang · Cong Lin · Guang Cheng

Keywords: [ Deep Learning ] [ Theory ]

Wed 14 Apr 6 a.m. PDT — 8 a.m. PDT


Overparametrized neural networks trained by gradient descent (GD) can provably overfit any training data. However, the generalization guarantee may not hold for noisy data. From a nonparametric perspective, this paper studies how well overparametrized neural networks can recover the true target function in the presence of random noise. We establish a lower bound on the L2 estimation error as a function of the GD iteration; this bound stays bounded away from zero unless the early-stopping time is chosen delicately. In turn, through a comprehensive analysis of L2-regularized GD trajectories, we prove that for an overparametrized one-hidden-layer ReLU neural network with L2 regularization: (1) the output is close to that of kernel ridge regression with the corresponding neural tangent kernel; (2) the minimax optimal rate of the L2 estimation error is achieved. Numerical experiments confirm our theory and further demonstrate that the L2 regularization approach improves training robustness and works for a wider range of neural networks.
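As a rough illustration of the comparison object in claim (1), the sketch below runs kernel ridge regression with a neural-tangent-style kernel on noisy data. The closed form h(s, t) = s·t (π − arccos(s·t)) / (2π), the restriction to unit-norm inputs, the toy target function, and the names `ntk_relu`, `krr_fit`, `krr_predict`, and `mu` are all assumptions made for this example; the abstract does not specify them, and this is not the authors' code.

```python
import numpy as np

def ntk_relu(S, T):
    """Assumed kernel h(s, t) = s.t * (pi - arccos(s.t)) / (2*pi); rows must be unit-norm."""
    G = np.clip(S @ T.T, -1.0, 1.0)  # clip guards arccos against rounding error
    return G * (np.pi - np.arccos(G)) / (2.0 * np.pi)

def krr_fit(X, y, mu):
    """Kernel ridge regression coefficients: solve (H + mu * I) alpha = y."""
    H = ntk_relu(X, X)
    return np.linalg.solve(H + mu * np.eye(len(X)), y)

def krr_predict(X_train, alpha, X_new):
    """Evaluate the fitted function at the rows of X_new."""
    return ntk_relu(X_new, X_train) @ alpha

def unit_rows(A):
    """Project each row onto the unit sphere."""
    return A / np.linalg.norm(A, axis=1, keepdims=True)

# Hypothetical toy problem: noisy observations of a smooth function on the sphere.
rng = np.random.default_rng(0)
X = unit_rows(rng.normal(size=(50, 3)))
f_true = lambda Z: np.sin(3.0 * Z[:, 0])
y = f_true(X) + 0.3 * rng.normal(size=50)

X_test = unit_rows(rng.normal(size=(200, 3)))
for mu in (1e-6, 1e-2, 1.0):
    alpha = krr_fit(X, y, mu)
    err = np.mean((krr_predict(X, alpha, X_test) - f_true(X_test)) ** 2)
    print(f"mu={mu:g}  empirical squared L2 error vs. true function: {err:.4f}")
```

Comparing the printed errors across values of `mu` gives a feel for how the ridge penalty trades interpolation of the noisy labels against recovery of the underlying function; the specific values of `mu` here are arbitrary.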
