On the Generalization Properties of Adversarial Training

Yue Xing · Qifan Song · Guang Cheng

Keywords: [ Ethics and Safety ] [ Robustness ]

Tue 13 Apr 6:30 p.m. PDT — 8:30 p.m. PDT

Abstract: Modern machine learning and deep learning models have been shown to be vulnerable when test data are slightly perturbed. Theoretical studies of adversarial training algorithms mostly focus on their adversarial training losses or local convergence properties. In contrast, this paper studies the generalization performance of a generic adversarial training algorithm. Specifically, we consider linear regression models and two-layer neural networks (with lazy training) using squared loss in both the low-dimensional and the high-dimensional regime. In the former regime, after overcoming the non-smoothness of adversarial training, the adversarial risk of the trained models converges to the minimal adversarial risk. In the latter regime, we discover that data interpolation prevents the adversarially robust estimator from being consistent (i.e., from converging in probability). Therefore, inspired by the success of the least absolute shrinkage and selection operator (LASSO), we incorporate the $\mathcal{L}_1$ penalty in high-dimensional adversarial learning and show that it leads to consistent adversarially robust estimation. A series of numerical studies demonstrates how smoothness and $\mathcal{L}_1$ penalization help to improve the adversarial robustness of DNN models.
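
As a rough illustration of the linear regression setting described in the abstract, the following is a minimal sketch and not the authors' algorithm. It assumes the attack is an L2-bounded perturbation of radius `eps`, in which case the worst-case squared loss of a linear model has the closed form (|y - x^T theta| + eps * ||theta||_2)^2. The function names, the step size, the penalty weight `lam`, and the synthetic data are all hypothetical choices made for this example. The $\mathcal{L}_1$ penalty is handled with a proximal (soft-thresholding) step.

```python
import numpy as np

def adversarial_loss(theta, X, y, eps):
    """Mean worst-case squared loss, assuming L2 perturbations of radius eps."""
    residual = np.abs(y - X @ theta)
    return float(np.mean((residual + eps * np.linalg.norm(theta)) ** 2))

def soft_threshold(v, t):
    """Proximal operator of t * ||v||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def fit_adversarial_l1(X, y, eps=0.1, lam=0.05, lr=0.05, steps=1000):
    """Proximal subgradient descent on adversarial loss + lam * ||theta||_1."""
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(steps):
        r = y - X @ theta
        norm = np.linalg.norm(theta)
        margin = np.abs(r) + eps * norm
        # Subgradient of ||theta||_2; zero is a valid choice at theta = 0.
        unit = theta / norm if norm > 0 else np.zeros(d)
        # Per-sample subgradient of (|r_i| + eps * ||theta||_2).
        per_sample = -np.sign(r)[:, None] * X + eps * unit
        grad = 2.0 * np.mean(margin[:, None] * per_sample, axis=0)
        # Gradient step on the adversarial loss, then the L1 proximal step.
        theta = soft_threshold(theta - lr * grad, lr * lam)
    return theta

# Hypothetical synthetic data: a sparse linear model with small noise.
rng = np.random.default_rng(0)
n, d = 200, 10
theta_true = np.zeros(d)
theta_true[:2] = [2.0, -1.0]
X = rng.normal(size=(n, d))
y = X @ theta_true + 0.1 * rng.normal(size=n)

theta_hat = fit_adversarial_l1(X, y)
print("adversarial loss at zero: ", adversarial_loss(np.zeros(d), X, y, 0.1))
print("adversarial loss at fit:  ", adversarial_loss(theta_hat, X, y, 0.1))
```

Setting `eps=0` in this sketch reduces the update to ordinary proximal gradient descent for a LASSO-type objective, which makes it easy to compare the penalized estimator with and without the adversarial term.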
