Keywords: [ Ethics and Safety ] [ Privacy-preserving Statistics and Machine Learning ]

[
Abstract
]

Abstract:
We revisit the well-studied problem of differentially private empirical risk minimization (ERM). We show that for unconstrained convex generalized linear models (GLMs), one can obtain an excess empirical risk of $\tilde O\left(\sqrt{\rank}/\epsilon n\right)$, where $\rank$ is the rank of the feature matrix in the GLM problem, $n$ is the number of data samples, and $\epsilon$ is the privacy parameter. This bound is attained via differentially private gradient descent (DP-GD). Furthermore, via the \emph{first lower bound for unconstrained private ERM}, we show that our upper bound is tight. In sharp contrast to the constrained ERM setting, there is no dependence on the dimensionality of the ambient model space ($p$). (Notice that $\rank\leq \min\{n, p\}$.)
Besides, we obtain an analogous excess population risk bound which depends on $\rank$ instead of $p$.
For the smooth non-convex GLM setting (i.e., where the objective function is non-convex but preserves the GLM structure), we further show that DP-GD attains a dimension-independent convergence of $\tilde O\left(\sqrt{\rank}/\epsilon n\right)$ to a first-order-stationary-point of the underlying objective.
Finally, we show that for convex GLMs, a variant of DP-GD commonly used in practice (which involves clipping the individual gradients) also exhibits the same dimension-independent convergence to the minimum of a well-defined objective. To that end, we provide a structural lemma that characterizes the effect of clipping on the optimization profile of DP-GD.