[
Abstract
]
Oral presentation:
Oral 5: Kernels / Optimization / Deep learning

Tue 29 Mar 2:30 a.m. PDT — 3:30 a.m. PDT

Tue 29 Mar 1 a.m. PDT — 2:30 a.m. PDT

Tue 29 Mar 2:30 a.m. PDT — 3:30 a.m. PDT

Abstract:
We study the bilinearly coupled minimax problem: $\min_{x} \max_{y} f(x) + y^\top A x - h(y)$, where $f$ and $h$ are both strongly convex smooth functions and admit first-order gradient oracles. Surprisingly, no known first-order algorithms have hitherto achieved the lower complexity bound of $\Omega((\sqrt{\frac{L_x}{\mu_x}} + \frac{\|A\|}{\sqrt{\mu_x \mu_y}} + \sqrt{\frac{L_y}{\mu_y}}) \log(\frac1{\varepsilon}))$ for solving this problem up to an $\varepsilon$ primal-dual gap in the general parameter regime, where $L_x, L_y,\mu_x,\mu_y$ are the corresponding smoothness and strongly convexity constants. We close this gap by devising the first optimal algorithm, the Lifted Primal-Dual (LPD) method. Our method lifts the objective into an extended form that allows both the smooth terms and the bilinear term to be handled optimally and seamlessly with the same primal-dual framework. Besides optimality, our method yields a desirably simple single-loop algorithm that uses only one gradient oracle call per iteration. Moreover, when $f$ is just convex, the same algorithm applied to a smoothed objective achieves the nearly optimal iteration complexity. We also provide a direct single-loop algorithm, using the LPD method, that achieves the iteration complexity of $O(\sqrt{\frac{L_x}{\varepsilon}} + \frac{\|A\|}{\sqrt{\mu_y \varepsilon}} + \sqrt{\frac{L_y}{\varepsilon}})$. Numerical experiments on quadratic minimax problems and policy evaluation problems further demonstrate the fast convergence of our algorithm in practice.

Chat is not available.