Projection-free Algorithms for Online Convex Optimization with Adversarial Constraints
Dhruv Sarkar · Aprameyo Chakrabartty · Subhamon Supantha · Palash Dey · Abhishek Sinha
Abstract
We study a generalization of the Online Convex Optimization (OCO) framework with time-varying adversarial constraints. In this setting, at each round, the learner selects an action from a convex decision set $\mathcal{X}$, after which both a convex cost function and a convex constraint function are revealed. The objective is to design a computationally efficient learning policy that simultaneously achieves low regret with respect to the cost functions and low cumulative constraint violation (CCV) over a horizon of length $T$. A major computational bottleneck in standard OCO algorithms is the projection operation onto the decision set $\mathcal{X}$. However, for many structured decision sets, linear optimization can be performed efficiently. Motivated by this, we propose a \emph{projection-free} online conditional gradient (OCG)-based algorithm that requires only a single call to a linear optimization oracle over $\mathcal{X}$ per round. Our approach improves upon the state of the art for projection-free online learning with adversarial constraints, achieving $\tilde{O}(T^{3/4})$ bounds for both regret and CCV. Our algorithm is conceptually simple. It constructs a surrogate cost function as a nonnegative linear combination of the cost and constraint functions, and feeds these surrogate costs into a novel adaptive online conditional gradient subroutine introduced in this paper. We further extend our framework to the bandit setting, where we show that a new form of surrogate loss is necessary to properly handle bandit feedback—an issue overlooked in prior work. Finally, we develop an efficient Follow-the-Perturbed-Leader (FTPL)-based algorithm, particularly well-suited for online combinatorial optimization problems with discrete actions, which also achieves $O(T^{3/4})$ regret and CCV.
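The surrogate-cost construction described above can be illustrated with a minimal sketch. The decision set, loss model, multiplier schedule, and step size below are all illustrative assumptions, not the paper's actual algorithm: we take $\mathcal{X}$ to be the probability simplex (where linear optimization reduces to an argmin over coordinates), assume linear costs and constraints, and weight the constraint gradient by the running CCV to form the nonnegative combination. Each round makes a single call to the linear oracle, as in the OCG template.

```python
import numpy as np

def linear_oracle_simplex(grad):
    # Linear optimization over the probability simplex:
    # the minimizer of <grad, v> is the vertex e_i with i = argmin_i grad_i.
    v = np.zeros_like(grad)
    v[np.argmin(grad)] = 1.0
    return v

def ocg_with_constraints(cost_grads, cons_A, cons_b, eta=0.1, lam=1.0):
    """Illustrative projection-free loop (not the paper's exact method):
    at each round, build a surrogate gradient as a nonnegative combination
    of the cost gradient and the constraint gradient (weighted by the
    running cumulative constraint violation), then take one
    conditional-gradient step using the linear oracle."""
    d = cost_grads.shape[1]
    x = np.full(d, 1.0 / d)   # start at the simplex barycenter
    Q = 0.0                   # running cumulative constraint violation (CCV)
    for t, c in enumerate(cost_grads):
        a, b = cons_A[t], cons_b[t]
        Q += max(0.0, a @ x - b)              # accumulate violation g_t(x)^+
        surrogate = c + lam * Q * a           # nonnegative combination
        v = linear_oracle_simplex(surrogate)  # single oracle call per round
        x = (1 - eta) * x + eta * v           # convex step keeps x in X
    return x, Q
```

Because each iterate is a convex combination of points in $\mathcal{X}$, no projection is ever needed; feasibility with respect to the decision set is maintained for free, and only the time-varying constraints can be violated (tracked by `Q`).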