Poster
LC-Tsallis-INF: Generalized Best-of-Both-Worlds Linear Contextual Bandits
Danqi Liao · Taiji Suzuki
Abstract:
We investigate the linear contextual bandit problem with independent and identically distributed (i.i.d.) contexts. In this problem, we aim to develop a Best-of-Both-Worlds (BoBW) algorithm with regret upper bounds in both stochastic and adversarial regimes. We develop an algorithm based on Follow-The-Regularized-Leader (FTRL) with Tsallis entropy, referred to as the α-Linear-Contextual (LC)-Tsallis-INF. We show that its regret is at most O(log(T)) in the stochastic regime under the assumption that the suboptimality gap is uniformly bounded from below, and at most O(√T) in the adversarial regime. Furthermore, our regret analysis is extended to more general regimes characterized by the margin condition with a parameter β ∈ (1, ∞], which imposes a milder assumption on the suboptimality gap than in previous studies. We show that the proposed algorithm achieves O((log T)^{(1+β)/(2+β)} T^{1/(2+β)}) regret under the margin condition.
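The core FTRL-with-Tsallis-entropy update can be illustrated in isolation. The sketch below is not the paper's full LC-Tsallis-INF algorithm (it omits contexts and the loss estimators entirely); it only shows the standard 1/2-Tsallis-entropy FTRL step, in which arm probabilities take the form p_i = 4 / (η(L_i − x))² for cumulative loss estimates L_i, with the normalization constant x found by binary search. The function name, learning rate η, and iteration count are illustrative assumptions.

```python
import math

def tsallis_inf_weights(losses, eta):
    """Sketch of the 1/2-Tsallis-entropy FTRL step (not the full
    LC-Tsallis-INF algorithm): given cumulative loss estimates and a
    learning rate eta, return arm probabilities
        p_i = 4 / (eta * (L_i - x))**2,
    where x < min(losses) is chosen so the probabilities sum to 1.
    """
    k = len(losses)
    # With x = min(L) - 4*sqrt(k)/eta, each term is at most 1/(4k),
    # so the total is <= 1/4: a valid lower bracket for x.
    lo = min(losses) - 4.0 * math.sqrt(k) / eta
    hi = min(losses)  # total -> infinity as x -> min(losses)

    def total(x):
        return sum(4.0 / (eta * (L - x)) ** 2 for L in losses)

    # The total is increasing in x, so bisect for total(x) = 1.
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if total(mid) > 1.0:
            hi = mid
        else:
            lo = mid
    x = 0.5 * (lo + hi)
    return [4.0 / (eta * (L - x)) ** 2 for L in losses]
```

Arms with smaller cumulative loss receive larger probability, and the polynomial (rather than exponential) decay in the loss differences is what distinguishes the Tsallis regularizer from the negative-entropy (Hedge/EXP3) update.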