

Poster

LC-Tsallis-INF: Generalized Best-of-Both-Worlds Linear Contextual Bandits

Danqi Liao · Taiji Suzuki


Abstract: We investigate the \emph{linear contextual bandit problem} with independent and identically distributed (i.i.d.) contexts. In this problem, we aim to develop a \emph{Best-of-Both-Worlds} (BoBW) algorithm with regret upper bounds in both stochastic and adversarial regimes. We develop an algorithm based on \emph{Follow-The-Regularized-Leader} (FTRL) with Tsallis entropy, referred to as the $\alpha$-\emph{Linear-Contextual (LC)-Tsallis-INF}. We show that its regret is at most $O(\log(T))$ in the stochastic regime under the assumption that the suboptimality gap is uniformly bounded from below, and at most $O(\sqrt{T})$ in the adversarial regime. Furthermore, our regret analysis is extended to more general regimes characterized by the \emph{margin condition} with a parameter $\beta \in (1, \infty]$, which imposes a milder assumption on the suboptimality gap than in previous studies. We show that the proposed algorithm achieves an $O\big(\{\log(T)\}^{\frac{1+\beta}{2+\beta}}\, T^{\frac{1}{2+\beta}}\big)$ regret under the margin condition.
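The core computational step in FTRL with a Tsallis entropy regularizer is solving the simplex-constrained optimization that produces the arm-sampling distribution. The following is a minimal sketch of that step for a finite arm set, assuming cumulative loss estimates `L_hat`, a learning rate `eta`, and a Tsallis parameter `alpha`; the function name and the bisection solver are illustrative assumptions, not the paper's implementation (which additionally handles contexts through a linear model).

```python
import numpy as np

def tsallis_inf_weights(L_hat, eta, alpha=0.5, iters=200):
    # Sketch of the FTRL update with (negative) Tsallis entropy regularizer:
    #   p = argmin_{p in simplex} <p, L_hat> - (1/(eta*alpha)) * sum_i p_i^alpha
    # The first-order conditions give, for a Lagrange multiplier lam < min_i L_hat_i,
    #   p_i = (eta * (L_hat_i - lam))^(-1/(1-alpha)),
    # where lam is chosen so the weights sum to 1 (found here by bisection).
    L_hat = np.asarray(L_hat, dtype=float)
    K = len(L_hat)
    hi = L_hat.min()                      # weights blow up as lam -> hi from below
    lo = hi - K ** (1.0 - alpha) / eta    # at lo, the weights sum to at most 1
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        p = (eta * (L_hat - lam)) ** (-1.0 / (1.0 - alpha))
        if p.sum() > 1.0:
            hi = lam  # weights too large: decrease lam
        else:
            lo = lam  # weights too small: increase lam
    return p / p.sum()  # renormalize away residual bisection error

# Example: arms with equal estimated losses get uniform weights,
# and lower estimated loss yields higher sampling probability.
w = tsallis_inf_weights([0.0, 1.0, 2.0], eta=0.5)
```

With `alpha = 0.5` this reduces to the familiar Tsallis-INF weighting `p_i ∝ (eta * (L_hat_i - lam))^{-2}`; the `alpha` knob is what the abstract's generalized analysis parameterizes over.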
