

Further Adaptive Best-of-Both-Worlds Algorithm for Combinatorial Semi-Bandits

Taira Tsuchiya · Shinji Ito · Junya Honda

Auditorium 1 Foyer 91


We consider the combinatorial semi-bandit problem and present a new algorithm with a best-of-both-worlds regret guarantee: its regret is bounded near-optimally in both the stochastic and adversarial regimes. In the stochastic regime, we prove a variance-dependent regret bound that depends on the tight suboptimality gap introduced by Kveton et al. (2015), with a good leading constant. In the adversarial regime, we show that the same algorithm simultaneously obtains various data-dependent regret bounds. Our algorithm is based on the follow-the-regularized-leader framework with a refined regularizer and an adaptive learning rate. Finally, we numerically test the proposed algorithm and confirm that it performs better than or comparably to existing algorithms such as Thompson sampling in most settings.
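To illustrate the follow-the-regularized-leader (FTRL) framework with an adaptive learning rate that the abstract mentions, here is a minimal generic sketch for a K-armed bandit. This is not the authors' algorithm: their refined regularizer, the combinatorial (semi-bandit) action set, and their data-dependent learning-rate schedule are all replaced by standard textbook choices (negative Shannon entropy, single-arm actions, and the simple rate eta_t = 1/sqrt(t)); all function and variable names are hypothetical.

```python
import numpy as np

def ftrl_bandit(loss_matrix, rng=None):
    """Generic FTRL sketch for a K-armed bandit.

    Uses a negative-entropy regularizer (so the FTRL update reduces to
    exponential weights) and the simple adaptive rate eta_t = 1/sqrt(t).
    loss_matrix: (T, K) array of per-round losses in [0, 1].
    Returns the total loss incurred over the T rounds.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    T, K = loss_matrix.shape
    cum_est = np.zeros(K)  # cumulative importance-weighted loss estimates
    total_loss = 0.0
    for t in range(1, T + 1):
        eta = 1.0 / np.sqrt(t)  # adaptive learning rate (generic choice)
        # FTRL with negative entropy yields an exponential-weights distribution.
        logits = -eta * cum_est
        logits -= logits.max()  # subtract max for numerical stability
        p = np.exp(logits)
        p /= p.sum()
        arm = rng.choice(K, p=p)
        loss = loss_matrix[t - 1, arm]
        total_loss += loss
        # Unbiased importance-weighted estimate: only the pulled arm is updated.
        cum_est[arm] += loss / p[arm]
    return total_loss
```

On a toy instance where one arm always has loss 0 and the other always has loss 1, the total loss stays far below what uniform play would incur, since the distribution concentrates on the better arm as the cumulative estimates grow.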
