Skip to yearly menu bar Skip to main content


Constant or Logarithmic Regret in Asynchronous Multiplayer Bandits with Limited Communication

Hugo Richard · Etienne Boursier · Vianney Perchet

MR1 & MR2 - Number 74
[ ]
Fri 3 May 8 a.m. PDT — 8:30 a.m. PDT

Abstract: Multiplayer bandits have recently garnered significant attention due to their relevance in cognitive radio networks. While the existing body of literature predominantly focuses on synchronous players, real-world radio networks, such as those in IoT applications, often feature asynchronous (i.e., randomly activated) devices. This highlights the need for addressing the more challenging asynchronous multiplayer bandits problem. Our first result shows that a natural extension of UCB achieves a minimax regret of $\mathcal{O}(\sqrt{T\log(T)})$ in the centralized setting. More significantly, we introduce Cautious Greedy, which uses $\mathcal{O}(\log(T))$ communications and whose instance-dependent regret is constant if the optimal policy assigns at least one player to each arm (a situation proven to occur when arm means are sufficiently close). Otherwise, the regret is, as usual, $\log(T)$ times the sum of some inverse sub-optimality gaps. We substantiate the optimality of Cautious Greedy through lower-bound analysis based on data-dependent terms. Therefore, we establish a strong baseline for asynchronous multiplayer bandits, at least with $\mathcal{O}(\log(T))$ communications.

Live content is unavailable. Log in and register to view live content