Poster 209

Multi-Agent Lipschitz Bandits

Sourav Chakraborty ⋅ Amit Rege ⋅ Claire Monteleoni ⋅ Lijun Chen

Abstract

We study the decentralized multi-player stochastic bandit problem over a continuous, Lipschitz-structured action space where hard collisions yield zero reward. Our objective is to design a communication-free policy that maximizes collective reward, while separating coordination costs from learning costs. We propose a modular protocol that first solves the multi-agent coordination problem by identifying and seating players on distinct, high-value regions via a novel maxima-directed search and then decouples the problem into $N$ independent single-player Lipschitz bandits. In the consensus regime, we obtain an end-to-end regret bound whose dominant learning term is $\tilde{O}(T^{(d+1)/(d+2)})$, matching the single-player Lipschitz rate; the upfront coordination cost is horizon-independent at fixed confidence and only polylogarithmic in $T$ in the expected-regret form. Under an additional public coverage/scheduling assumption for the epochic extension, we also obtain a gap-free $\tilde{O}(T^{(d+1)/(d+2)})$ guarantee. We further derive a matching lower bound for the dominant learning term and extend the framework to general distance-threshold collision models.