Skip to yearly menu bar Skip to main content


Optimal Algorithms for Latent Bandits with Cluster Structure

Soumyabrata Pal · Arun Sai Suggala · Karthikeyan Shanmugam · Prateek Jain

Auditorium 1 Foyer 89

Abstract: We consider the problem of latent bandits with cluster structure where there are multiple agents, each with an associated multi-armed bandit problem. These agents are grouped into latent clusters such that the mean reward vectors of agents within the same cluster are equivalent. At each round, an agent, selected uniformly at random, pulls an arm and observes a corresponding noisy reward. The goal of the agents is to collaborate among each other to maximize their cumulative rewards. This problem is of practical importance in recommendation systems in frameworks such as user-user collaborative filtering, and has received wide attention of late. We propose LATTICE (Latent bAndiTs via maTrIx ComplEtion), the first algorithm for this problem that achieves the minimax optimal regret of $\widetilde{O}(\sqrt{(A+B)T})$ when the number of clusters is $O(1)$. Here, $A, B$ are the number of items and agents respectively, and $T$ is the number of online rounds. Without collaboration, any algorithm will suffer $\Omega(\sqrt{ABT})$ regret. The key novelty in our algorithm is the reduction of online learning problem to offline low-rank matrix completion problem. Our algorithm is computationally efficient and only requires $O(\log{T})$ calls to an offline matrix completion oracle across all $T$ rounds.

Live content is unavailable. Log in and register to view live content