Skip to yearly menu bar Skip to main content


Posterior Tracking Algorithm for Classification Bandits

Koji Tabata · Junpei Komiyama · Atsuyoshi Nakamura · Tamiki Komatsuzaki

Auditorium 1 Foyer 80

Abstract: The classification bandit problem aims to determine whether a set of given $K$ arms contains at least $L$ good arms or not. Here, an arm is said to be good if its expected reward is no less than a specified threshold.To solve this problem, we introduce an asymptotically optimal algorithm, named P-tracking, based on posterior sampling. Unlike previous asymptotically optimal algorithms that require solving a linear programming problem with an exponentially large number of constraints, P-tracking solves an equivalent optimization problem that can be computed in time linear in $K$. Additionally, unlike existing algorithms, P-tracking does not require forced exploration steps. Empirical results show that P-tracking outperforms existing algorithms in sample efficiency.

Live content is unavailable. Log in and register to view live content