Skip to yearly menu bar Skip to main content


Model-Based Best Arm Identification for Decreasing Bandits

Sho Takemori · Yuhei Umeda · Aditya Gopalan

MR1 & MR2 - Number 110
[ ]
Fri 3 May 8 a.m. PDT — 8:30 a.m. PDT


We study the problem of reliably identifying the best (lowest loss) arm in a stochastic multi-armed bandit when the expected loss of each arm is monotone decreasing as a function of its pull count. This models, for instance, scenarios where each arm itself represents an optimization algorithm for finding the minimizer of a common function, and there is a limited time available to test the algorithms before committing to one of them. We assume that the decreasing expected loss of each arm depends on the number of its pulls as a (inverse) polynomial with unknown coefficients. We propose two fixed-budget best arm identification algorithms -- one for the case of sparse polynomial decay models and the other for general polynomial models -- along with bounds on the identification error probability. We also derive algorithm-independent lower bounds on the error probability. These bounds are seen to be factored into the product of the usual problem complexity and the model complexity that only depends on the parameters of the model. This indicates that our methods can identify the best arm even when the budget is smaller. We conduct empirical studies of our algorithms to complement our theoretical findings.

Live content is unavailable. Log in and register to view live content