Abstract:
We investigate the fixed-budget best-arm identification (BAI) problem for linear bandits in a potentially non-stationary environment. Given a finite arm set $X \subset \mathbb{R}^d$, a fixed budget $T$, and an unpredictable sequence of parameters $\{\theta_t\}_{t=1}^{T}$, an algorithm aims to correctly identify the best arm $x^* := \arg\max_{x \in X} x^\top \sum_{t=1}^{T} \theta_t$ with probability as high as possible. Prior work has addressed the stationary setting where $\theta_t = \theta_1$ for all $t$, demonstrating that the error probability decreases as $\exp(-T/\rho^*)$ for a problem-dependent constant $\rho^*$. But in many real-world A/B/n multivariate testing scenarios that motivate our work, the environment is non-stationary, and an algorithm expecting a stationary setting can easily fail. For robust identification, it is well known that if arms are chosen randomly and non-adaptively from a G-optimal design over $X$ at each time, then the error probability decreases as $\exp(-T \Delta^2_{(1)}/d)$, where $\Delta_{(1)} = \min_{x \neq x^*} (x^* - x)^\top \frac{1}{T}\sum_{t=1}^{T} \theta_t$. As there exist environments in which $\Delta^2_{(1)}/d \ll 1/\rho^*$, we are motivated to propose a novel algorithm, P1-RAGE, that aims to obtain the best of both worlds: robustness to non-stationarity and fast rates of identification in benign settings. We characterize the error probability of P1-RAGE and demonstrate empirically that the algorithm never performs worse than G-optimal design, yet compares favorably to the best algorithms in the stationary setting.