Skip to yearly menu bar Skip to main content


Poster

A/B Testing and Best-arm Identification for Linear Bandits with Robustness to Non-stationarity

Zhihan Xiong · Romain Camilleri · Maryam Fazel · Lalit Jain · Kevin Jamieson

MR1 & MR2 - Number 111

Abstract: We investigate the fixed-budget best-arm identification (BAI) problem for linear bandits in a potentially non-stationary environment. Given a finite arm set XRd, a fixed budget T, and an unpredictable sequence of parameters {θt}Tt=1, an algorithm will aim to correctly identify the best arm x:=argmaxxXxTt=1θt with probability as high as possible. Prior work has addressed the stationary setting where θt=θ1 for all t and demonstrated that the error probability decreases as exp(T/ρ) for a problem-dependent constant ρ. But in many real-world A/B/n multivariate testing scenarios that motivate our work, the environment is non-stationary and an algorithm expecting a stationary setting can easily fail. For robust identification, it is well-known that if arms are chosen randomly and non-adaptively from a G-optimal design over X at each time then the error probability decreases as exp(TΔ2(1)/d), where Δ(1)=minxx(xx)1TTt=1θt. As there exist environments where Δ2(1)/d1/ρ, we are motivated to propose a novel algorithm P1-RAGE that aims to obtain the best of both worlds: robustness to non-stationarity and fast rates of identification in benign settings. We characterize the error probability of P1-RAGE and demonstrate empirically that the algorithm indeed never performs worse than G-optimal design but compares favorably to the best algorithms in the stationary setting.

Chat is not available.