Poster

Nonstationary Bandit Learning via Predictive Sampling

Yueyang Liu · Benjamin Van Roy · Kuang Xu

Auditorium 1 Foyer 87

Abstract:

Thompson sampling has proven effective across a wide range of stationary bandit environments. However, as we demonstrate in this paper, it can perform poorly when applied to nonstationary environments. We show that such failures are attributable to the fact that, when exploring, the algorithm does not differentiate actions based on how quickly the information acquired loses its usefulness due to nonstationarity. Building upon this insight, we propose predictive sampling, an algorithm that deprioritizes acquiring information that quickly loses usefulness. We establish a theoretical guarantee on the performance of predictive sampling through a Bayesian regret bound. We provide versions of predictive sampling for which computations tractably scale to complex bandit environments of practical interest. Through numerical simulation, we demonstrate that predictive sampling outperforms Thompson sampling in all nonstationary environments examined.
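For readers unfamiliar with the baseline, the following is a minimal sketch of standard Thompson sampling on a stationary Bernoulli bandit with Beta(1, 1) priors. It is not the predictive sampling algorithm proposed in the paper, and it is not taken from the authors' code. The function name `thompson_sampling`, the arm means, the horizon, and the seed are all illustrative assumptions.

```python
import random


def thompson_sampling(true_means, horizon, seed=0):
    """Beta-Bernoulli Thompson sampling on a stationary bandit (illustrative only)."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    # Beta(1, 1) prior per arm: alpha tracks successes + 1, beta tracks failures + 1.
    alpha = [1] * n_arms
    beta = [1] * n_arms
    pulls = [0] * n_arms
    total_reward = 0
    for _ in range(horizon):
        # Draw one plausible mean reward per arm from its current posterior.
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(n_arms)]
        # Act greedily with respect to the sampled means.
        arm = max(range(n_arms), key=lambda a: samples[a])
        # Observe a Bernoulli reward; the arm means never change (stationary).
        reward = 1 if rng.random() < true_means[arm] else 0
        # Conjugate posterior update for the pulled arm only.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
        total_reward += reward
    return pulls, total_reward


# Hypothetical three-armed environment; the last arm is the best one.
pulls, total_reward = thompson_sampling([0.2, 0.5, 0.8], horizon=2000)
print(pulls, total_reward)
```

Because the arm means here are fixed, every observation stays informative indefinitely. The abstract's point is that in a nonstationary environment this no longer holds, and the sampling step above does not account for how quickly what is learned about an arm becomes stale.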
