AISTATS Poster Autoregressive Bandits

Poster

Autoregressive Bandits

Francesco Bacchiocchi · Gianmarco Genalti · Davide Maran · Marco Mussi · Marcello Restelli · Nicola Gatti · Alberto Maria Metelli

MR1 & MR2 - Number 26


Abstract: Autoregressive processes naturally arise in a large variety of real-world scenarios, including stock markets, sales forecasting, weather prediction, advertising, and pricing. When facing a sequential decision-making problem in such a context, the temporal dependence between consecutive observations must be properly accounted for to guarantee convergence to the optimal policy. In this work, we propose a novel online learning setting, namely, Autoregressive Bandits (ARBs), in which the observed reward is governed by an autoregressive process of order $k$, whose parameters depend on the chosen action. We show that, under mild assumptions on the reward process, the optimal policy can be conveniently computed. Then, we devise a new optimistic regret minimization algorithm, namely, AutoRegressive Upper Confidence Bound (AR-UCB), that suffers sublinear regret of order $\tilde{O}\left(\frac{(k+1)^{3/2}\sqrt{nT}}{(1-\Gamma)^2}\right)$, where $T$ is the optimization horizon, $n$ is the number of actions, and $\Gamma < 1$ is a stability index of the process. Finally, we empirically validate our algorithm, illustrating its advantages w.r.t. bandit baselines and its robustness to misspecification of key parameters.
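
To make the setting in the abstract concrete, the following is a minimal sketch of a reward stream driven by an autoregressive process of order $k$ whose coefficients are selected by the chosen action. It is not the authors' implementation. The function names, the coefficient layout (one row per action, intercept first), the Gaussian noise, the zero initial history, and the use of the largest sum of absolute autoregressive coefficients as a stand-in for $\Gamma$ are all assumptions made for illustration.

```python
import numpy as np


def simulate_arb(gammas, actions, noise_std=0.0, seed=0):
    """Simulate rewards of a hypothetical autoregressive bandit.

    gammas: array of shape (n_actions, k + 1). Column 0 is an assumed
            intercept; columns 1..k are assumed AR coefficients.
    actions: sequence of action indices, one per round.
    """
    rng = np.random.default_rng(seed)
    gammas = np.asarray(gammas, dtype=float)
    k = gammas.shape[1] - 1
    history = np.zeros(k)  # past rewards, most recent first (assumed zero start)
    rewards = []
    for a in actions:
        # Reward = intercept + weighted past rewards + noise, all indexed by action a.
        x = gammas[a, 0] + gammas[a, 1:] @ history + rng.normal(0.0, noise_std)
        rewards.append(x)
        if k > 0:
            history = np.roll(history, 1)
            history[0] = x
    return np.array(rewards)


def stability_index(gammas):
    """Assumed proxy for Gamma: largest per-action sum of |AR coefficients|."""
    gammas = np.asarray(gammas, dtype=float)
    return float(np.max(np.sum(np.abs(gammas[:, 1:]), axis=1)))


# Two hypothetical actions, order k = 1.
example_gammas = [[1.0, 0.5], [2.0, 0.2]]
example_rewards = simulate_arb(example_gammas, actions=[0, 0, 0])
```

With noise switched off, repeatedly playing action 0 in this toy instance yields rewards that build on one another through the autoregressive term, which is the temporal dependence that a standard bandit baseline ignores.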
