AISTATS Poster Minimum Empirical Divergence for Sub-Gaussian Linear Bandits

Poster

Minimum Empirical Divergence for Sub-Gaussian Linear Bandits

Ngo Nguyen · Kwang-Sung Jun

[ Abstract ]

Abstract: We propose a novel linear bandit algorithm called LinMED (Linear Minimum Empirical Divergence), which is a linear extension of the MED algorithm that was originally designed for multi-armed bandits.LinMED is a randomized algorithm that admits a closed-form computation of the arm sampling probabilities, unlike the popular randomized algorithm called linear Thompson sampling.Such a feature proves useful for off-policy evaluation where the unbiased evaluation requires accurately computing the sampling probability.We prove that LinMED enjoys a near-optimal regret bound of

d \sqrt{n}

$d\sqrt{n}$ up to logarithmic factors where

d

$d$ is the dimension and

n

$n$ is the time horizon.We further show that LinMED enjoys a

\frac{d^{2}}{Δ} (\log^{2} (n)) \log (\log (n))

$\frac{d^2}{\Delta}\left(\log^2(n)\right)\log\left(\log(n)\right)$ problem-dependent regret where

Δ

$\Delta$ is the smallest suboptimality gap.Our empirical study shows that LinMED has a competitive performance with the state-of-the-art algorithms.

Live content is unavailable. Log in and register to view live content