Abstract:
We investigate the misspecified linear contextual bandit (MLCB) problem, a generalization of the linear contextual bandit (LCB) problem. The MLCB problem is a sequential decision-making problem in which, in each round, a learner observes $d$-dimensional feature vectors, called arms, chooses one of $K$ arms, and then obtains a reward from the chosen arm. The learner aims to maximize the sum of the rewards over $T$ rounds. In contrast to the LCB problem, the rewards in the MLCB problem may not be represented exactly by a linear function of the feature vectors; instead, they are only approximated by a linear function up to an additive approximation parameter $\varepsilon \ge 0$. In this paper, we propose an algorithm that achieves $\tilde{O}(\sqrt{dT\log(K)} + \varepsilon\sqrt{d}T)$ regret, where $\tilde{O}(\cdot)$ ignores polylogarithmic factors in $d$ and $T$. This is the first algorithm that guarantees a high-probability regret bound for the MLCB problem without knowledge of the approximation parameter $\varepsilon$.
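For concreteness, a standard way to formalize this setting is sketched below; the notation $x_{t,a}$ (feature vector of arm $a$ in round $t$), $r_{t,a}$ (its reward), $\theta^*$ (the unknown parameter), and $a_t$ (the arm chosen in round $t$) is introduced here for illustration and need not match the paper body. The misspecification assumption states that expected rewards deviate from a fixed linear function by at most $\varepsilon$:
\[
  \bigl|\mathbb{E}[r_{t,a}] - \langle \theta^*, x_{t,a} \rangle\bigr| \le \varepsilon
  \qquad \text{for all } t \in [T],\ a \in [K],
\]
and the regret to be bounded compares the learner's choices with the per-round optimal arms:
\[
  R(T) = \sum_{t=1}^{T} \Bigl( \max_{a \in [K]} \mathbb{E}[r_{t,a}] - \mathbb{E}[r_{t,a_t}] \Bigr).
\]
Note that under this assumption the $\varepsilon\sqrt{d}T$ term in the bound is linear in $T$, which is unavoidable: an adversarial $\varepsilon$-misspecification can cost up to order $\varepsilon$ per round.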