AISTATS Poster On Information Gain and Regret Bounds in Gaussian Process Bandits

Sattar Vakili · Kia Khezeli · Victor Picheny

Keywords: [ Applications ] [ Fairness, Accountability, and Transparency ] [ Learning Theory and Statistics ] [ Decision Processes and Bandits ]

Abstract: Consider the sequential optimization of an expensive-to-evaluate and possibly non-convex objective function $f$ from noisy feedback, which can be viewed as a continuum-armed bandit problem. Upper bounds on the regret performance of several learning algorithms (GP-UCB, GP-TS, and their variants) are known under both a Bayesian setting (when $f$ is a sample from a Gaussian process (GP)) and a frequentist setting (when $f$ lives in a reproducing kernel Hilbert space). The regret bounds often rely on the maximal information gain $\gamma_T$ between $T$ observations and the underlying GP (surrogate) model. We provide general bounds on $\gamma_T$ based on the decay rate of the eigenvalues of the GP kernel, whose specialisation for commonly used kernels improves the existing bounds on $\gamma_T$, and subsequently the regret bounds relying on $\gamma_T$, under numerous settings. For the Matérn family of kernels, where lower bounds on $\gamma_T$ and on the regret under the frequentist setting are known, our results close a large gap, polynomial in $T$, between the upper and lower bounds (up to logarithmic factors in $T$).
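
To make $\gamma_T$ concrete: for observations $y_A = f(x_A) + \epsilon$ with i.i.d. $\mathcal{N}(0, \sigma^2)$ noise, the information gain is $I(y_A; f) = \frac{1}{2}\log\det(I + \sigma^{-2} K_A)$, and $\gamma_T$ is its maximum over all sets $A$ of $T$ points. The exact maximum is intractable in general, so the following is only an illustrative sketch (not the authors' method) that computes a greedy lower estimate on a finite grid. Adding a point $x$ increases the information gain by $\frac{1}{2}\log(1 + \sigma^{-2}\sigma_{t-1}^2(x))$, so the greedy choice is the point of largest posterior variance; since information gain is monotone submodular, this attains at least a $(1 - 1/e)$ fraction of the maximum over the grid. The Matérn-5/2 kernel, the lengthscale, the grid, and all function names are hypothetical choices made for illustration.

```python
import numpy as np

def matern52(x1, x2, lengthscale=0.2):
    # Matern kernel with smoothness nu = 5/2 on 1-D inputs, unit prior variance
    # (lengthscale is an illustrative assumption).
    r = np.abs(x1[:, None] - x2[None, :]) / lengthscale
    s = np.sqrt(5.0) * r
    return (1.0 + s + s**2 / 3.0) * np.exp(-s)

def information_gain(K, noise_var):
    # I(y_A; f) = 0.5 * log det(I + K_A / sigma^2), computed stably via slogdet.
    n = K.shape[0]
    _, logdet = np.linalg.slogdet(np.eye(n) + K / noise_var)
    return 0.5 * logdet

def greedy_information_gain(candidates, T, kernel, noise_var):
    # Greedy (uncertainty-sampling) estimate of gamma_T over a finite grid.
    chosen = []
    for _ in range(T):
        if chosen:
            X = candidates[chosen]
            K_AA = kernel(X, X) + noise_var * np.eye(len(chosen))
            K_cA = kernel(candidates, X)
            # Posterior variance k(x,x) - k_A(x)^T (K_A + sigma^2 I)^{-1} k_A(x),
            # with k(x,x) = 1 for this kernel.
            v = np.linalg.solve(K_AA, K_cA.T)
            var = 1.0 - np.sum(K_cA * v.T, axis=1)
        else:
            var = np.ones(len(candidates))
        # Largest posterior variance = largest marginal information gain
        # (ties go to the lowest index; repeated points are allowed).
        chosen.append(int(np.argmax(var)))
    X = candidates[chosen]
    return information_gain(kernel(X, X), noise_var)
```

For example, evaluating `greedy_information_gain(np.linspace(0.0, 1.0, 200), T, matern52, noise_var=0.01)` for increasing `T` gives a feel for how slowly the information gain of a smooth kernel grows compared with $T$ itself.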
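
GP-UCB, one of the algorithms named above, queries $x_t = \arg\max_x \mu_{t-1}(x) + \beta_t^{1/2}\sigma_{t-1}(x)$, where $\mu_{t-1}$ and $\sigma_{t-1}$ are the GP posterior mean and standard deviation. In the RKHS analyses the confidence parameter $\beta_t$ itself depends on $\gamma_t$, which is one route by which tighter bounds on $\gamma_T$ feed into tighter regret bounds. The sketch below is a minimal illustration on a 1-D grid under simplifying assumptions: a zero-mean GP prior with a Matérn-5/2 kernel, Gaussian noise, and a hypothetical fixed `beta` in place of the theoretical schedule; all names are illustrative.

```python
import numpy as np

def matern52(x1, x2, lengthscale=0.2):
    # Matern-5/2 kernel on 1-D inputs with unit prior variance (assumed).
    r = np.abs(x1[:, None] - x2[None, :]) / lengthscale
    s = np.sqrt(5.0) * r
    return (1.0 + s + s**2 / 3.0) * np.exp(-s)

def gp_posterior(X, y, Xs, kernel, noise_var):
    # Zero-mean GP posterior mean and standard deviation at test points Xs.
    K = kernel(X, X) + noise_var * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    Ks = kernel(Xs, X)
    mu = Ks @ alpha
    v = np.linalg.solve(L, Ks.T)
    var = 1.0 - np.sum(v**2, axis=0)  # k(x,x) = 1 for this kernel
    return mu, np.sqrt(np.maximum(var, 0.0))

def gp_ucb(f, candidates, T, kernel, noise_var=0.01, beta=4.0, seed=0):
    # Hypothetical fixed beta; theory uses a growing, gamma_t-dependent beta_t.
    rng = np.random.default_rng(seed)
    X, y = [], []
    for _ in range(T):
        if X:
            mu, sd = gp_posterior(np.array(X), np.array(y), candidates,
                                  kernel, noise_var)
            ucb = mu + np.sqrt(beta) * sd
        else:
            ucb = np.zeros(len(candidates))  # prior: every point is equal
        x = candidates[int(np.argmax(ucb))]
        X.append(x)
        # Noisy feedback y_t = f(x_t) + eps_t with eps_t ~ N(0, noise_var).
        y.append(f(x) + rng.normal(scale=np.sqrt(noise_var)))
    return np.array(X), np.array(y)
```

A fixed `beta` keeps the sketch short; replacing it with a schedule that grows with $t$ (and, in the frequentist setting, with an estimate of $\gamma_t$) is what the regret analyses actually study.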
