Poster

Variance-Aware Linear UCB with Deep Representation for Neural Contextual Bandits

Danqi Liao · Enrique Mallada · Anqi Liu


Abstract: By leveraging the representation power of deep neural networks, neural upper confidence bound (UCB) algorithms have shown success in contextual bandits. To better balance exploration and exploitation, we propose Neural-$\sigma^2$-LinearUCB, a variance-aware algorithm that uses $\sigma_t^2$, i.e., an upper bound on the reward noise variance at round $t$, to improve the uncertainty quantification of the UCB, which in turn improves regret. We provide an oracle version of our algorithm, characterized by an oracle variance upper bound $\sigma_t^2$, and a practical version with a novel estimate of this variance bound. Theoretically, we provide a rigorous regret analysis for both versions and prove that our oracle algorithm achieves a better regret guarantee than other neural-UCB algorithms in the neural contextual bandit setting. Empirically, our practical method has computational efficiency similar to that of state-of-the-art techniques while outperforming them, with better calibration and lower regret across multiple standard settings, including synthetic, UCI, MNIST, and CIFAR-10 datasets.