Loading [MathJax]/jax/output/CommonHTML/jax.js
Skip to yearly menu bar Skip to main content


Poster

On Preference-based Stochastic Linear Contextual Bandits with Knapsacks

Fengpei Li


Abstract: This paper studies the problem of preference-based stochastic linear contextual bandits with knapsack constraints (PbLinCBwK).We propose budget-aware optimistic and randomized exploration algorithms that achieve a regret of O((κ+TνB)TlogT), for any total budget B=Ω(T).The parameters κ and TνB capture the effects of preference feedback and knapsack constraints, respectively. Our regret performance is near-optimal and matches the bound of LinCBwK under the mild condition B=Ω(T). To achieve these results, we view the process of budget consumption and stopping time as Markov processing and analyze it via the Lyapunov drift method, which is translated into the strong regret guarantee. The experiments on synthetic PbLinCBwK and online content moderation setting further justify the theoretical results.

Live content is unavailable. Log in and register to view live content