Skip to yearly menu bar Skip to main content


Horizon-Free and Instance-Dependent Regret Bounds for Reinforcement Learning with General Function Approximation

Jiayi Huang · Han Zhong · Liwei Wang · Lin Yang

MR1 & MR2 - Number 39
[ ]
Fri 3 May 8 a.m. PDT — 8:30 a.m. PDT


To tackle long planning horizon problems in reinforcement learning with general function approximation, we propose the first algorithm, termed as UCRL-WVTR, that achieves both \emph{horizon-free} and \emph{instance-dependent}, since it eliminates the polynomial dependency on the planning horizon. The derived regret bound is deemed \emph{sharp}, as it matches the minimax lower bound when specialized to linear mixture MDPs up to logarithmic factors. Furthermore, UCRL-WVTR is \emph{computationally efficient} with access to a regression oracle. The achievement of such a horizon-free, instance-dependent, and sharp regret bound hinges upon (i) novel algorithm designs: weighted value-targeted regression and a high-order moment estimator in the context of general function approximation; and (ii) fine-grained analysis: a novel concentration bound of weighted non-linear least squares and a refined analysis which leads to the tight instance-dependent bound. We also conduct comprehensive experiments to corroborate our theoretical findings.

Live content is unavailable. Log in and register to view live content