AISTATS Poster Q-learning with Logarithmic Regret

Poster

Q-learning with Logarithmic Regret

Kunhe Yang · Lin Yang · Simon Du

Keywords: [ Deep Learning ] [ Reinforcement Learning ]

[ Abstract ]

Abstract: This paper presents the first non-asymptotic result showing a model-free algorithm can achieve logarithmic cumulative regret for episodic tabular reinforcement learning if there exists a strictly positive sub-optimality gap. We prove that the optimistic Q-learning studied in [Jin et al. 2018] enjoys a

${\mathcal{O}}\!\left(\frac{SA\cdot \mathrm{poly}\left(H\right)}{\Delta_{\min}}\log\left(SAT\right)\right)$ cumulative regret bound where

$S$ is the number of states,

$A$ is the number of actions,

$H$ is the planning horizon,

$T$ is the total number of steps, and

$\Delta_{\min}$ is the minimum sub-optimality gap of the optimal Q-function. This bound matches the information theoretical lower bound in terms of

$S,A,T$ up to a

$\log\left(SA\right)$ factor. We further extend our analysis to the discounted setting and obtain a similar logarithmic cumulative regret bound.

Chat is not available.