

Poster

Q-learning with Logarithmic Regret

Kunhe Yang · Lin Yang · Simon Du

Keywords: [ Deep Learning ] [ Reinforcement Learning ]


Abstract: This paper presents the first non-asymptotic result showing that a model-free algorithm can achieve logarithmic cumulative regret for episodic tabular reinforcement learning when there exists a strictly positive sub-optimality gap. We prove that the optimistic Q-learning algorithm studied in [Jin et al. 2018] enjoys an $O\left(\frac{SA \cdot \mathrm{poly}(H)}{\Delta_{\min}} \log(SAT)\right)$ cumulative regret bound, where $S$ is the number of states, $A$ is the number of actions, $H$ is the planning horizon, $T$ is the total number of steps, and $\Delta_{\min}$ is the minimum sub-optimality gap of the optimal Q-function. This bound matches the information-theoretic lower bound in terms of $S$, $A$, and $T$ up to a $\log(SA)$ factor. We further extend our analysis to the discounted setting and obtain a similar logarithmic cumulative regret bound.
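The algorithm referenced in the abstract, optimistic Q-learning with Hoeffding-style UCB bonuses [Jin et al. 2018], admits a short tabular sketch. The code below is an illustrative reimplementation, not the authors' code: the environment interface `step(s, a, h, rng)`, the fixed initial state, and the bonus constant `c` are assumptions of this sketch. It uses the paper's learning rate $\alpha_t = (H+1)/(H+t)$ and a bonus on the order of $\sqrt{H^3 \log(SAT/\delta)/t}$.

```python
import math
import random

def optimistic_q_learning(num_states, num_actions, horizon, num_episodes,
                          step, c=0.5, delta=0.05, seed=0):
    """Tabular optimistic Q-learning with UCB bonuses, in the spirit of
    Jin et al. 2018 (an illustrative sketch, not the paper's exact code).

    `step(s, a, h, rng)` is an assumed environment interface returning
    (reward, next_state) with rewards in [0, 1].
    """
    rng = random.Random(seed)
    H, S, A = horizon, num_states, num_actions
    T = num_episodes * H  # total number of steps, as in the regret bound
    # Optimistic initialization: Q = H upper-bounds any value in [0, H].
    Q = [[[float(H)] * A for _ in range(S)] for _ in range(H)]
    V = [[float(H)] * S for _ in range(H)] + [[0.0] * S]  # V[H] = 0 (terminal)
    N = [[[0] * A for _ in range(S)] for _ in range(H)]   # visit counts
    for _ in range(num_episodes):
        s = 0  # fixed initial state: an assumption of this sketch
        for h in range(H):
            # Act greedily w.r.t. the optimistic Q estimate.
            a = max(range(A), key=lambda a_: Q[h][s][a_])
            r, s_next = step(s, a, h, rng)
            N[h][s][a] += 1
            t = N[h][s][a]
            alpha = (H + 1) / (H + t)  # learning rate from Jin et al. 2018
            bonus = c * math.sqrt(H**3 * math.log(S * A * T / delta) / t)
            # Optimistic one-step target: reward + next-step value + bonus.
            target = r + V[h + 1][s_next] + bonus
            Q[h][s][a] = (1 - alpha) * Q[h][s][a] + alpha * target
            V[h][s] = min(float(H), max(Q[h][s]))  # clip value at H
            s = s_next
    return Q
```

On a toy MDP where one action always pays reward 1 and the other pays 0, the learned optimistic Q-values come to prefer the rewarding action at every step, since the decaying bonus forces both actions to be tried before the estimates separate.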
