Skip to yearly menu bar Skip to main content


Identification of Blackwell Optimal Policies for Deterministic MDPs

Victor Boone · Bruno Gaujal

Auditorium 1 Foyer 53

Abstract: This paper investigates a new learning problem,the identification of Blackwell optimal policieson deterministic MDPs (DMDPs): A learner hasto return a Blackwell optimal policy with fixedconfidence using a minimal number of queries.First, we characterize the maximal set of DMDPsfor which the identification is possible. Then,we focus on the analysis of algorithms based onproduct-form confidence regions. We minimizethe number of queries by efficiently visiting thestate-action pairs with respect to the shape ofconfidence sets. Furthermore, these confidencesets are themselves optimized to achieve betterperformances. The performances of our methodscompare to the lower bounds up to a factor $n^2$ inthe worst case – where $n$ is the number of states,and constant in certain classes of DMDPs.

Live content is unavailable. Log in and register to view live content