

Multiple-policy High-confidence Policy Evaluation

Chris Dann · Mohammad Ghavamzadeh · Teodor Vanislavov Marinov

Auditorium 1 Foyer 57


In reinforcement learning applications, we often want to accurately estimate the return of several policies of interest. We study this problem, multiple-policy high-confidence policy evaluation, where the goal is to estimate the return of all given target policies up to a desired accuracy with as few samples as possible. The natural approaches to this problem, i.e., evaluating each policy separately or estimating a model of the MDP, do not take into account the similarities between target policies and scale with the number of policies to evaluate or the size of the MDP, respectively. We present an alternative approach based on reusing samples from on-policy Monte-Carlo estimators and show that it is more sample-efficient in favorable cases. Specifically, we provide guarantees in terms of a notion of overlap of the set of target policies and shed light on when such an approach is indeed beneficial compared to existing methods.
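The sample-reuse idea can be illustrated with a minimal importance-sampling sketch. This is a toy construction under assumptions of my own, not the authors' estimator or guarantees: the MDP is a stateless binary-action problem with a fixed horizon, and each policy is just the probability of taking action 1. Trajectories are collected once under a single behavior policy and then reweighted to evaluate every target policy, so the data-collection cost does not grow with the number of targets — and the reweighting is accurate precisely when the targets overlap with the behavior policy.

```python
import random

# Toy setup (illustrative, not from the paper): horizon-H problem with two
# actions; action 1 yields reward 1, action 0 yields reward 0. A "policy"
# is the probability of choosing action 1 at each step.
H = 3
random.seed(0)

def sample_trajectory(policy):
    """Roll out one trajectory under `policy`; return its actions and return."""
    actions, ret = [], 0.0
    for _ in range(H):
        a = 1 if random.random() < policy else 0
        actions.append(a)
        ret += float(a)  # reward(a) = a in this toy problem
    return actions, ret

def traj_prob(policy, actions):
    """Probability of an action sequence under a given policy."""
    p = 1.0
    for a in actions:
        p *= policy if a == 1 else 1.0 - policy
    return p

# One behavior policy; two similar target policies to evaluate.
behavior = 0.5
targets = [0.4, 0.6]

# Collect data ONCE under the behavior policy, then reuse the same
# trajectories for every target via importance weighting.
data = [sample_trajectory(behavior) for _ in range(10000)]

estimates = {}
for pi in targets:
    est = sum(traj_prob(pi, acts) / traj_prob(behavior, acts) * ret
              for acts, ret in data) / len(data)
    estimates[pi] = est
    print(f"pi={pi}: IS estimate {est:.3f}, true return {H * pi:.3f}")
```

In this toy problem the true return of a target policy is simply `H * pi`, so the importance-sampling estimates can be checked directly; with clearly overlapping policies (all close to 0.5) the importance weights stay bounded and both estimates converge from the same batch of samples.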
