Poster

Conformal Off-Policy Prediction

yingying zhang · Chengchun Shi · Shikai Luo

Auditorium 1 Foyer 59

Abstract:

Off-policy evaluation is critical in a number of applications where new policies need to be evaluated offline before online deployment. Most existing methods focus on the expected return, define the target parameter through averaging, and provide a point estimator only. In this paper, we develop a novel procedure to produce reliable interval estimators for a target policy's return starting from any initial state. Our proposal accounts for the variability of the return around its expectation, focuses on the individual effect, and offers valid uncertainty quantification. Our main idea lies in designing a pseudo policy that generates subsamples as if they were sampled from the target policy, so that existing conformal prediction algorithms become applicable to prediction interval construction. Our methods are justified by theory, synthetic data, and real data from short-video platforms.
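For readers unfamiliar with the conformal step the abstract refers to, the generic split conformal prediction recipe can be sketched as follows. This is a minimal illustration only: it assumes the calibration samples behave as if drawn from the target policy (which is what the paper's pseudo-policy construction is designed to achieve), and all function names and the toy data are our own, not the authors'.

```python
import numpy as np

def split_conformal_interval(cal_true, cal_pred, test_pred, alpha=0.1):
    """Split conformal prediction interval from calibration residuals.

    cal_true / cal_pred: observed returns and their point predictions on a
    held-out calibration set. Assumption: these samples are exchangeable
    with the test point (e.g. obtained via pseudo-policy subsampling).
    """
    n = len(cal_true)
    # Nonconformity scores: absolute residuals on the calibration set.
    scores = np.abs(np.asarray(cal_true) - np.asarray(cal_pred))
    # Finite-sample-corrected quantile level, capped at 1.
    q_level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(scores, q_level, method="higher")
    return test_pred - q, test_pred + q

# Toy usage with synthetic returns (illustrative only).
rng = np.random.default_rng(0)
cal_pred = rng.normal(size=200)
cal_true = cal_pred + rng.normal(scale=0.5, size=200)
lo, hi = split_conformal_interval(cal_true, cal_pred, test_pred=0.3)
```

Under exchangeability, the resulting interval covers the true return with probability at least 1 - alpha; the paper's contribution lies in making that exchangeability assumption hold for off-policy data.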
