Poster
Conformal Off-Policy Prediction
Yingying Zhang · Chengchun Shi · Shikai Luo
Auditorium 1 Foyer 59
Off-policy evaluation is critical in a number of applications where new policies need to be evaluated offline before online deployment. Most existing methods focus on the expected return, define the target parameter through averaging and provide a point estimator only. In this paper, we develop a novel procedure to produce reliable interval estimators for a target policy’s return starting from any initial state. Our proposal accounts for the variability of the return around its expectation, focuses on the individual effect and offers valid uncertainty quantification. Our main idea lies in designing a pseudo policy that generates subsamples as if they were sampled from the target policy so that existing conformal prediction algorithms are applicable to prediction interval construction. Our methods are justified by theories, synthetic data and real data from short-video platforms.
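To make the interval-construction step concrete, below is a minimal sketch of standard split conformal prediction applied to returns, assuming the subsampled returns already behave as if drawn from the target policy (the paper's pseudo-policy construction is not shown). All names here (`states`, `returns`, `conformal_return_interval`, the random-forest predictor) are illustrative assumptions, not the authors' implementation.

```python
# Minimal split conformal prediction sketch for a policy's return.
# Assumption: `returns[i]` is the observed return when starting from
# initial state `states[i]`, generated so that it behaves as if sampled
# from the target policy (per the paper's pseudo-policy idea).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def conformal_return_interval(states, returns, new_state, alpha=0.1, seed=0):
    """Return a (1 - alpha) prediction interval for the return at `new_state`."""
    rng = np.random.default_rng(seed)
    n = len(returns)
    idx = rng.permutation(n)
    train, calib = idx[: n // 2], idx[n // 2:]

    # Fit a point predictor of the return given the initial state.
    model = RandomForestRegressor(random_state=seed)
    model.fit(states[train], returns[train])

    # Nonconformity scores: absolute residuals on the calibration split.
    scores = np.abs(returns[calib] - model.predict(states[calib]))

    # Finite-sample-adjusted (1 - alpha) quantile of the scores.
    k = int(np.ceil((len(calib) + 1) * (1 - alpha)))
    q = np.sort(scores)[min(k, len(calib)) - 1]

    center = model.predict(np.asarray(new_state).reshape(1, -1))[0]
    return center - q, center + q
```

The coverage guarantee of such an interval rests on exchangeability between the calibration returns and the return at the new state, which is exactly what the pseudo-policy subsampling in the paper is designed to restore for off-policy data.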