Marginalized Operators for Off-policy Reinforcement Learning

Yunhao Tang · Mark Rowland · Remi Munos · Michal Valko

Wed 30 Mar 3:30 a.m. PDT — 5 a.m. PDT


In this work, we propose marginalized operators, a new class of off-policy evaluation operators for reinforcement learning. Marginalized operators strictly generalize generic multi-step operators such as Retrace, which they recover as special cases. Marginalized operators also suggest a form of sample-based estimate with potentially lower variance than sample-based estimates of the original multi-step operators. We show that the estimates for marginalized operators can be computed in a scalable way, and that this computation generalizes prior results on marginalized importance sampling, which arise as special cases. Finally, we empirically demonstrate that marginalized operators provide performance gains in off-policy evaluation problems and in downstream policy optimization algorithms.
