

Reward-Relevance-Filtered Linear Offline Reinforcement Learning

Angela Zhou

MR1 & MR2 - Number 24
Sat 4 May 6 a.m. PDT — 8:30 a.m. PDT


This paper studies offline reinforcement learning with linear function approximation in a setting with decision-theoretic, but not estimation, sparsity. The structural restrictions on the data-generating process presume that the transitions factor into a sparse component that affects the reward (and may also affect additional exogenous dynamics) and exogenous dynamics that do not affect the reward. Although the minimally sufficient adjustment set for estimating full-state transition properties depends on the whole state, the optimal policy, and therefore the state-action value function, depends only on the sparse component: we call this causal/decision-theoretic sparsity. We develop a method that reward-filters the estimation of the state-action value function to the sparse component by modifying thresholded lasso in least-squares policy evaluation. We provide theoretical guarantees for our reward-filtered linear fitted-Q-iteration, with sample complexity that depends only on the size of the sparse component.
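The abstract describes reward-filtering only at a high level. The sketch below illustrates the general idea under a simplifying assumption that is made here for illustration and is not taken from the paper. It reads the reward-relevant support off a thresholded-lasso regression of reward on state. Linear fitted-Q-iteration then regresses Bellman targets on those coordinates only. This shortcut would miss coordinates that influence the reward only through the sparse component's dynamics, and the paper's actual modification of thresholded lasso inside least-squares policy evaluation may differ. All function names, the penalty `lam`, the threshold `tau`, and the synthetic data generator are hypothetical. The code is pure Python so that it runs without third-party packages.

```python
import random

def soft_threshold(z, gamma):
    """Proximal operator of gamma * |z| (soft thresholding)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam*||b||_1, no intercept."""
    n, d = len(X), len(X[0])
    b = [0.0] * d
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(d)]
    resid = list(y)  # residual y - Xb (b starts at zero)
    for _ in range(n_iter):
        for j in range(d):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual that excludes j.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                b[j] = new_bj
    return b

def thresholded_lasso_support(X, y, lam, tau):
    """Hypothetical filter: keep coordinates whose lasso coefficient exceeds tau."""
    return [j for j, bj in enumerate(lasso_cd(X, y, lam)) if abs(bj) > tau]

def solve(A, rhs):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(A)
    M = [list(A[i]) + [rhs[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def least_squares(X, y, ridge=1e-8):
    """Normal equations with a tiny ridge term for numerical stability."""
    d = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) + (ridge if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    rhs = [sum(row[i] * t for row, t in zip(X, y)) for i in range(d)]
    return solve(A, rhs)

def fitted_q_iteration(data, support, actions, gamma, n_iter=50):
    """Linear FQI, one weight vector per action, on the `support` coordinates only."""
    def feat(s):
        return [1.0] + [s[j] for j in support]  # intercept + filtered state
    def q(w_a, s):
        return sum(wi * xi for wi, xi in zip(w_a, feat(s)))
    w = {a: [0.0] * (len(support) + 1) for a in actions}
    for _ in range(n_iter):
        new_w = {}
        for a in actions:
            rows = [(feat(s), r + gamma * max(q(w[b], s2) for b in actions))
                    for s, act, r, s2 in data if act == a]  # Bellman targets
            new_w[a] = least_squares([x for x, _ in rows], [t for _, t in rows])
        w = new_w
    return w

def make_synthetic_data(n, seed=0):
    """Hypothetical data: reward uses s[0], s[1]; s[2:] are exogenous noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        s = [rng.gauss(0.0, 1.0) for _ in range(5)]
        a = rng.choice([0, 1])
        r = 1.5 * s[0] - 1.0 * s[1]
        s_next = [0.8 * s[0] + 0.5 * a, 0.8 * s[1]] + [rng.gauss(0.0, 1.0) for _ in range(3)]
        data.append((s, a, r, s_next))
    return data

if __name__ == "__main__":
    data = make_synthetic_data(200)
    states = [s for s, _, _, _ in data]
    rewards = [r for _, _, r, _ in data]
    support = thresholded_lasso_support(states, rewards, lam=0.05, tau=0.2)
    print("reward-relevant support:", support)
    print("Q weights:", fitted_q_iteration(data, support, actions=[0, 1], gamma=0.9))
```

On this hypothetical data, the filter should keep only coordinates 0 and 1. Each per-action regression inside the FQI loop then fits three weights instead of six. This mirrors, in a toy way, the abstract's point that the regression dimension tracks the sparse component rather than the full state.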
