Processing math: 100%
Skip to yearly menu bar Skip to main content


Poster

Online Sparse Reinforcement Learning

Botao Hao · Tor Lattimore · Csaba Szepesvari · Mengdi Wang

Keywords: [ Applications ] [ Fairness, Accountability, and Transparency ] [ Reinforcement Learning ]


Abstract: We investigate the hardness of online reinforcement learning in sparse linear Markov decision process (MDP), with a special focus on the high-dimensional regime where the ambient dimension is larger than the number of episodes. Our contribution is two-fold. First, we provide a lower bound showing that linear regret is generally unavoidable, even if there exists a policy that collects well-conditioned data. Second, we show that if the learner has oracle access to a policy that collects well-conditioned data, then a variant of Lasso fitted Q-iteration enjoys a regret of O(N2/3) where N is the number of episodes.

Chat is not available.