
On learning history-based policies for controlling Markov decision processes

Gandharv Patil · Aditya Mahajan · Doina Precup

MR1 & MR2 - Number 34
Fri 3 May 8 a.m. PDT — 8:30 a.m. PDT


Reinforcement learning (RL) folklore suggests that history-based function approximation methods, such as recurrent neural networks or state abstractions that incorporate past information, outperform memoryless ones, because function approximation in Markov decision processes (MDPs) can lead to a scenario akin to dealing with a partially observable MDP (POMDP). However, formal analysis of history-based algorithms has been limited, with most existing frameworks concentrating on features without historical context. In this paper, we introduce a theoretical framework to examine the behaviour of RL algorithms that control an MDP using feature abstraction mappings based on past observations. Additionally, we leverage this framework to develop a practical RL algorithm and assess its performance across various continuous control tasks.
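To make the contrast concrete, the sketch below shows a minimal history-based feature abstraction: instead of conditioning the policy on the current observation alone, it conditions on a fixed-length window of past observations. All class and function names here are illustrative assumptions, not the paper's actual construction.

```python
from collections import deque

class HistoryFeatures:
    """Hypothetical sketch of a history-based feature map phi(h_t).

    A memoryless abstraction would map only the current observation to
    features; this version maps a sliding window of the last `window`
    observations, zero-padded at episode start.
    """

    def __init__(self, window: int):
        self.window = window
        self.buffer = deque(maxlen=window)

    def reset(self) -> None:
        # Clear the history at the start of a new episode.
        self.buffer.clear()

    def __call__(self, obs: float) -> tuple:
        # Append the newest observation and emit the padded window as the
        # feature vector the policy conditions on.
        self.buffer.append(obs)
        padded = [0.0] * (self.window - len(self.buffer)) + list(self.buffer)
        return tuple(padded)

def greedy_policy(features: tuple) -> int:
    # Toy policy for illustration: act on a summary statistic of the
    # history (here, its mean) rather than on the latest observation.
    return 1 if sum(features) / len(features) > 0.5 else 0

phi = HistoryFeatures(window=3)
actions = [greedy_policy(phi(obs)) for obs in [0.2, 0.9, 0.9, 0.1]]
# The last action depends on the window [0.9, 0.9, 0.1], not just 0.1,
# so the agent behaves differently from a memoryless policy.
```

A recurrent network plays the same role as `HistoryFeatures` here, except that it learns a compressed summary of the history instead of storing a raw window.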