Poster
Order-Optimal Regret with Novel Policy Gradient Approaches in Infinite-Horizon Average Reward MDPs
Swetha Ganesh · Washim Mondal · Vaneet Aggarwal
Abstract:
We present two Policy Gradient-based algorithms with general parametrization in the context of infinite-horizon average reward Markov Decision Processes (MDPs). The first one employs Implicit Gradient Transport for variance reduction, ensuring an expected regret of the order . The second approach, rooted in Hessian-based techniques, ensures an expected regret of the order $\tilde{\mathcal{O}}(\sqrt{T})$. These results significantly improve the state-of-the-art regret and achieve the theoretical lower bound. We also show that the average-reward function is approximately $L$-smooth, a result that was previously assumed in earlier works.