Poster
Entropic Risk Optimization in Discounted MDPs
Jia Lin Hau · Marek Petrik · Mohammad Ghavamzadeh
Auditorium 1 Foyer 60
Risk-averse Markov Decision Processes (MDPs) can compute policies that achieve high returns with low variability, but they are usually difficult to solve. Few practical risk-averse objectives admit a dynamic programming (DP) formulation, which is the mainstay of most MDP and RL algorithms. In this paper, we derive a new DP formulation for discounted risk-averse MDPs with Entropic Risk Measure (ERM) and Entropic Value at Risk (EVaR) objectives. Our DP formulation, made possible by defining value functions with time-dependent risk levels, can approximate optimal policies in time polynomial in the approximation error. We then use the ERM algorithm to optimize the EVaR objective in polynomial time via an optimized discretization scheme. Our numerical results demonstrate the viability of EVaR and ERM in discounted MDPs.
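For context, a minimal sketch of the two objectives, assuming the standard reward-based conventions (the paper's exact formulation may differ in detail). For a random return $X$, risk level $\beta > 0$, and confidence level $\alpha \in (0,1)$:
\[
\mathrm{ERM}_\beta[X] \;=\; -\tfrac{1}{\beta}\,\log \mathbb{E}\!\left[e^{-\beta X}\right],
\qquad
\mathrm{EVaR}_\alpha[X] \;=\; \sup_{\beta > 0}\left\{ \mathrm{ERM}_\beta[X] + \tfrac{\log \alpha}{\beta} \right\}.
\]
The time-dependent risk levels arise because ERM composes with a discount factor $\gamma$ as
\[
\mathrm{ERM}_\beta\!\left[c + \gamma X\right] \;=\; c + \gamma\, \mathrm{ERM}_{\gamma\beta}[X],
\]
so the value function at horizon $t$ carries risk level $\gamma^{t}\beta$ rather than a fixed $\beta$. The supremum form of EVaR also suggests why solving ERM over a discretized grid of $\beta$ values yields an EVaR approximation.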