Skip to yearly menu bar Skip to main content


Poster

A Bayesian Learning Algorithm for Unknown Zero-sum Stochastic Games with an Arbitrary Opponent

Mehdi Jafarnia · Rahul Jain · Ashutosh Nayyar

MR1 & MR2 - Number 43

Abstract: In this paper, we propose Posterior Sampling Reinforcement Learning for Zero-sum Stochastic Games (PSRL-ZSG), the first online learning algorithm that achieves Bayesian regret bound of \tilde\mathcal{O}(HS\sqrt{AT})\tilde\mathcal{O}(HS\sqrt{AT}) in the infinite-horizon zero-sum stochastic games with average-reward criterion.Here HH is an upper bound on the span of the bias function, SS is the number of states, AA is the number of joint actions and TT is the horizon.We consider the online setting where the opponent can not be controlled and can take any arbitrary time-adaptive history-dependent strategy.Our regret bound improves on the best existing regret bound of \tilde\mathcal{O}(\sqrt[3]{DS^2AT^2})\tilde\mathcal{O}(\sqrt[3]{DS^2AT^2}) by Wei et al., (2017) under the same assumption and matches the theoretical lower bound in TT.

Chat is not available.