Optimal Learning in Two-Player Zero-Sum Games under Delayed Feedback
Ruotong Zhuang · Taira Tsuchiya · Shinji Ito
Abstract
Learning in games is a central topic in both learning theory and game theory, and learning dynamics based on online learning have made significant theoretical and practical advances in recent years. In particular, in two-player zero-sum games, it has been shown that using optimistic follow-the-regularized-leader (OFTRL) as the online learning algorithm allows the individual regret of each player to be bounded by a constant independent of the time horizon. However, in realistic game scenarios, players are not always able to observe the outcomes of their interactions immediately. Motivated by this, the problem of learning from delayed feedback in games has recently been proposed, and it has been shown that, by using a variant of OFTRL, one can achieve a social regret upper bound of $\tilde{O}(D^2 + 1)$ for a fixed delay $D$. This study investigates the optimal dependence on the delay parameter $D$ in the setting of learning from delayed feedback in games. In particular, we show that a simple algorithm---running $D+1$ independent copies of the standard OFTRL designed for the non-delayed setting---achieves social and individual regret upper bounds of $\tilde{O}(D + 1)$, thereby improving the existing bound by a factor of $D$. Moreover, we provide a matching lower bound: for any learning dynamics, there exists a utility function such that the regret of every player is at least $\Omega(D + 1)$.
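To make the round-robin construction concrete, the following is a minimal Python sketch, assuming OFTRL with the entropy regularizer (optimistic Hedge) as the base learner: $D+1$ independent copies are maintained, copy $t \bmod (D+1)$ acts in round $t$, and the feedback from round $t$, which arrives $D$ rounds later, is routed back to the copy that produced it, so each copy faces an ordinary non-delayed problem. The class names, learning rate, and matching-pennies simulation below are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

class OptimisticHedge:
    """OFTRL with the entropy regularizer (optimistic Hedge) on the simplex.
    Hypothetical base learner; the paper's exact regularizer and step size may differ."""

    def __init__(self, n_actions: int, eta: float = 0.1):
        self.eta = eta
        self.cum_grad = np.zeros(n_actions)   # sum of utility vectors observed so far
        self.last_grad = np.zeros(n_actions)  # optimistic prediction m_t = g_{t-1}

    def play(self) -> np.ndarray:
        # x_t proportional to exp(eta * (cumulative gradient + prediction))
        z = self.eta * (self.cum_grad + self.last_grad)
        w = np.exp(z - z.max())               # subtract max for numerical stability
        return w / w.sum()

    def observe(self, grad: np.ndarray) -> None:
        self.cum_grad += grad
        self.last_grad = grad


class DelayedOFTRL:
    """Round-robin over D+1 independent copies: copy t mod (D+1) acts in round t,
    so its own feedback (delayed by D rounds) always arrives before its next turn."""

    def __init__(self, n_actions: int, delay: int, eta: float = 0.1):
        self.delay = delay
        self.copies = [OptimisticHedge(n_actions, eta) for _ in range(delay + 1)]

    def play(self, t: int) -> np.ndarray:
        return self.copies[t % (self.delay + 1)].play()

    def observe(self, t: int, grad: np.ndarray) -> None:
        # route the (delayed) feedback of round t back to the copy that played it
        self.copies[t % (self.delay + 1)].observe(grad)


# Usage sketch: self-play in matching pennies; feedback from round t is
# delivered only at the end of round t + D.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])      # row player's payoff matrix
D, T = 3, 200
row, col = DelayedOFTRL(2, D), DelayedOFTRL(2, D)
pending = []                                   # feedback queue awaiting delivery
for t in range(T):
    x, y = row.play(t), col.play(t)
    pending.append((t, A @ y, -A.T @ x))       # per-action utility vectors
    if t >= D:                                 # round (t - D)'s feedback arrives now
        s, gx, gy = pending.pop(0)
        row.observe(s, gx)
        col.observe(s, gy)
```

The round-robin reduction itself is a classical device for delayed online learning (dating back to Weinberger and Ordentlich); the abstract's claim is that instantiating it with standard non-delayed OFTRL already attains the optimal $\tilde{O}(D+1)$ social and individual regret.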