On Global Convergence Rates for Federated Softmax Policy Gradient under Heterogeneous Environments
Safwan Labbi · Paul Mangold · Daniil Tiapkin · Eric Moulines
Abstract
We provide global convergence rates for vanilla and entropy-regularized federated softmax stochastic policy gradient ($\texttt{FedPG}$) with local training. We show that $\texttt{FedPG}$ converges to a near-optimal policy in terms of the average agent value, with a gap controlled by the level of heterogeneity. Remarkably, we obtain the first convergence rates for entropy-regularized policy gradient *with explicit constants*, leveraging a projection-like operator. Our results build upon a new analysis of federated averaging for non-convex objectives, based on the observation that the Łojasiewicz-type inequalities from the single-agent setting (Mei et al., 2020) do not hold for the federated objective. This uncovers a fundamental difference between single-agent and federated reinforcement learning: while single-agent optimal policies can be deterministic, federated objectives may inherently require stochastic policies.
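For concreteness, one standard way to formalize the setting described above (the notation here is ours and is not taken from the abstract) is a shared tabular softmax parameterization optimized against the average of the agents' value functions:
$$\pi_\theta(a \mid s) \;=\; \frac{\exp(\theta_{s,a})}{\sum_{a'} \exp(\theta_{s,a'})}, \qquad \max_{\theta}\; \bar V(\theta) \;=\; \frac{1}{N} \sum_{n=1}^{N} V_n^{\pi_\theta},$$
where $V_n^{\pi_\theta}$ denotes the value of the policy $\pi_\theta$ in agent $n$'s own (possibly heterogeneous) environment; an entropy-regularized variant would add an entropy bonus to each $V_n^{\pi_\theta}$. Under such a formulation, the objective $\bar V$ averages values across distinct environments, which is why an optimal shared policy need not be deterministic even though each individual $V_n$ admits a deterministic maximizer.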