AISTATS Poster On the Model-Misspecification in Reinforcement Learning

Yunfan Li · Lin Yang

MR1 & MR2 - Number 14

Abstract: The success of reinforcement learning (RL) crucially depends on effective function approximation when dealing with complex ground-truth models. Existing sample-efficient RL algorithms primarily employ three approaches to function approximation: policy-based, value-based, and model-based methods. However, in the face of model misspecification (a disparity between the ground-truth and optimal function approximators), it is shown that policy-based approaches can be robust even when the policy function approximation is under a large locally-bounded misspecification error, with which the function class may exhibit an $\Omega(1)$ approximation error in specific states and actions, but remains small on average within a policy-induced state distribution. Yet it remains an open question whether similar robustness can be achieved with value-based and model-based approaches, especially with general function approximation. To bridge this gap, in this paper we present a unified theoretical framework for addressing model misspecification in RL. We demonstrate that, through meticulous algorithm design and sophisticated analysis, value-based and model-based methods employing general function approximation can achieve robustness under local misspecification error bounds. In particular, they can attain a regret bound of $\widetilde{O}\left(\mathrm{poly}(dH)\cdot(\sqrt{K} + K\cdot\zeta)\right)$, where $d$ represents the complexity of the function class, $H$ is the episode length, $K$ is the total number of episodes, and $\zeta$ denotes the local bound for misspecification error. Furthermore, we propose an algorithmic framework that can achieve the same order of regret bound without prior knowledge of $\zeta$, thereby enhancing its practical applicability.
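
As a rough reading of the regret bound above (a back-of-the-envelope sketch, not a statement from the paper: it assumes $\zeta > 0$ and treats the $\mathrm{poly}(dH)$ factor and the logarithmic terms hidden by $\widetilde{O}$ as fixed), dividing by $K$ gives the average per-episode regret:

$$\frac{1}{K}\,\widetilde{O}\left(\mathrm{poly}(dH)\cdot(\sqrt{K} + K\cdot\zeta)\right) = \widetilde{O}\left(\mathrm{poly}(dH)\cdot\left(\frac{1}{\sqrt{K}} + \zeta\right)\right).$$

The two terms balance when $\sqrt{K} = K\cdot\zeta$, that is, at $K = 1/\zeta^2$. For fewer episodes the $\sqrt{K}$ term dominates. Beyond that point the misspecification term $K\cdot\zeta$ dominates, so under this reading the average regret levels off at order $\mathrm{poly}(dH)\cdot\zeta$ instead of vanishing.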
