Poster
Understanding Inverse Reinforcement Learning under Overparameterization: Non-Asymptotic Analysis and Global Optimality
Ruijia Zhang · Diantong Li · Qizhang Feng · Mingyi Hong
The goal of inverse reinforcement learning (IRL) is to identify the underlying reward function and the corresponding optimal policy from a set of expert demonstrations. Most IRL algorithms with theoretical guarantees assume the reward has a linear structure. In this work, we extend the understanding of the IRL problem to rewards parameterized by neural networks. Moreover, conventional IRL algorithms usually adopt a nested structure and therefore exhibit computational inefficiency, especially when the MDP is high-dimensional. We address this problem by proposing the first neural single-loop maximum-likelihood algorithm. Due to the nonlinearity of the neural network approximation, the global convergence results established for linear reward scenarios no longer apply. We provide a non-asymptotic convergence analysis of the proposed neural algorithm by exploiting the overparameterization of certain neural networks. It remains to ask whether the proposed neural algorithm can identify the globally optimal reward and the corresponding optimal policy. Under certain overparameterized neural network structures, we provide affirmative answers to both questions. To our knowledge, this is the first IRL algorithm with a non-asymptotic convergence guarantee that provably identifies the global optimum in neural network settings.
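The abstract does not spell out the update rules, but the phrase "single-loop maximum-likelihood algorithm" suggests alternating one policy step and one reward step per iteration instead of solving the inner RL problem to completion. Below is a minimal, hypothetical sketch of that pattern on a small discrete MDP with a neural reward; the soft-Q policy step, the likelihood-gradient approximation, and all names and hyperparameters are illustrative assumptions, not the authors' method.

```python
# Hypothetical single-loop maximum-likelihood IRL sketch (not the paper's code).
# Each iteration: (i) one soft policy-improvement step under the current neural
# reward, (ii) one gradient-ascent step on the expert log-likelihood surrogate.
import torch
import torch.nn as nn

n_states, n_actions, gamma = 5, 3, 0.9
P = torch.softmax(torch.randn(n_states, n_actions, n_states), dim=-1)  # random transition kernel, for illustration
phi = torch.eye(n_states)                                              # one-hot state features

reward_net = nn.Sequential(nn.Linear(n_states, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.SGD(reward_net.parameters(), lr=1e-2)

Q = torch.zeros(n_states, n_actions)               # soft Q-estimate carried across iterations
expert_states = torch.randint(0, n_states, (256,))  # hypothetical expert state visitations

for t in range(500):
    r = reward_net(phi).squeeze(-1)                # current neural reward r(s)

    # (i) single soft Bellman backup under the current reward (policy step)
    V = torch.logsumexp(Q, dim=-1)
    Q = (r.unsqueeze(-1) + gamma * P @ V).detach()
    pi = torch.softmax(Q, dim=-1)                  # soft-optimal policy estimate

    # (ii) single likelihood-gradient step on the reward:
    # grad ≈ E_expert[grad r(s)] − E_pi[grad r(s)], with E_pi taken under the
    # stationary distribution of the current policy (computed here by power iteration).
    P_pi = torch.einsum('sa,san->sn', pi, P)
    d = torch.full((n_states,), 1.0 / n_states)
    for _ in range(50):
        d = d @ P_pi
    loss = -(r[expert_states].mean() - (d * r).sum())
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The single-loop character is that the soft-Q estimate is updated by only one backup per reward update, rather than being re-solved to optimality inside a nested loop; the paper's analysis concerns when such coupled updates still converge globally under overparameterization.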