

Global Convergence of Over-parameterized Deep Equilibrium Models

Zenan Ling · Xingyu Xie · Qiuhao Wang · Zongpeng Zhang · Zhouchen Lin

Auditorium 1 Foyer 25


A deep equilibrium model (DEQ) is defined implicitly through an equilibrium point of an infinite-depth weight-tied model with input injection. Instead of performing infinitely many computations, a DEQ solves for the equilibrium point directly via root-finding and computes gradients by implicit differentiation. In this paper, we investigate the training dynamics of over-parameterized DEQs and propose a novel probabilistic framework to overcome the challenges arising from weight-sharing and infinite depth. Under a condition on the initial equilibrium point, we prove that gradient descent converges to a globally optimal solution at a linear rate for the quadratic loss. We further perform a fine-grained non-asymptotic analysis of random DEQs and their corresponding weight-untied models, and show that the required initial condition is satisfied under mild over-parameterization. Moreover, we show that a unique equilibrium point always exists throughout training.
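To make the forward and backward passes concrete, below is a minimal NumPy sketch of a single DEQ layer f(z, x) = tanh(Wz + Ux + b). The layer form, dimensions, naive fixed-point solver, and all variable names are illustrative assumptions, not the paper's construction: the equilibrium z* = f(z*, x) is found by root-finding, and the gradient of a quadratic loss with respect to W is obtained by implicit differentiation, with no backpropagation through the solver iterations.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # hidden (= input) dimension, arbitrary for this sketch

# Weight-tied layer f(z, x) = tanh(W z + U x + b); W is scaled down so the
# map is contractive and the fixed-point iteration converges.
W = 0.3 * rng.standard_normal((d, d)) / np.sqrt(d)
U = rng.standard_normal((d, d)) / np.sqrt(d)
b = np.zeros(d)
x = rng.standard_normal(d)
y = rng.standard_normal(d)  # target for the quadratic loss

def solve_equilibrium(Wm, n_iter=200):
    """Root-finding by naive fixed-point iteration: z <- f(z, x)."""
    z = np.zeros(d)
    for _ in range(n_iter):
        z = np.tanh(Wm @ z + U @ x + b)
    return z

z_star = solve_equilibrium(W)

# Quadratic loss L = 0.5 ||z* - y||^2 and its gradient w.r.t. z*
dL_dz = z_star - y

# Implicit differentiation: since z* = tanh(W z* + U x + b), the Jacobian of
# f w.r.t. z at z* is J = diag(h) W with h = tanh'(.) = 1 - z*^2, and the
# implicit function theorem gives dL/dtheta = v^T df/dtheta with
# v = (I - J)^{-T} dL/dz*.
h = 1.0 - z_star ** 2
J = h[:, None] * W
v = np.linalg.solve((np.eye(d) - J).T, dL_dz)

# dL/dW = (h * v) z*^T: the gradient falls out of one linear solve,
# with no need to store or differentiate the solver iterations.
dL_dW = np.outer(h * v, z_star)

# Sanity check of one entry against a finite difference.
eps = 1e-6
W_pert = W.copy()
W_pert[0, 0] += eps
loss = lambda Wm: 0.5 * np.sum((solve_equilibrium(Wm) - y) ** 2)
print(dL_dW[0, 0], (loss(W_pert) - loss(W)) / eps)  # should agree to ~1e-5
```

Practical DEQ implementations typically replace the plain iteration above with a faster quasi-Newton root-finder; the implicit-differentiation step is unchanged.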
