

Poster

Nystrom Method for Accurate and Scalable Implicit Differentiation

Ryuichiro Hataya · Makoto Yamada

Auditorium 1 Foyer 113

Abstract:

The essential difficulty of gradient-based bilevel optimization is estimating the inverse Hessian vector product of neural networks. This paper proposes to tackle this problem with the Nystrom method and the Woodbury matrix identity, exploiting the low-rankness of the Hessian. Compared to existing methods that use iterative approximation, such as conjugate gradient and the Neumann series approximation, the proposed method avoids numerical instability and can be computed efficiently using matrix operations. As a result, the proposed method works stably across various tasks and is two times faster than iterative approximations. Through experiments including large-scale hyperparameter optimization and meta-learning, we demonstrate that the Nystrom method consistently achieves performance comparable or even superior to other approaches.
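A minimal sketch of the idea described in the abstract, assuming access to a Hessian-vector-product oracle `hvp` (e.g. provided by an autodiff library). The column-sampling scheme, rank `k`, and damping `rho` below are illustrative assumptions, not the authors' exact algorithm: the point is that a rank-k Nystrom approximation of the Hessian lets the Woodbury identity reduce the inverse Hessian vector product to a small k-by-k linear solve.

```python
import numpy as np

def nystrom_ihvp(hvp, v, k=20, rho=1.0, seed=0):
    """Approximate (H + rho * I)^{-1} v via a rank-k Nystrom approximation
    of the Hessian H combined with the Woodbury matrix identity.

    hvp  : callable returning the Hessian-vector product H @ u for a vector u
    v    : vector to multiply by the approximate inverse
    k    : number of sampled columns (Nystrom rank); illustrative default
    rho  : damping added for invertibility; illustrative default
    """
    n = v.shape[0]
    rng = np.random.default_rng(seed)
    idx = rng.choice(n, size=k, replace=False)

    # C = H[:, idx]: each sampled column is obtained with one Hessian-vector product.
    cols = []
    for j in idx:
        e = np.zeros(n)
        e[j] = 1.0
        cols.append(hvp(e))
    C = np.stack(cols, axis=1)          # n x k
    W = C[idx, :]                       # k x k intersection block

    # Nystrom approximation: H ~ C W^+ C^T.  With damping, the Woodbury identity gives
    #   (rho I + C W^{-1} C^T)^{-1} v = (v - C (rho W + C^T C)^{-1} C^T v) / rho,
    # so only a k x k system has to be solved instead of an n x n one.
    inner = rho * W + C.T @ C
    return (v - C @ np.linalg.solve(inner, C.T @ v)) / rho
```

Because the approximation is applied in closed form, there is no iterative inner loop to tune or to diverge, which is where the stability and speed advantages over conjugate gradient and Neumann-series approaches come from.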
