Efficient Bilevel Optimization with KFAC-Based Hypergradients
Abstract
Bilevel optimization (BO) applies to a wide range of machine learning problems. However, to scale BO, practitioners often adopt crude approximations such as one-step gradient unrolling or identity/short-Neumann surrogates, which discard curvature information. We build on algorithms based on the implicit function theorem (IFT) and propose to incorporate Kronecker-factored approximate curvature (KFAC), yielding curvature-aware hypergradients with a better performance–efficiency trade-off than CG/Neumann methods, while consistently outperforming unrolling. We evaluate our method across diverse tasks, including meta-learning and AI-safety-related problems. On models up to the scale of BERT, we show that curvature information remains valuable at scale, and that KFAC can provide it with only modest memory and runtime overhead.
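To illustrate the mechanism (a minimal sketch under our own assumptions, not the paper's exact implementation): KFAC approximates a layer's Hessian/Fisher block as a Kronecker product A ⊗ G of two small factors, so the inverse-curvature-vector product required by IFT hypergradients reduces to two small linear solves via the identity (A ⊗ G)⁻¹ vec(V) = vec(G⁻¹ V A⁻¹), instead of a solve against the full parameter-sized matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

def spd(n):
    """Random symmetric positive-definite matrix (stand-in for a KFAC factor)."""
    M = rng.standard_normal((n, n))
    return M @ M.T + n * np.eye(n)

d_in, d_out = 4, 3
A = spd(d_in)    # input-covariance Kronecker factor (illustrative)
G = spd(d_out)   # output-gradient-covariance Kronecker factor (illustrative)
v = rng.standard_normal(d_in * d_out)  # vector from the IFT hypergradient solve

# Dense reference: materialize the full curvature block and solve directly.
H = np.kron(A, G)
x_dense = np.linalg.solve(H, v)

# KFAC-style solve: (A ⊗ G)^{-1} vec(V) = vec(G^{-1} V A^{-1}).
# np.kron pairs with column-major vec, hence order="F" reshapes.
V = v.reshape(d_out, d_in, order="F")
X = np.linalg.solve(G, V)            # G^{-1} V
X = np.linalg.solve(A, X.T).T        # (G^{-1} V) A^{-1}, using A = A^T
x_kfac = X.flatten(order="F")
```

The two factored solves cost O(d_in³ + d_out³) rather than O((d_in · d_out)³) for the dense block, which is the efficiency argument behind using KFAC in place of CG/Neumann iterations.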