Efficient Bilevel Optimization with KFAC-Based Hypergradients
Abstract
Bilevel optimization (BO) applies to a wide range of machine learning problems. However, to scale BO, practitioners often adopt crude approximations such as one-step gradient unrolling or identity/short-Neumann surrogates, which discard curvature information. We build on algorithms based on the implicit function theorem (IFT) and propose to incorporate Kronecker-factored approximate curvature (KFAC), yielding curvature-aware hypergradients with a better performance–efficiency trade-off than CG/Neumann methods, while consistently outperforming unrolling. We evaluate our method across diverse tasks, including meta-learning and AI-safety-related problems. On models up to the scale of BERT, we show that curvature information remains valuable at scale, and that KFAC can provide it with only modest memory and runtime overhead.
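To illustrate the mechanism (a minimal sketch under our own assumptions, not the paper's exact implementation): KFAC approximates a layer's Hessian/Fisher block as a Kronecker product A ⊗ G of two small factors, so the inverse-curvature-vector product required by IFT hypergradients reduces to two small linear solves via the identity (A ⊗ G)⁻¹ vec(V) = vec(G⁻¹ V A⁻¹), instead of a solve against the full parameter-sized matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

def spd(n):
    """Random symmetric positive-definite matrix (stand-in for a KFAC factor)."""
    M = rng.standard_normal((n, n))
    return M @ M.T + n * np.eye(n)

d_in, d_out = 4, 3
A = spd(d_in)    # input-covariance Kronecker factor (illustrative)
G = spd(d_out)   # output-gradient-covariance Kronecker factor (illustrative)
v = rng.standard_normal(d_in * d_out)  # vector from the IFT hypergradient solve

# Dense reference: materialize the full curvature block and solve directly.
H = np.kron(A, G)
x_dense = np.linalg.solve(H, v)

# KFAC-style solve: (A ⊗ G)^{-1} vec(V) = vec(G^{-1} V A^{-1}).
# np.kron pairs with column-major vec, hence order="F" reshapes.
V = v.reshape(d_out, d_in, order="F")
X = np.linalg.solve(G, V)            # G^{-1} V
X = np.linalg.solve(A, X.T).T        # (G^{-1} V) A^{-1}, using A = A^T
x_kfac = X.flatten(order="F")
```

The two factored solves cost O(d_in³ + d_out³) rather than O((d_in · d_out)³) for the dense block, which is the efficiency argument behind using KFAC in place of CG/Neumann iterations.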