Moonwalk: Inverse-Forward Differentiation
Abstract
Backpropagation is effective for gradient computation but requires large memory, limiting scalability. This work explores a novel inverse-forward automatic-differentiation mode as an alternative for computing gradients in the relatively broad class of neural networks whose differentials are surjective (called submersive networks), showing its potential to reduce the memory footprint without substantial drawbacks. We introduce a technique based on the vector-inverse-Jacobian product that accelerates the forward computation of gradients compared to naïve forward-mode methods while retaining their memory advantages and computing the true gradient exactly rather than an approximation. Our method, Moonwalk, significantly reduces the memory footprint of neural network training while achieving performance comparable to backpropagation, making it a compelling alternative for efficient model training.
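To make the vector-inverse-Jacobian product concrete, the sketch below is a toy NumPy illustration under our own assumptions (a two-layer tanh network with square, invertible weight matrices; none of these names come from the paper). Given the loss gradient with respect to the input, the gradient with respect to each intermediate activation is recovered by walking forward through inverse-Jacobian products, from which parameter gradients follow. For brevity the input gradient is obtained here by an explicit reverse pass, whereas the paper's method would obtain it without backpropagation; this sketch only checks that the forward inverse-Jacobian walk reproduces the true gradients.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
W0 = rng.standard_normal((n, n)) * 0.5
W1 = rng.standard_normal((n, n)) * 0.5
x0 = rng.standard_normal(n)

# Forward pass: x_{k+1} = tanh(W_k x_k); loss L = sum(x2).
x1 = np.tanh(W0 @ x0)
x2 = np.tanh(W1 @ x1)

# Layer Jacobians dx_{k+1}/dx_k = diag(1 - x_{k+1}^2) @ W_k
# (invertible here, mimicking a submersive network).
J0 = (1 - x1**2)[:, None] * W0
J1 = (1 - x2**2)[:, None] * W1

# Reference gradients via ordinary reverse mode (backprop).
g2 = np.ones(n)                       # dL/dx2
g1_ref = J1.T @ g2                    # dL/dx1
dW0_ref = np.outer((1 - x1**2) * g1_ref, x0)

# Inverse-forward walk: start from the input gradient dL/dx0
# (computed explicitly here, for the demo only) ...
g0 = J0.T @ g1_ref                    # dL/dx0
# ... then move FORWARD with a vector-inverse-Jacobian product:
g1 = np.linalg.solve(J0.T, g0)        # recovers dL/dx1 without storing backprop state
dW0 = np.outer((1 - x1**2) * g1, x0)  # parameter gradient of layer 0

assert np.allclose(g1, g1_ref)
assert np.allclose(dW0, dW0_ref)
```

The point of the construction is that activations behind the current layer need not be kept alive for a later backward pass: each layer's parameter gradient is finished as soon as the covector has been pushed forward through that layer's inverse Jacobian.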