Three-operator splitting with stale gradients for faster non-linear optimal transport
Abstract
Scalable optimization for non-linear optimal transport (OT) poses unique challenges: it requires efficient memory management for large matrices, parallelization strategies suited to modern accelerators such as GPUs, and theoretical guarantees that support practical implementation patterns. To address these challenges, we introduce a new algorithm based on three-operator splitting that reduces the cost of gradient computation by allowing gradient evaluations to run asynchronously, in parallel with other computations. Using monotone operator theory, we establish new convergence guarantees for this asynchronous variant and extend existing results to important non-convex problem classes, with Gromov–Wasserstein as a notable example. We validate our method through a series of experiments demonstrating improved accuracy and faster convergence on a broad range of problems.
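For concreteness, the following is a minimal sketch of a Davis–Yin-style three-operator splitting iteration for minimizing f(x) + g(x) + h(x) with h smooth, in which the gradient of h is read with a fixed staleness. The function names (davis_yin_stale, prox_f, prox_g, grad_h), the fixed-delay model of asynchrony, and the demo problem are illustrative assumptions, not the paper's exact scheme.

```python
import numpy as np
from collections import deque

def davis_yin_stale(prox_f, prox_g, grad_h, z0, step, n_iters=500, delay=2):
    """Davis-Yin three-operator splitting for min_x f(x) + g(x) + h(x),
    h smooth, where grad_h is evaluated at an iterate `delay` steps old
    (delay=0 recovers the synchronous method).  In a real solver the stale
    gradient would arrive from an asynchronous worker (e.g. on a GPU);
    here the delay is simulated with a buffer of past iterates."""
    z = z0.copy()
    past_x = deque()                        # iterates whose gradients are "in flight"
    for _ in range(n_iters):
        x = prox_g(z)                       # first prox step
        past_x.append(x)
        x_old = past_x[0]                   # oldest iterate: its gradient is "ready"
        if len(past_x) > delay:
            past_x.popleft()
        g = grad_h(x_old)                   # stale gradient of the smooth term
        y = prox_f(2.0 * x - z - step * g)  # second prox step, shifted by the gradient
        z = z + y - x                       # fixed-point update on the driver variable
    return prox_g(z)

# Demo (illustrative): nonnegative, l1-regularized least squares as an
# f + g + h split with f = lam*||x||_1, g = indicator of the nonnegative
# orthant, h = 0.5*||Ax - b||^2.
rng = np.random.default_rng(0)
A, b = rng.normal(size=(40, 60)), rng.normal(size=40)
lam = 0.1
step = 1.0 / np.linalg.norm(A, 2) ** 2      # step < 2/L for L-smooth h; staleness
                                            # may call for a smaller step in practice
prox_f = lambda v: np.sign(v) * np.maximum(np.abs(v) - step * lam, 0.0)
prox_g = lambda v: np.maximum(v, 0.0)
grad_h = lambda x: A.T @ (A @ x - b)
x_hat = davis_yin_stale(prox_f, prox_g, grad_h, np.zeros(60), step, delay=2)
print("objective:", lam * np.abs(x_hat).sum() + 0.5 * np.sum((A @ x_hat - b) ** 2))
```

The point of the buffer is that the expensive gradient can be launched as soon as an iterate is produced and consumed a few iterations later, so the two prox steps never wait on it; delay=0 reduces to the standard synchronous Davis–Yin update.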