Demystifying Transition Matching: When and Why It Can Beat Flow Matching
Abstract
Flow Matching (FM) underpins many state-of-the-art generative models, yet recent results indicate that Transition Matching (TM) can achieve higher quality with fewer sampling steps. This work answers the question of when and why TM outperforms FM. First, we show that, for a unimodal Gaussian target distribution, TM provably attains strictly lower KL divergence than FM at any finite number of steps and converges faster under fixed compute. Second, we extend the analysis to Gaussian mixtures and identify locally unimodal regimes in which the dynamics approximate the unimodal case, where TM outperforms FM. The approximation error shrinks as the component means separate, favoring TM when modes are well separated. However, as the target variance approaches zero, TM updates converge to FM updates, and the performance advantage of TM vanishes. In summary, we show that TM outperforms FM when the target distribution has well-separated modes and non-negligible variances. We validate our analysis with experiments on Gaussian distributions and extend the comparison to image and video generation.
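As a rough illustration of the FM side of the finite-step claim, the following minimal sketch tracks the KL divergence of an Euler-discretized FM sampler on a one-dimensional Gaussian target. It is not the paper's analysis or code. Everything specific in it is an assumption made for illustration: a standard normal source, the linear path x_t = (1 - t) x_0 + t x_1 with independent coupling, Euler steps on a uniform grid, the direction KL(sampler || target), and the hypothetical helper names `kl_gauss` and `fm_euler_kl`. Under these assumptions the exact marginal velocity is affine in x, so the sampler output stays Gaussian and its mean and variance can be propagated in closed form. TM is not implemented here.

```python
import math


def kl_gauss(m_p, v_p, m_q, v_q):
    """KL( N(m_p, v_p) || N(m_q, v_q) ) for univariate Gaussians (v_* are variances)."""
    return 0.5 * (math.log(v_q / v_p) + (v_p + (m_p - m_q) ** 2) / v_q - 1.0)


def fm_euler_kl(mu, sigma, n_steps):
    """KL between an Euler FM sampler's output and the target N(mu, sigma^2).

    Assumed setup (illustrative, not taken from the paper):
      source x_0 ~ N(0, 1), target x_1 ~ N(mu, sigma^2), independent coupling,
      path x_t = (1 - t) x_0 + t x_1, exact marginal velocity
      v(x, t) = mu + a(t) * (x - t * mu),
      a(t) = (t * sigma^2 - (1 - t)) / ((1 - t)^2 + t^2 * sigma^2).
    """
    if n_steps < 2:
        # A single Euler step from t = 0 maps every sample to mu (zero variance),
        # which makes the KL divergence infinite.
        raise ValueError("n_steps must be at least 2")
    if sigma <= 0:
        raise ValueError("sigma must be positive")

    mean, var = 0.0, 1.0  # moments of the source N(0, 1)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt
        s2 = (1.0 - t) ** 2 + (t * sigma) ** 2       # Var(x_t) on the exact path
        a = (t * sigma ** 2 - (1.0 - t)) / s2        # slope of the velocity in x
        # Euler update x <- x + dt * v(x, t) is affine in x:
        #   x_new = (1 + dt * a) * x + dt * mu * (1 - a * t)
        mean = (1.0 + dt * a) * mean + dt * mu * (1.0 - a * t)
        var = (1.0 + dt * a) ** 2 * var
    return kl_gauss(mean, var, mu, sigma ** 2)


if __name__ == "__main__":
    for n in (2, 4, 16, 64):
        print(n, fm_euler_kl(mu=2.0, sigma=1.0, n_steps=n))
```

In this toy setting the Euler sampler reproduces the target mean but not the target variance, so the KL divergence is positive at every finite step count and shrinks as the number of steps grows. That is the finite-step gap against which the abstract compares TM.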