Process-Tensor Tomography of SGD: Measuring Non-Markovian Memory via Back-Flow of Distinguishability
Vasileios Sevetlidis · George Pavlidis
Abstract
We model neural training as a classical multi-time map from controllable interventions---batch choices, augmentations, and optimizer micro-steps---to model predictions on a fixed probe set. On this basis, we introduce a simple, model-agnostic witness of training memory based on back-flow of distinguishability. In a controlled two-step protocol, we compare predictive distributions after one intervention versus two; a positive gap $\Delta_{\mathrm{BF}} = D_2 - D_1 > 0$, with $D\in\{\mathrm{TV}, \mathrm{JS}, \mathrm{H}\}$, certifies observable-level non-Markovianity. Across controlled SGD experiments, we observe consistent positive back-flow with tight bootstrap confidence intervals; the effect strengthens under higher momentum, larger batch overlap, and more micro-steps, and collapses markedly under a \emph{causal break} that resets optimizer state. The witness is inexpensive, requires no architectural changes, and is robust across TV/JS/Hellinger. We position this as a measurement contribution: a practical diagnostic, and empirical evidence, that real training often deviates from the Markov idealization in ways that matter for optimizer behavior, data order, and schedule design.
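As a minimal sketch of the witness computation, the following Python code implements the three distances named in the abstract (total variation, Jensen--Shannon, Hellinger) on discrete predictive distributions and the back-flow quantity $\Delta_{\mathrm{BF}} = D_2 - D_1$. The specific branch-comparison setup (comparing two intervention branches after one step and after two steps) is our reading of the two-step protocol, not a verbatim reproduction of the paper's experimental code; all function names here are illustrative.

```python
import math

def tv(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def hellinger(p, q):
    """Hellinger distance between two discrete distributions."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2
                               for a, b in zip(p, q)))

def js(p, q):
    """Jensen--Shannon divergence (base-2) between two discrete distributions."""
    def kl(a, b):
        # Kullback--Leibler divergence; terms with a_i = 0 contribute nothing.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def backflow(p1, q1, p2, q2, dist=tv):
    """Back-flow witness Delta_BF = D_2 - D_1.

    (p1, q1): predictive distributions of two intervention branches
              after one intervention step (distance D_1).
    (p2, q2): the same branches after two steps (distance D_2).
    A strictly positive return value witnesses observable-level
    non-Markovianity: distinguishability flowed back between steps.
    """
    return dist(p2, q2) - dist(p1, q1)
```

In practice the probe-set predictions would be histogrammed or softmax outputs averaged over bootstrap resamples; here plain lists of probabilities stand in for those.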