The Information Geometry of Local Generalization Dynamics
Abstract
Information-theoretic bounds on generalization are foundational to learning theory, yet their static form offers limited insight into the dynamic, iterative nature of modern optimization. We address this gap by developing a local theory of generalization based on Euclidean Information Theory, in which each optimization update is modeled as a small perturbation vector. Our analysis shows that the change in the generalization gap is bounded by the expected squared norm of this vector, a quantity we interpret as the local generalization cost. We prove that this local bound is the first-order approximation of classic global bounds, exposing their underlying differential structure. Within this framework, the optimal update direction is the negative gradient, which yields an information-geometric justification for gradient descent. Experiments validate the theory by showing that the derived bound closely tracks training dynamics.
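To make the abstract's key quantities concrete, the display below sketches the standard local expansion of KL divergence that underlies Euclidean Information Theory, together with the schematic shape of the claimed local bound. The symbols are illustrative assumptions rather than the paper's exact statement: $w$ denotes parameters, $\delta$ the update perturbation, $F(w)$ the Fisher information matrix, $\mathrm{gen}(w)$ the generalization gap, and $C$ a problem-dependent constant.

\[
D\!\left(p_{w+\delta}\,\middle\|\,p_{w}\right)
= \tfrac{1}{2}\,\delta^{\top} F(w)\,\delta + o\!\left(\lVert\delta\rVert^{2}\right),
\qquad
\bigl|\,\mathrm{gen}(w+\delta) - \mathrm{gen}(w)\,\bigr|
\;\lesssim\; C\,\mathbb{E}\!\left[\lVert\delta\rVert^{2}\right].
\]

Under this quadratic-cost picture, the gradient-descent claim can be read off directly: maximizing the first-order loss decrease $-\nabla L(w)^{\top}\delta$ subject to a fixed cost budget $\lVert\delta\rVert^{2} \le \varepsilon$ gives, by the Cauchy-Schwarz inequality, $\delta \propto -\nabla L(w)$.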