

Using Sliced Mutual Information to Study Memorization and Generalization in Deep Neural Networks

Shelvia Wongso · Rohan Ghosh · Mehul Motani

Auditorium 1 Foyer 27

Abstract: In this paper, we study the memorization and generalization behaviour of deep neural networks (DNNs) using sliced mutual information (SMI), which is the average of the mutual information (MI) between one-dimensional random projections. We argue that the SMI between features in a DNN ($T$) and ground truth labels ($Y$), $SI(T;Y)$, can be seen as a form of "usable information" that the features contain about the labels. We show theoretically that $SI(T;Y)$ can encode geometric properties of the feature distribution, such as its spherical soft-margin and intrinsic dimensionality, in a way that MI cannot. Additionally, we present empirical evidence showing how $SI(T;Y)$ can capture memorization and generalization in DNNs. In particular, we find that, in the presence of label noise, all layers start to memorize, but the earlier layers stabilize more quickly than the deeper layers. Finally, we point out that, in the context of Bayesian Neural Networks, the SMI between the penultimate layer and the output represents the worst-case uncertainty of the network's output.
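
To make the definition of SMI concrete, the following is a minimal sketch of a Monte Carlo estimate of $SI(T;Y)$ for discrete labels: draw random unit directions, project the features onto each one, estimate the MI between the scalar projection and the labels, and average. The function names `binned_mi` and `sliced_mi`, the quantile-histogram plug-in MI estimator, and all default parameters are illustrative assumptions, not the estimator used by the authors.

```python
import numpy as np


def binned_mi(z, y, n_bins=16):
    """Plug-in MI estimate (in nats) between a scalar variable z and
    discrete labels y. Assumption: a simple quantile-histogram estimator."""
    z = np.asarray(z).ravel()
    y = np.asarray(y).ravel()
    # Interior quantile edges give n_bins roughly equally populated bins.
    edges = np.quantile(z, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    z_bin = np.digitize(z, edges)  # integer bin index in 0 .. n_bins - 1
    classes, y_idx = np.unique(y, return_inverse=True)
    # Empirical joint distribution over (bin, class).
    joint = np.zeros((n_bins, len(classes)))
    np.add.at(joint, (z_bin, y_idx), 1.0)
    joint /= joint.sum()
    pz = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log(joint[mask] / (pz @ py)[mask])))


def sliced_mi(features, labels, n_projections=200, n_bins=16, seed=0):
    """Monte Carlo estimate of SI(T;Y): the average MI between
    one-dimensional random projections of the features and the labels."""
    rng = np.random.default_rng(seed)
    features = np.asarray(features, dtype=float)
    d = features.shape[1]
    # Normalized Gaussian vectors are uniformly distributed on the unit sphere.
    thetas = rng.standard_normal((n_projections, d))
    thetas /= np.linalg.norm(thetas, axis=1, keepdims=True)
    projections = features @ thetas.T  # shape: (n_samples, n_projections)
    return float(np.mean([binned_mi(projections[:, k], labels, n_bins)
                          for k in range(n_projections)]))


if __name__ == "__main__":
    # Toy data (hypothetical): two Gaussian clusters in 5 dimensions.
    rng = np.random.default_rng(1)
    labels = rng.integers(0, 2, size=2000)
    features = rng.standard_normal((2000, 5)) + 3.0 * labels[:, None]
    print("SMI, true labels:    ", sliced_mi(features, labels))
    print("SMI, shuffled labels:", sliced_mi(features, rng.permutation(labels)))
```

On the toy data, the estimate for the true labels should be clearly larger than for the shuffled labels, where the features carry no information about $Y$. The plug-in estimator is biased upward for small samples, so the shuffled-label value is small but not exactly zero.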
