Rank Lifting and Random Non-Linear Maps
Andrea Drago · Maria Sofia Bucarelli · Francesco Caso · Marius Michetti · Federico Siciliano · Fabrizio Silvestri · Luca Becchetti
Abstract
Deep neural networks exhibit improved training and generalization performance as the number of parameters grows well beyond the size of the training set, contradicting classical intuitions about overfitting. To gain a better understanding of this “benign overparameterization”, we analyze the representational capacity of a random one-hidden-layer perceptron with Gaussian weights, no bias, and threshold activations. More precisely, we investigate the following question: when does a hidden layer of dimension $n$ map $k$ input vectors with pairwise angles at least $\theta$ to a full-rank activation matrix, thus ensuring that a simple linear classifier can perfectly fit those inputs in feature space? This problem has an immediate bearing on memorization capacity at initialization. We frame it as a question about hyperplane arrangements on the unit sphere and prove new isoperimetric-like inequalities, which allow us to derive non-trivial lower bounds on the probability that a random embedding avoids the arrangement’s zero-measure regions. Our results show that once the hidden dimension exceeds a threshold (depending on $\theta$ and the input dimension), hidden representations are linearly independent with high probability. While the case we consider is challenging due to the sparsity of the solution space, this setting highlights crucial underlying geometric problems and connections to related questions in spherical geometry and linear algebra.
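As an informal illustration of the setting (not part of the paper's results), the following sketch samples $k$ unit-norm inputs in $\mathbb{R}^d$ with pairwise angles at least $\theta$, applies a random hidden layer with Gaussian weights, no bias, and threshold activations, and checks whether the resulting $k \times n$ activation matrix has full rank. The rejection-sampling construction of the inputs and all parameter values are illustrative assumptions, not the paper's procedure.

```python
import numpy as np

def full_rank_frequency(d=10, n=200, k=20, theta=0.5, trials=200, seed=0):
    """Estimate how often a random threshold layer maps k unit vectors in R^d
    (with pairwise angles >= theta, in radians) to a full-rank k x n
    activation matrix. Illustrative sketch only."""
    rng = np.random.default_rng(seed)
    successes = 0
    for _ in range(trials):
        # Sample k unit vectors with pairwise angles >= theta (rejection sampling;
        # an assumed construction, not the paper's input model).
        X = []
        while len(X) < k:
            x = rng.standard_normal(d)
            x /= np.linalg.norm(x)
            if all(np.arccos(np.clip(x @ y, -1.0, 1.0)) >= theta for y in X):
                X.append(x)
        X = np.stack(X)                     # shape (k, d)

        # Random hidden layer: Gaussian weights, no bias, threshold activation.
        W = rng.standard_normal((d, n))     # shape (d, n)
        A = (X @ W > 0).astype(float)       # activation matrix, shape (k, n)

        successes += int(np.linalg.matrix_rank(A) == k)
    return successes / trials

if __name__ == "__main__":
    print(full_rank_frequency())
```

Running the sketch with a hidden dimension well above $k$ typically returns a frequency close to 1, consistent with the high-probability linear-independence behavior described in the abstract once $n$ exceeds the relevant threshold.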