Rank Lifting and Random Non-Linear Maps
Andrea Drago · Maria Sofia Bucarelli · Francesco Caso · Marius Michetti · Federico Siciliano · Fabrizio Silvestri · Luca Becchetti
Abstract
Deep neural networks exhibit improved training and generalization performance as the number of parameters grows well beyond the size of the training set, contradicting classical intuitions about overfitting. To gain a better understanding of this “benign overparameterization”, we analyze the representational capacity of a random one-hidden-layer perceptron with Gaussian weights (without bias) and threshold activations. More precisely, we investigate the following question: when does a hidden layer of dimension $n$ map $k$ input vectors with pairwise angle at least $\theta$ to a full-rank activation matrix, thus ensuring that a simple linear classifier can perfectly fit those inputs in feature space? In particular, this would have an immediate impact on the memorization capacity at initialization. We frame this as a question about hyperplane arrangements on the unit sphere, and we prove new isoperimetric-like inequalities. These allow us to derive non-trivial lower bounds on the probability that a random embedding avoids the arrangement’s zero-measure regions. Our results show that once the hidden dimension exceeds a threshold (depending only weakly on $\theta$ and the input dimension), the hidden representations are linearly independent with high probability. While the case we consider is challenging due to the necessary sparsity of the solution space, this setting highlights crucial underlying geometric problems and connections to a number of related questions in spherical geometry and linear algebra.
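The setup in the abstract can be checked numerically. The sketch below (a minimal illustration, not the paper's method; the dimensions `d`, `k`, `n` and the use of NumPy are assumptions for demonstration only) samples $k$ unit-norm inputs, applies a random Gaussian map with threshold activations, and reports the rank of the resulting $k \times n$ activation matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: input dimension d, number of inputs k, hidden width n.
d, k, n = 20, 10, 200

# Sample k unit-norm inputs in R^d; for moderate d, random directions have
# pairwise angles bounded away from 0 with high probability.
X = rng.standard_normal((k, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)

# Random one-hidden-layer perceptron with Gaussian weights (no bias) and
# threshold activations: Phi[i, j] = 1{ <w_j, x_i> > 0 }.
W = rng.standard_normal((d, n))
Phi = (X @ W > 0).astype(float)

# The question studied in the paper: is this k x n activation matrix
# full rank, so a linear classifier can fit any labels on these inputs?
print("rank(Phi) =", np.linalg.matrix_rank(Phi), "out of", k)
```

Repeating this experiment while shrinking `n` gives an empirical sense of the threshold hidden dimension beyond which the hidden representations are linearly independent with high probability.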