

Poster

Learning Gaussian Multi-Index Models with Gradient Flow: Time Complexity and Directional Convergence

Danqi Liao


Abstract: This work studies the gradient flow dynamics of a neural network model trained with the correlation loss to approximate a multi-index function on high-dimensional standard Gaussian data. Specifically, the multi-index function we consider is a sum of neurons $f(x) = \sum_{j=1}^{k} \sigma(v_j^T x)$, where $v_1, \dots, v_k$ are unit vectors and $\sigma$ lacks the first and second Hermite polynomials in its Hermite expansion. It is known that, in the single-index case ($k=1$), overcoming the search phase requires polynomial time complexity. We first generalize this result to multi-index functions characterized by vectors in arbitrary directions. After the search phase, it is not clear whether the network neurons converge to the index vectors or get stuck at a sub-optimal solution. When the index vectors are orthogonal, we give a complete characterization of the fixed points and prove that neurons converge to the nearest index vectors. Therefore, using $n \gtrsim k \log k$ neurons ensures finding the full set of index vectors with gradient flow, with high probability over random initialization. When $v_i^T v_j = \beta \geq 0$ for all $i \neq j$, we prove the existence of a sharp threshold $\beta_c = c/(c+k)$ at which the fixed point that computes the average of the index vectors transitions from a saddle point to a minimum. Numerical simulations show that a correlation loss with mild overparameterization suffices to learn all of the index vectors when they are nearly orthogonal; however, the correlation loss fails when the dot product between the index vectors exceeds a certain threshold.
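As a rough illustration (not the authors' implementation), the sketch below simulates the setup described in the abstract: spherical gradient descent on the correlation loss over fresh Gaussian batches, with $\sigma$ taken to be the third Hermite polynomial $\mathrm{He}_3(z) = z^3 - 3z$ (so its Hermite expansion has no first- or second-order component) and orthogonal index vectors. All hyperparameters (`d`, `k`, `n`, learning rate, step count) are illustrative assumptions; per the time-complexity result above, the search phase can require many more steps as `d` grows.

```python
import numpy as np

rng = np.random.default_rng(0)

d, k, n = 50, 4, 32           # ambient dimension, index vectors, student neurons
lr, steps, batch = 0.05, 5000, 4096

sigma = lambda z: z**3 - 3*z   # He_3: no first/second Hermite components
dsigma = lambda z: 3*z**2 - 3

# Orthogonal unit index vectors v_1, ..., v_k (rows of V)
V = np.linalg.qr(rng.standard_normal((d, k)))[0].T

# Student neurons w_1, ..., w_n, initialized uniformly on the unit sphere
W = rng.standard_normal((n, d))
W /= np.linalg.norm(W, axis=1, keepdims=True)

for t in range(steps):
    X = rng.standard_normal((batch, d))        # standard Gaussian data
    y = sigma(X @ V.T).sum(axis=1)             # teacher f(x) = sum_j sigma(v_j^T x)
    pre = X @ W.T                              # student pre-activations w_i^T x
    # Correlation loss L = -E[ f(x) * sum_i sigma(w_i^T x) ], estimated on the batch
    grad = -(dsigma(pre) * y[:, None]).T @ X / batch
    W -= lr * grad
    # Project back to the sphere, approximating spherical gradient flow
    W /= np.linalg.norm(W, axis=1, keepdims=True)

# Check convergence of each neuron to its nearest index vector
overlap = (W @ V.T).max(axis=1)
found = (W @ V.T).argmax(axis=1)[overlap > 0.99]
print("converged neurons:", np.sum(overlap > 0.99), "of", n)
print("index vectors recovered:", len(set(found)), "of", k)
```

With orthogonal index vectors, the directional-convergence result above predicts that each neuron aligns with its nearest $v_j$, so a mild overparameterization ($n \gtrsim k \log k$) should recover all $k$ directions with high probability; replacing `V` with equicorrelated vectors ($v_i^T v_j = \beta$) lets one probe the failure regime past the threshold $\beta_c$.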
