Poster
Learning Gaussian Multi-Index Models with Gradient Flow: Time Complexity and Directional Convergence
Danqi Liao
Abstract:
This work focuses on the gradient flow dynamics of a neural network model that uses correlation loss to approximate a multi-index function on high-dimensional standard Gaussian data. Specifically, the multi-index function we consider is a sum of neurons $f^*(x) = \sum_{j=1}^{k} \sigma(v_j \cdot x)$, where $v_1, \dots, v_k$ are unit vectors and the activation $\sigma$ lacks the first and second Hermite polynomials in its Hermite expansion. It is known that, for the single-index case ($k = 1$), overcoming the search phase requires polynomial time complexity. We first generalize this result to multi-index functions characterized by vectors in arbitrary directions. After the search phase, it is not clear whether the network neurons converge to the index vectors or get stuck at a sub-optimal solution. When the index vectors are orthogonal, we give a complete characterization of the fixed points and prove that the neurons converge to the nearest index vectors; therefore, using $n \gtrsim k \log k$ neurons ensures finding the full set of index vectors with gradient flow, with high probability over the random initialization. When $v_i \cdot v_j = \beta \ge 0$ for all $i \neq j$, we prove the existence of a sharp threshold in $\beta$ at which the fixed point that computes the average of the index vectors transitions from a saddle point to a minimum. Numerical simulations show that a correlation loss and a mild overparameterization suffice to learn all of the index vectors when they are nearly orthogonal; however, the correlation loss fails when the dot product between the index vectors exceeds a certain threshold.
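As a concrete illustration of the orthogonal setting described above (a sketch, not the paper's code), the following numpy snippet simulates the population gradient flow on the sphere, assuming the activation is the third Hermite polynomial $He_3(z) = z^3 - 3z$, which indeed has no first- or second-order Hermite component. For unit vectors and standard Gaussian inputs, $E[He_3(w \cdot x)\,He_3(v \cdot x)] = 3!\,(w \cdot v)^3$, so the population correlation loss has a closed form. All parameter values ($d$, $k$, $n$, step size) are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

d, k, n = 20, 4, 32          # ambient dimension, index vectors, neurons
T, dt = 20000, 1e-2          # Euler steps and step size for the flow

# Orthogonal index vectors v_1, ..., v_k (rows of V).
V = np.linalg.qr(rng.standard_normal((d, k)))[0].T

# Neurons initialized uniformly on the unit sphere (rows of W).
W = rng.standard_normal((n, d))
W /= np.linalg.norm(W, axis=1, keepdims=True)

# With sigma = He_3, the population correlation loss is
#   L(W) = -E[f(x) f*(x)] = -6 * sum_{i,j} (w_i . v_j)^3,
# so the Euclidean gradient w.r.t. w_i is -18 * sum_j (w_i . v_j)^2 v_j.
for t in range(T):
    M = W @ V.T                       # (n, k) overlaps w_i . v_j
    grad = -18.0 * (M ** 2) @ V       # Euclidean gradient at each neuron
    # Spherical flow: project out the radial component, step, renormalize.
    grad -= np.sum(grad * W, axis=1, keepdims=True) * W
    W -= dt * grad
    W /= np.linalg.norm(W, axis=1, keepdims=True)

# In the orthogonal case each neuron should align with its nearest
# index vector, so max overlaps approach 1.
overlaps = W @ V.T
print("max overlap per neuron:", np.round(overlaps.max(axis=1), 3))
print("index vectors recovered:",
      np.unique(overlaps.argmax(axis=1)).size, "of", k)
```

With $n = 32 \gg k \log k$ neurons, each index vector is typically the limit of several neurons, illustrating the coupon-collector reasoning behind the $n \gtrsim k \log k$ bound; replacing the orthogonal $V$ with an equiangular family of large enough pairwise dot product should instead drive neurons toward the average direction, per the threshold result.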