

Understanding Inverse Scaling and Emergence in Multitask Representation Learning

Muhammed Ildiz · Zhe Zhao · Samet Oymak

MR1 & MR2 - Number 116
Sat 4 May 6 a.m. PDT — 8:30 a.m. PDT


Large language models exhibit strong multitasking capabilities; however, their learning dynamics as a function of task characteristics, sample size, and model complexity remain mysterious. For instance, it is known that as model size grows, large language models exhibit emergent abilities, where performance on certain tasks can jump abruptly from poor to respectable. Such phenomena motivate a deeper understanding of how individual tasks evolve during multitasking. To this end, we study a multitask representation learning setup where tasks can have distinct distributions, quantified by their covariance priors. Through random matrix theory, we precisely characterize, in terms of the task covariances, the optimal linear representation for few-shot learning that minimizes the average test risk. When tasks have equal sample sizes, we prove a reduction to an equivalent problem with a single effective covariance, from which the individual task risks of the original problem can be deduced. Importantly, we introduce "task competition" to explain how tasks with a dominant covariance eigenspectrum emerge faster than others. We show that task competition can potentially explain the inverse scaling of certain tasks, i.e., reduced test accuracy as the model grows. Overall, this work sheds light on the risk and emergence of individual tasks and uncovers new high-dimensional phenomena (including multiple-descent risk curves) that arise in multitask representation learning.
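The toy sketch below is not the authors' construction. It only illustrates the intuition behind task competition under a simplifying assumption: the shared linear representation is taken to be the top-r eigenvectors of an equally weighted average of the task covariances. This is a PCA-style proxy for an "effective covariance", not the optimal few-shot representation characterized in the paper. The function `captured_energy` and the two diagonal covariances are hypothetical, chosen for illustration.

```python
import numpy as np


def captured_energy(task_covs, rank):
    """Fraction of each task's covariance trace captured by a rank-`rank`
    projection onto the top eigenvectors of the averaged covariance.

    Toy proxy only (assumption): equal task weights and a PCA-style
    representation, NOT the paper's optimal few-shot representation.
    """
    # Hypothetical "effective covariance": plain average of task covariances.
    sigma_eff = sum(task_covs) / len(task_covs)
    # eigh returns eigenvalues in ascending order; pick the `rank` largest.
    eigvals, eigvecs = np.linalg.eigh(sigma_eff)
    top = eigvecs[:, np.argsort(eigvals)[::-1][:rank]]
    proj = top @ top.T  # orthogonal projector onto the shared subspace
    return [float(np.trace(proj @ s) / np.trace(s)) for s in task_covs]


# Hypothetical tasks: task 1 has a dominant spectrum, task 2 a weaker one.
sigma_1 = np.diag([5.0, 4.0, 0.0, 0.0])
sigma_2 = np.diag([0.0, 0.0, 2.0, 1.0])

for r in range(1, 5):
    energies = captured_energy([sigma_1, sigma_2], r)
    print(r, [round(e, 2) for e in energies])
```

With these hypothetical covariances, the dominant task is fully captured once the representation has rank 2, while the weaker task captures nothing until the rank reaches 3. This is a crude analogue of one task "emerging" only after the dominant task has claimed the leading directions of the shared representation.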
