

Poster

Superiority of Multi-Head Attention: A Theoretical Study in Shallow Transformers in In-Context Linear Regression

Yingqian Cui ⋅ Jie Ren ⋅ Pengfei He ⋅ Hui Liu ⋅ Jiliang Tang ⋅ Yue Xing

Abstract
