
Poster

Superiority of Multi-Head Attention: A Theoretical Study in Shallow Transformers in In-Context Linear Regression

Yingqian Cui · Jie Ren · Pengfei He · Hui Liu · Jiliang Tang · Yue Xing

Abstract
