Poster

Memorization in Attention-only Transformers

Danqi Liao · Muni Sreenivas Pydi


Abstract:

Recent research has explored the memorization capacity of multi-head attention, but these findings are constrained by unrealistic limitations on the context size. We present a novel proof for language-based Transformers that extends the current hypothesis to any context size. Our approach improves upon the state of the art by achieving more effective exact memorization with an attention layer, while also introducing the concept of approximate memorization of distributions. Through experimental validation, we demonstrate that our proposed bounds more accurately reflect the true memorization capacity of language models, and we provide a precise comparison with prior work.
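As a rough illustration of what "exact memorization with an attention layer" can mean empirically (this is not the authors' construction; the model, hyperparameters, and training loop below are illustrative assumptions), one can train a single attention-only layer to map random token contexts to arbitrary target tokens and count how many targets it reproduces exactly:

```python
# Minimal sketch: measure exact memorization of an attention-only model.
# Assumed setup: random (context -> target) pairs; all sizes are illustrative.
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab, ctx, n_seqs, d_model, n_heads = 64, 16, 256, 128, 4

class AttentionOnly(nn.Module):
    """One self-attention layer with embeddings and an output head; no feed-forward block."""
    def __init__(self):
        super().__init__()
        self.tok = nn.Embedding(vocab, d_model)
        self.pos = nn.Embedding(ctx, d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.out = nn.Linear(d_model, vocab)

    def forward(self, x):
        h = self.tok(x) + self.pos(torch.arange(x.size(1), device=x.device))
        h, _ = self.attn(h, h, h)   # self-attention over the context
        return self.out(h[:, -1])   # predict the target token from the last position

# Random (context -> target) pairs the model must store.
contexts = torch.randint(0, vocab, (n_seqs, ctx))
targets = torch.randint(0, vocab, (n_seqs,))

model = AttentionOnly()
opt = torch.optim.Adam(model.parameters(), lr=3e-3)
for _ in range(500):
    opt.zero_grad()
    loss = nn.functional.cross_entropy(model(contexts), targets)
    loss.backward()
    opt.step()

with torch.no_grad():
    exact = (model(contexts).argmax(-1) == targets).float().mean()
print(f"exactly memorized fraction: {exact:.3f}")  # 1.0 means every pair is stored exactly
```

The fraction of pairs recalled exactly, as a function of the number of pairs and the model width, is one empirical proxy for the kind of memorization-capacity bounds the abstract refers to; approximate memorization of distributions would instead compare predicted and target distributions rather than requiring exact argmax recall.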
