

Poster 158

RL-finetuning LLMs from on- and off-policy data with a single algorithm

Yunhao Tang ⋅ Taco Cohen ⋅ David Zhang ⋅ Gabriel Synnaeve ⋅ Rémi Munos

Abstract
