Poster 163

RL-finetuning LLMs from on- and off-policy data with a single algorithm

Yunhao Tang · Taco Cohen · David Zhang · Gabriel Synnaeve · Rémi Munos

Abstract
