

Poster 149

DISPO: Enhancing Training Efficiency and Stability in Reinforcement Learning for Large Language Model Mathematical Reasoning

Batuhan Karaman ⋅ Aditya Rawal ⋅ Mohammad Ghavamzadeh ⋅ Suhaila Shakiah ⋅ Arijit Biswas ⋅ Ruida Zhou

Abstract
