

Poster 51

DISPO: Enhancing Training Efficiency and Stability in Reinforcement Learning for Large Language Model Mathematical Reasoning

Batuhan Karaman · Aditya Rawal · Mohammad Ghavamzadeh · Suhaila Shakiah · Arijit Biswas · Ruida Zhou

Abstract
