Poster

f-PO: Generalizing Preference Optimization with f-divergence Minimization

Danqi Liao · Youngsuk Park · Hao Liu


Abstract: Preference optimization has made significant progress recently, with numerous methods developed to align language models with human preferences. This paper introduces f-divergence Preference Optimization (f-PO), a novel framework that generalizes and extends existing approaches. f-PO minimizes f-divergences between the optimized policy and the optimal policy, encompassing a broad family of alignment methods using various divergences. Our approach unifies previous algorithms like DPO and EXO, while offering new variants through different choices of f-divergences. We provide theoretical analysis of f-PO's properties and conduct extensive experiments on state-of-the-art language models using benchmark datasets. Results demonstrate f-PO's effectiveness across various tasks, achieving superior performance compared to existing methods on popular benchmarks such as AlpacaEval 2, Arena-Hard, MT-Bench, and Open LLM Leaderboard v2. Additionally, we present ablation studies exploring the impact of different f-divergences, offering insights into the trade-offs between regularization and performance in offline preference optimization. Our work contributes both practical algorithms and theoretical understanding to the field of language model alignment. Code is available at https://github.com/MinkaiXu/fPO.
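To make the central idea concrete, below is a minimal, self-contained sketch of the f-divergence machinery the abstract refers to: computing D_f(P || Q) = Σ_x Q(x) f(P(x)/Q(x)) between two discrete distributions for a few standard convex generators f with f(1) = 0. This is illustrative only and is not the implementation from the linked repository; the helper name `f_divergence`, the generator dictionary, and the toy distributions are assumptions for the example.

```python
# Illustrative sketch (not the paper's code): an f-divergence
# D_f(P || Q) = sum_x Q(x) * f(P(x) / Q(x)) over discrete distributions,
# with several standard generator functions f (convex, f(1) = 0).
import torch

# Different choices of the generator f recover different named divergences.
F_GENERATORS = {
    "forward_kl": lambda t: t * torch.log(t),                # KL(P || Q)
    "reverse_kl": lambda t: -torch.log(t),                   # KL(Q || P)
    "total_variation": lambda t: 0.5 * torch.abs(t - 1.0),   # TV distance
    "jensen_shannon": lambda t: (                            # 2 * JS(P, Q)
        t * torch.log(t) - (t + 1.0) * torch.log((t + 1.0) / 2.0)
    ),
}

def f_divergence(p: torch.Tensor, q: torch.Tensor, name: str = "reverse_kl",
                 eps: float = 1e-12) -> torch.Tensor:
    """D_f(P || Q) over the last dimension; p and q are probability vectors."""
    t = (p + eps) / (q + eps)  # likelihood ratio P/Q, stabilized for zeros
    return (q * F_GENERATORS[name](t)).sum(dim=-1)

# Toy usage: a 'policy' distribution over two responses (chosen, rejected)
# compared against a sharper 'target' distribution favoring the chosen one.
policy = torch.softmax(torch.tensor([0.3, 0.1]), dim=-1)
target = torch.tensor([0.9, 0.1])
for name in F_GENERATORS:
    print(name, f_divergence(policy, target, name).item())
```

In the framework described above, swapping the generator f changes the regularization behavior of the resulting preference-optimization objective, which is the trade-off the paper's ablation studies examine.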
