

Pessimistic Off-Policy Multi-Objective Optimization

Shima Alizadeh · Aniruddha Bhargava · Karthick Gopalswamy · Lalit Jain · Branislav Kveton · Ge Liu

MR1 & MR2 - Number 22
Sat 4 May 6 a.m. PDT — 8:30 a.m. PDT


Multi-objective optimization is a class of optimization problems with multiple conflicting objectives. We study offline optimization of multi-objective policies from data collected by a previously deployed policy. We propose a pessimistic estimator for policy values that can be easily plugged into existing formulas for hypervolume computation and then optimized. The estimator is based on inverse propensity scores (IPS) and improves upon a naive IPS estimator both in theory and in experiments. Our analysis is general and applies beyond our IPS estimators and the methods for optimizing them.
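The abstract does not spell out the estimator, so the following is only a minimal sketch under stated assumptions, not the authors' method. It computes a naive per-objective IPS value estimate from logged data. It then forms a generic pessimistic (lower-confidence-bound) version using Hoeffding's inequality with a union bound over objectives; this bound is a stand-in for illustration, not necessarily the construction in the paper. Finally, it computes the two-objective hypervolume into which such value vectors can be plugged. The function names `ips_values`, `pessimistic_ips_values` and `hypervolume_2d`, the parameter `w_max` (an assumed a priori bound on the importance weights), the assumption that rewards lie in [0, 1], and the synthetic log are all hypothetical.

```python
import math

import numpy as np


def ips_values(rewards, target_probs, logging_probs):
    """Naive IPS estimate of the value of each objective.

    rewards:       (n, d) logged reward vectors, one column per objective
    target_probs:  (n,) pi(a_i | x_i), probability of the logged action under the target policy
    logging_probs: (n,) pi_0(a_i | x_i), propensity of the logged action under the logging policy
    """
    r = np.asarray(rewards, dtype=float)
    w = np.asarray(target_probs, dtype=float) / np.asarray(logging_probs, dtype=float)
    return (w[:, None] * r).mean(axis=0)


def pessimistic_ips_values(rewards, target_probs, logging_probs, w_max, delta=0.05):
    """Hoeffding lower confidence bound on each objective (illustrative assumption).

    Assumes rewards lie in [0, 1] and every importance weight is at most w_max,
    so each term w_i * r_ij lies in [0, w_max]. A union bound over the d
    objectives makes all d bounds hold simultaneously with probability >= 1 - delta.
    """
    v = ips_values(rewards, target_probs, logging_probs)
    n, d = np.asarray(rewards).shape
    width = w_max * math.sqrt(math.log(d / delta) / (2 * n))
    # True values are non-negative when rewards are, so clipping at zero keeps a valid bound.
    return np.maximum(v - width, 0.0)


def hypervolume_2d(points, ref):
    """Area dominated by `points` (both objectives maximized) above the reference point `ref`."""
    # Keep only points that strictly improve on the reference in both objectives.
    pts = [(x, y) for x, y in points if x > ref[0] and y > ref[1]]
    # Sweep from the largest first objective down; ties are broken by larger second objective.
    pts.sort(key=lambda p: (-p[0], -p[1]))
    area, best_y = 0.0, ref[1]
    for x, y in pts:
        if y > best_y:  # this point adds a new horizontal strip of height y - best_y
            area += (x - ref[0]) * (y - best_y)
            best_y = y
    return area


# Usage with a small synthetic log (hypothetical data for illustration only).
rewards = [[1, 0], [0, 1], [1, 1], [0, 0]] * 1000
target = [0.5] * 4000
logging = [0.5, 0.25, 0.5, 1.0] * 1000
naive = ips_values(rewards, target, logging)  # [0.5, 0.75]
pessimistic = pessimistic_ips_values(rewards, target, logging, w_max=2.0)
hv = hypervolume_2d([tuple(pessimistic)], ref=(0.0, 0.0))
```

The pessimistic vector is shifted down by the same confidence width in every objective. As a result, policies whose estimates rest on large importance weights (a larger `w_max` relative to the sample size) are penalized. Ranking candidate policies by the hypervolume of their lower bounds, rather than of their raw IPS estimates, therefore favors policies whose value is well supported by the logged data.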
