

Poster

Pessimistic Off-Policy Multi-Objective Optimization

Shima Alizadeh · Aniruddha Bhargava · Karthick Gopalswamy · Lalit Jain · Branislav Kveton · Ge Liu

MR1 & MR2 - Number 22
Sat 4 May 6 a.m. PDT — 8:30 a.m. PDT

Abstract:

Multi-objective optimization is a class of optimization problems with multiple conflicting objectives. We study offline optimization of multi-objective policies from data collected by a previously deployed policy. We propose a pessimistic estimator for policy values that can be easily plugged into existing formulas for hypervolume computation and optimized. The estimator is based on inverse propensity scores (IPS), and improves upon a naive IPS estimator in both theory and experiments. Our analysis is general, and applies beyond our IPS estimators and methods for optimizing them.
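To make the ingredients named in the abstract concrete, here is a minimal sketch, not the authors' method. It computes a per-objective IPS value estimate from logged data, lowers it by an uncertainty term, and scores a set of candidate policies by hypervolume. The abstract does not give the form of the pessimistic estimator, so the lower bound used here (the IPS mean minus `c` standard errors) is an assumption chosen for illustration. The function names, the log format `(action, logging_prob, rewards)`, the parameter `c`, and the two-objective hypervolume routine are likewise hypothetical.

```python
import math


def ips_values(logs, target_policy):
    """Naive IPS estimate of a policy's value, one entry per objective.

    logs: list of (action, logging_prob, rewards), rewards being a tuple
          with one reward per objective (hypothetical format).
    target_policy: dict mapping action -> probability under the new policy.
    """
    n = len(logs)
    k = len(logs[0][2])
    totals = [0.0] * k
    for action, p_log, rewards in logs:
        weight = target_policy.get(action, 0.0) / p_log  # importance weight
        for j in range(k):
            totals[j] += weight * rewards[j]
    return [t / n for t in totals]


def pessimistic_ips_values(logs, target_policy, c=1.0):
    """IPS estimate minus c standard errors, per objective.

    Assumption: this particular penalty is illustrative only; the
    abstract does not specify the form of the pessimistic estimator.
    """
    n = len(logs)
    means = ips_values(logs, target_policy)
    lower = []
    for j, mean in enumerate(means):
        terms = [target_policy.get(a, 0.0) / p * r[j] for a, p, r in logs]
        var = sum((t - mean) ** 2 for t in terms) / n
        lower.append(mean - c * math.sqrt(var / n))
    return lower


def hypervolume_2d(points, ref=(0.0, 0.0)):
    """Area dominated by 2-D value vectors (maximization) relative to ref."""
    kept = [p for p in points if p[0] > ref[0] and p[1] > ref[1]]
    kept.sort(key=lambda p: -p[0])  # sweep from largest first objective
    area, best_y = 0.0, ref[1]
    for x, y in kept:
        if y > best_y:  # dominated points add no area
            area += (x - ref[0]) * (y - best_y)
            best_y = y
    return area
```

In use, one would compute `pessimistic_ips_values` for each candidate policy and pass the resulting value vectors to `hypervolume_2d`. A policy whose logged actions carry large importance weights then contributes less than its naive IPS estimate would suggest.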
