Poster
Learning the Pareto Front Using Bootstrapped Observation Samples
Danqi Liao · Adam Elmachtoub · assaf zeevi
We consider Pareto front identification~(PFI) for linear bandits (PFILin), i.e., the goal is to identify a set of arms with undominated mean reward vectors when the mean reward vector is a linear function of the context. PFILin includes the best arm identification problem and multi-objective active learning as special cases. The sample complexity of our proposed algorithm is optimal up to a logarithmic factor. In addition, the regret incurred by our algorithm during the estimation is within a logarithmic factor of the optimal regret among all algorithms that identify the Pareto front.Our key contribution is a new estimator that in every round updates the estimate for the unknown parameter along \emph{multiple} context directions -- in contrast to the conventional estimator that only updatesthe parameter estimate along the chosen context. This allows us to use low-regret arms to collect information about Pareto optimal arms. Our key innovation is to reuse the explorationsamples multiple times; in contrast to conventional estimators that use each sample only once. Numerical experiments demonstrate that the proposed algorithm successfully identifies the Pareto front while controlling the regret.
Live content is unavailable. Log in and register to view live content