Abstract:
We study pure exploration in bandit problems with vector-valued rewards, where the goal is to (approximately) identify the Pareto set of arms given incomplete preferences induced by a polyhedral convex cone. We address the open problem of designing sample-efficient learning algorithms for such problems. We propose Pareto Vector Bandits (PaVeBa), an adaptive elimination algorithm that nearly matches the gap-dependent and worst-case lower bounds on the sample complexity of $(\epsilon, \delta)$-PAC Pareto set identification. Finally, we provide an in-depth numerical investigation of PaVeBa and its heuristic variants by comparing them with the state-of-the-art multi-objective and vector optimization algorithms on several real-world datasets with conflicting objectives.
Chat is not available.