AISTATS Poster Exploiting Correlation to Achieve Faster Learning Rates in Low-Rank Preference Bandits

Poster

Exploiting Correlation to Achieve Faster Learning Rates in Low-Rank Preference Bandits

Aadirupa Saha · Suprovat Ghoshal

Virtual

[ Abstract ]

[ Visit Poster at Spot H1 in Virtual World ] [ Slides]

Abstract: We introduce the Correlated Preference Bandits problem with random utility-based choice models (RUMs), where the goal is to identify the best item from a given pool of

$n$ items through online subsetwise preference feedback. We investigate whether models with a simple correlation structure, e.g. low rank, can result in faster learning rates. While we show that the problem can be impossible to solve for the general `low rank' choice models, faster learning rates can be attained assuming more structured item correlations. In particular, we introduce a new class of Block-Rank based RUM model, where the best item is shown to be

$(\epsilon,\delta)$ -PAC learnable with only

$O(r \epsilon^{-2} \log(n/\delta))$ samples. This improves on the standard sample complexity bound of

$\tilde{O}(n\epsilon^{-2} \log(1/\delta))$ known for the usual learning algorithms which might not exploit the item-correlations (

$r \ll n$ ). We complement the above sample complexity with a matching lower bound (up to logarithmic factors), justifying the tightness of our analysis. Further, we extend the results to a more general noisy Block-Rank model, which ensures robustness of our techniques. Overall, our results justify the advantage of playing subsetwise queries over pairwise preferences

$(k=2)$ , we show the latter provably fails to exploit correlation.

Chat is not available.