

Poster

Learning from biased positive-unlabeled data via threshold calibration

Paweł Teisseyre · Timo Martens · Jessa Bekker · Jesse Davis

Hall A-E 6
Oral presentation: Oral Session 7: Robust Learning
Mon 5 May midnight PDT — 1 a.m. PDT

Abstract:

Learning from positive and unlabeled data (PU learning) aims to train a binary classification model when only positive and unlabeled examples are available. Typically, learners assume that there is a labeling mechanism that determines which positive labels are observed. A particularly challenging setting arises when the observed positive labels are a biased sample from the positive distribution. Current approaches either require estimating the propensity scores, which are the instance-specific probabilities that a positive example's label will be observed, or make overly restrictive assumptions about the labeling mechanism. We make a novel assumption about the labeling mechanism which we show is more general than several commonly used existing ones. Moreover, the combination of our novel assumption and theoretical results from robust statistics can simplify the process of learning from biased PU data. Empirically, our approach offers superior predictive and runtime performance compared to state-of-the-art methods.
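To make the propensity-score idea in the abstract concrete, here is a minimal synthetic sketch (not the paper's method) of why biased PU labels mislead a naive learner and how known propensity scores correct for the bias via inverse-propensity weighting. All names and the data-generating setup are hypothetical assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 1000 true positives. Each positive's label is
# observed with an instance-specific probability e(x), the propensity
# score, so the observed labels are a biased sample of the positives.
n_pos = 1000
propensity = rng.uniform(0.2, 0.8, size=n_pos)   # e(x) for each positive
observed = rng.random(n_pos) < propensity        # biased labeling mechanism

# Naive estimate: count observed labels. This systematically
# undercounts the positives, since many labels go unobserved.
naive_count = int(observed.sum())

# Inverse-propensity weighting: each observed label counts 1/e(x),
# yielding an unbiased estimate of the true number of positives.
ipw_count = float((1.0 / propensity[observed]).sum())
```

The same reweighting idea underlies propensity-based PU classifiers, where the weights enter the training loss; the paper's contribution is an assumption that avoids having to estimate these scores per instance.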
