Density Ratio Estimation and Neyman Pearson Classification with Missing Data

Josh Givens · Henry W. J. Reeve · Song Liu

2023 Poster

Abstract

Density Ratio Estimation (DRE) is an important machine learning technique with many downstream applications. We consider the challenge of DRE with non-uniformly missing data. In this setting, we show that using standard DRE methods leads to biased results while our proposal (M-KLIEP), an adaptation of the popular DRE procedure KLIEP, restores consistency. Moreover, we provide finite sample estimation error bounds for M-KLIEP, which demonstrate minimax optimality with respect to both sample size and worst-case missingness. We then adapt an important downstream application of DRE, Neyman-Pearson (NP) classification, to this missing data setting. Our procedure both controls Type I error and achieves high power, with high probability. Finally, we demonstrate promising empirical performance on both synthetic data and real-world data with simulated missingness.
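
For readers unfamiliar with the building blocks named above, the following is a minimal sketch of standard KLIEP on fully observed data, followed by a plug-in Neyman-Pearson style threshold. It is not M-KLIEP and not the authors' implementation: the log-linear model, the gradient-ascent fit, the linear feature map, and the names `fit_kliep`, `density_ratio`, and `np_threshold` are all assumptions made for illustration. The threshold is a plain empirical quantile, so it only illustrates the idea of bounding Type I error and does not give the high-probability guarantee the abstract refers to.

```python
import numpy as np

def fit_kliep(x_num, x_den, feature_map, lr=0.1, n_steps=2000):
    """Log-linear KLIEP on fully observed data (illustrative, not M-KLIEP).

    Maximises mean_num[theta . phi(x)] - log mean_den[exp(theta . phi(x))]
    by plain gradient ascent. The objective is concave in theta.
    """
    f_num = feature_map(x_num)            # shape (n_num, d)
    f_den = feature_map(x_den)            # shape (n_den, d)
    theta = np.zeros(f_num.shape[1])
    mean_num = f_num.mean(axis=0)
    for _ in range(n_steps):
        logits = f_den @ theta
        w = np.exp(logits - logits.max())  # subtract max for stability
        w /= w.sum()                       # exponentially tilted weights
        theta += lr * (mean_num - w @ f_den)
    return theta

def density_ratio(x, x_den, theta, feature_map):
    """Estimated ratio p_num(x) / p_den(x), normalised on the denominator sample."""
    log_norm = np.log(np.mean(np.exp(feature_map(x_den) @ theta)))
    return np.exp(feature_map(x) @ theta - log_norm)

def np_threshold(ratios_class0, alpha):
    """Plug-in threshold: empirical (1 - alpha) quantile of class-0 ratios."""
    return np.quantile(ratios_class0, 1.0 - alpha)

# Toy data (assumed): denominator/class 0 is N(0, 1), numerator/class 1 is N(1, 1),
# so the true log-ratio is x - 0.5 and a linear feature map is well specified.
rng = np.random.default_rng(0)
x_den = rng.normal(0.0, 1.0, size=5000)
x_num = rng.normal(1.0, 1.0, size=5000)
linear = lambda x: np.asarray(x).reshape(-1, 1)

theta = fit_kliep(x_num, x_den, linear)
ratios_den = density_ratio(x_den, x_den, theta, linear)

# Classify as class 1 when the estimated ratio exceeds the threshold.
# For brevity the threshold is set on the same class-0 sample used for fitting;
# a held-out sample would be the more careful choice.
threshold = np_threshold(ratios_den, alpha=0.05)
type1 = np.mean(ratios_den > threshold)    # empirical Type I error on class 0
```

In this toy setting the fitted coefficient should land near 1, matching the true log-ratio slope. When some observations are missing with a probability that depends on their value, the sample means above no longer target the right quantities, which is the bias the abstract attributes to standard DRE methods.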