Adversarial Debiasing for Parameter Recovery
Abstract
Advances in machine learning and the increasing availability of high-dimensional data have led to a proliferation of social science research that uses the predictions of machine learning models as proxies for outcomes of interest. However, prediction errors from machine learning models can bias downstream estimation tasks, including regression. In this paper, we show how this bias arises, propose a test for detecting it, and demonstrate how an adversarial machine learning algorithm can generate predictions suitable for unbiased downstream estimation. We focus on a setting where machine-learned predictions serve as the dependent variable in a regression. We conduct simulations and empirical exercises using ground truth and satellite data on forest cover in Africa. Using the predictions from a naive machine learning model leads to biased parameter estimates, while the predictions from the adversarial model recover the true coefficients. Our approach consistently matches or exceeds the performance of existing methods.
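The core problem can be illustrated with a minimal simulation (all names and numbers below are illustrative assumptions, not the paper's actual data or model). Classical measurement error in a dependent variable leaves OLS unbiased, but ML prediction error is typically *not* classical: it correlates with the regressors. The sketch below shows how such covariate-correlated prediction error attenuates the estimated slope.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Ground-truth data-generating process: true slope on x is 2.0.
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)

# Hypothetical ML prediction of y whose error is correlated with x
# (e.g. a model that systematically under-predicts the outcome
# where the covariate is large). The -0.5*x term is an assumption
# chosen to make the bias visible.
y_hat = y - 0.5 * x + rng.normal(scale=0.5, size=n)

def ols_slope(x, y):
    """Slope coefficient from a simple OLS regression of y on x."""
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

slope_truth = ols_slope(x, y)      # close to 2.0: ground truth recovers the coefficient
slope_proxy = ols_slope(x, y_hat)  # close to 1.5: correlated prediction error biases it
print(slope_truth, slope_proxy)
```

If the prediction error were instead pure noise (uncorrelated with x), `slope_proxy` would remain unbiased; the adversarial approach described in the abstract targets exactly this correlation between prediction error and the regressors.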