On the Finite-Sample Bias of Minimizing Expected Wasserstein Loss Between Empirical Distributions
Abstract
We show that minimizing the expected Wasserstein loss between empirical distributions can lead to biased parameter estimates in the finite-sample regime. Remarkably, this bias arises even in well-specified settings where both empirical distributions are drawn from the same parametric family: unlike maximum likelihood estimation (understood here as maximizing the expected log-likelihood), optimizing the parameter of one distribution while holding the other's parameter fixed fails to recover that fixed value. We derive closed-form expressions for the expected Wasserstein loss in one dimension and, focusing on location–scale models, provide an analytic characterization of the resulting bias. This analysis reveals that finite-sample bias occurs whenever the expected loss varies along the diagonal subspace where the two parameter values coincide, and we propose a simple correction scheme that removes this effect. We further extend the analysis to misspecified models and to the Sinkhorn divergence, demonstrating that finite-sample bias persists in these more practical settings. Experiments on synthetic and real data confirm that stochastic optimization of Wasserstein-based objectives converges to biased solutions, and they validate the effectiveness of the proposed correction scheme.
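
To make the central claim concrete, the following minimal Python sketch (not taken from the paper) illustrates the phenomenon for a Gaussian scale parameter: it Monte-Carlo estimates the expected squared 2-Wasserstein loss between a small sample at the true scale and a small sample at a candidate scale, and the minimizing scale falls below the true value. The sample size, Monte Carlo budget, and candidate grid are illustrative assumptions.

# Illustrative sketch (assumptions: n, n_draws, sigma grid); shows that the
# minimizer of the expected W2^2 loss between small empirical samples is
# biased below the true scale parameter.
import numpy as np

rng = np.random.default_rng(0)
n, n_draws = 5, 20000            # small n makes the bias pronounced
true_sigma = 1.0
sigmas = np.linspace(0.2, 1.5, 27)

# Common random numbers: one set of standard-normal draws, rescaled by sigma.
x = np.sort(rng.normal(0.0, true_sigma, size=(n_draws, n)), axis=1)
z = np.sort(rng.normal(0.0, 1.0, size=(n_draws, n)), axis=1)

def expected_w2_sq(sigma):
    # In 1-D, W2^2 between equal-size empirical samples is the mean squared
    # difference of the sorted samples; averaging over draws approximates
    # the expected loss.
    return np.mean((x - sigma * z) ** 2)

losses = [expected_w2_sq(s) for s in sigmas]
print("true sigma:", true_sigma)
print("argmin of expected W2^2:", sigmas[np.argmin(losses)])  # well below 1.0

For a sample size of one the effect is immediate: the expected loss is Var(X) + sigma^2, which is minimized at sigma = 0 rather than at the true scale; larger samples shrink but do not remove the gap.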