Poster
On Convergence in Wasserstein Distance and f-divergence Minimization Problems
Cheuk Ting Li · Jingwei Zhang · Farzan Farnia
The zero-sum game in generative adversarial networks (GANs) for learning the distribution of observed data is known to reduce to the minimization of a divergence measure between the underlying and generative models. However, the current theoretical understanding of how the target divergence shapes the characteristics of GANs' generated samples remains largely inadequate. In this work, we analyze the influence of the divergence measure on the local optima and convergence properties of divergence minimization problems when learning a multi-modal data distribution. We show that a mode-seeking f-divergence, e.g., the Jensen-Shannon (JS) divergence in the vanilla GAN, can lead to poor locally optimal solutions that miss some of the underlying modes. On the other hand, we demonstrate that the optimization landscape of the 1-Wasserstein distance in Wasserstein GANs does not suffer from such suboptimal local minima. Furthermore, we prove that a randomly initialized gradient-based optimization of the Wasserstein distance will, with high probability, capture all the existing modes. We present numerical results on standard image datasets, revealing the success of Wasserstein GANs compared to JS-GANs in avoiding suboptimal local optima under a mixture model.
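For reference, the two objectives compared above admit the following standard definitions for a data distribution $P$ and a model distribution $Q$; the notation here is generic and not taken from the paper itself.

$$
\mathrm{JS}(P \,\|\, Q) \;=\; \tfrac{1}{2}\,\mathrm{KL}\!\Big(P \,\Big\|\, \tfrac{P+Q}{2}\Big) \;+\; \tfrac{1}{2}\,\mathrm{KL}\!\Big(Q \,\Big\|\, \tfrac{P+Q}{2}\Big),
\qquad
W_1(P, Q) \;=\; \inf_{\gamma \in \Pi(P,Q)} \mathbb{E}_{(x,y)\sim\gamma}\big[\|x - y\|\big],
$$

where $\Pi(P,Q)$ denotes the set of couplings with marginals $P$ and $Q$. A common intuition consistent with the abstract is that the JS divergence saturates once the supports of $P$ and $Q$ separate, while $W_1$ continues to reflect the geometric distance between missed and captured modes.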