Moderators: Fernando Perez-Cruz · Quanquan Gu
Ulysse Marteau-Ferey · Francis Bach · Alessandro Rudi
In many areas of applied statistics and machine learning, generating an arbitrary number of inde- pendent and identically distributed (i.i.d.) samples from a given distribution is a key task. When the distribution is known only through evaluations of the density, current methods either scale badly with the dimension or require very involved implemen- tations. Instead, we take a two-step approach by first modeling the probability distribution and then sampling from that model. We use the recently introduced class of positive semi-definite (PSD) models which have been shown to be ecient for approximating probability densities. We show that these models can approximate a large class of densities concisely using few evaluations, and present a simple algorithm to eectively sample from these models. We also present preliminary empirical results to illustrate our assertions.
Kirill Neklyudov · Max Welling
Markov Chain Monte Carlo (MCMC) algorithms ubiquitously employ complex deterministic transformations to generate proposal points that are then filtered by the Metropolis-Hastings-Green (MHG) test. However, the condition of the target measure invariance puts restrictions on the design of these transformations. In this paper, we first derive the acceptance test for the stochastic Markov kernel considering arbitrary deterministic maps as proposal generators. When applied to the transformations with orbits of period two (involutions), the test reduces to the MHG test. Based on the derived test we propose two practical algorithms: one operates by constructing periodic orbits from any diffeomorphism, another on contractions of the state space (such as optimization trajectories). Finally, we perform an empirical study demonstrating the practical advantages of both kernels.
Ilias Diakonikolas · Daniel Kane · Pasin Manurangsi · Lisheng Ren
We study the problem of distribution-free learning of a single neuronunder adversarial label noise with respect to the squared loss.For a wide range of activation functions, including ReLUs and sigmoids,we prove hardness of learning results in the Statistical Query model andunder a well-studied assumption on the complexity of refuting XOR formulas.Specifically, we establish that no polynomial-time learning algorithm, even improper,can approximate the optimal loss value within any constant factor.
Nicole Mücke · Enrico Reiss · Jonas Rungenhagen · Markus Klein
While large training datasets generally offer improvement in model performance, thetraining process becomes computationally expensive and time consuming. Distributedlearning is a common strategy to reduce the overall training time by exploiting multiplecomputing devices. Recently, it has been observed in the single machine setting thatoverparameterization is essential for benign overfitting in ridgeless regression in Hilbert spaces. We show that in this regime, data splitting has a regularizing effect, hence improving statistical performance and computational complexity at the same time. We further provide a unified framework that allows to analyze both the finite and infinite dimensional setting. We numerically demonstrate the effect of different model parameters.