Moderators: Antonio Artés · Liping Liu
Arman Adibi · Aryan Mokhtari · Hamed Hassani
Minimax optimization has been central in addressing various applications in machine learning, game theory, and control theory. Prior literature has thus far mainly focused on studying such problems in the continuous domain, e.g., convex-concave minimax optimization is now understood to a significant extent. Nevertheless, minimax problems extend far beyond the continuous domain to mixed continuous-discrete domains or even fully discrete domains. In this paper, we study mixed continuous-discrete minimax problems where the minimization is over a continuous variable belonging to Euclidean space and the maximization is over subsets of a given ground set. We introduce the class of convex-submodular minimax problems, where the objective is convex with respect to the continuous variable and submodular with respect to the discrete variable. Even though such problems appear frequently in machine learning applications, little is known about how to address them from algorithmic and theoretical perspectives. For such problems, we first show that obtaining saddle points is hard up to any approximation, and thus introduce new notions of (near-) optimality. We then provide several algorithmic procedures for solving convex and monotone-submodular minimax problems and characterize their convergence rates, computational complexity, and quality of the final solution according to our notions of optimality. Our proposed algorithms are iterative and combine tools from both discrete and continuous optimization. Finally, we provide numerical experiments to showcase the effectiveness of our proposed methods.
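For reference, the problem class described in this abstract can be written out as follows. This is a sketch using the standard textbook definitions of convexity, submodularity, and monotonicity; the symbols $f$, $V$, $d$, $A$, $B$, and $e$ are notation assumed here for illustration and are not taken from the abstract.

```latex
% Convex-submodular minimax problem (notation assumed for illustration)
\min_{x \in \mathbb{R}^d} \; \max_{S \subseteq V} \; f(x, S)

% Convexity in the continuous variable:
f(\cdot, S) \text{ is convex on } \mathbb{R}^d \text{ for every fixed } S \subseteq V

% Submodularity (diminishing returns) in the discrete variable:
f(x, A \cup \{e\}) - f(x, A) \;\ge\; f(x, B \cup \{e\}) - f(x, B)
\quad \text{for all } A \subseteq B \subseteq V,\; e \in V \setminus B

% Monotonicity, for the monotone-submodular case:
f(x, A) \;\le\; f(x, B) \quad \text{for all } A \subseteq B \subseteq V
```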
Jun Ho Yoon · Daniel Jeong · Seyoung Kim
We address the multi-task Gaussian process (GP) regression problem with the goal of decomposing input effects on outputs into components shared across or specific to tasks and samples. We propose a family of mixed-effects GPs, including doubly and translated mixed-effects GPs, that performs such a decomposition, while also modeling the complex task relationships. Instead of the tensor product widely used in multi-task GPs, we use the direct sum and Kronecker sum for Cartesian product to combine task and sample covariance functions. With this kernel, the overall input effects on outputs decompose into four components: fixed effects shared across tasks and across samples and random effects specific to each task and to each sample. We describe an efficient stochastic variational inference method for our proposed models that also significantly reduces the cost of inference for the existing mixed-effects GPs. On simulated and real-world data, we demonstrate that our approach provides higher test accuracy and interpretable decomposition.
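As a reminder of the linear-algebra operations contrasted in this abstract, the standard definitions are given below for a task Gram matrix and a sample Gram matrix. The symbols $K_T$, $K_S$, $m$, and $n$ are assumed notation, and how the authors map these operations onto their kernels is not stated in the abstract, so this is only the generic definition.

```latex
% K_T \in \mathbb{R}^{m \times m}: task covariance, K_S \in \mathbb{R}^{n \times n}: sample covariance
% (notation assumed for illustration)

% Tensor (Kronecker) product, widely used in multi-task GPs:
K_T \otimes K_S \in \mathbb{R}^{mn \times mn}

% Kronecker sum (standard definition):
K_T \oplus K_S \;=\; K_T \otimes I_n \;+\; I_m \otimes K_S \in \mathbb{R}^{mn \times mn}
```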
Hugh Dance · Brooks Paige
Variable selection in Gaussian processes (GPs) is typically undertaken by thresholding the inverse lengthscales of automatic relevance determination kernels, but in high-dimensional datasets this approach can be unreliable. A more probabilistically principled alternative is to use spike and slab priors and infer a posterior probability of variable inclusion. However, existing implementations in GPs are very costly to run in both high-dimensional and large-n datasets, or are only suitable for unsupervised settings with specific kernels. As such, we develop a fast and scalable variational inference algorithm for the spike and slab GP that is tractable with arbitrary differentiable kernels. We improve our algorithm's ability to adapt to the sparsity of relevant variables by Bayesian model averaging over hyperparameters, and achieve substantial speed-ups using zero temperature posterior restrictions, dropout pruning and nearest neighbour minibatching. In experiments, our method consistently outperforms vanilla and sparse variational GPs whilst retaining similar runtimes (even when n=10^6), and performs competitively with a spike and slab GP using MCMC while running up to 1000 times faster.
Ningyuan Chen · Xuefeng GAO · Yi Xiong
It has recently been shown in the literature (Nie et al., 2018; Shin et al., 2019a,b) that the sample averages from online learning experiments are biased when used to estimate the mean reward. To correct the bias, off-policy evaluation methods, including importance sampling and doubly robust estimators, typically calculate the conditional propensity score, which is ill-defined for non-randomized policies such as UCB. This paper provides a procedure to debias the samples using the bootstrap, which does not require knowledge of the reward distribution and can be applied to any adaptive policy. Numerical experiments demonstrate effective bias reduction for samples generated by popular multi-armed bandit algorithms such as Explore-Then-Commit (ETC), UCB, Thompson sampling (TS) and $\epsilon$-greedy (EG). We analyze and provide theoretical justifications for the procedure under the ETC algorithm, including the asymptotic convergence of the bias decay rate in the real and bootstrap worlds.