

Awards


Best Student Paper
Pick-to-Learn and Self-Certified Gaussian Process Approximations

Daniel Marks · Dario Paccagnan

Generalisation bounds are crucial for providing data-driven models with performance and safety guarantees. In this respect, bounds that do not require a held-out test set are particularly valuable as they allow the use of all data for training. While many such bounds do not improve upon the train-test approach, which remains the gold standard, the P2L algorithm (Paccagnan et al., 2023) has shown great potential. However, P2L comes with limitations, including computational overhead, reliance on consistent data, and restriction to non-Bayesian settings. In this work, we overcome these challenges in general settings and employ the corresponding results to show that classical Gaussian process (GP) training procedures can be interpreted as instantiations of P2L, thus inheriting tight, self-certified bounds. Three contributions underpin these conclusions. First, we introduce early stopping in P2L, equipping it with a tight generalisation bound to reduce training costs and address the non-consistent case. Second, we adapt P2L to the Bayesian setting and demonstrate its equivalence to posterior updating in a hierarchical model. Third, we show that greedy subset-of-data GPs are special P2L instantiations. Numerical evidence shows that the resulting P2L bounds we obtain compare favourably with the train-test and PAC-Bayes approaches on various real-world datasets.
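The greedy subset-of-data idea mentioned in the abstract can be illustrated with a small pick-the-worst loop. This is a minimal sketch under assumptions, not the authors' algorithm: the RBF kernel, the noise level, the tolerance `tol`, and the absolute-error selection rule are hypothetical choices made for illustration, and no generalisation bound is computed here.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0):
    # Squared-exponential kernel between 1-D input arrays a and b.
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_mean(x_train, y_train, x_query, noise=1e-2):
    # Posterior mean of a zero-mean GP conditioned on (x_train, y_train).
    if len(x_train) == 0:
        return np.zeros(len(x_query))
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    return rbf_kernel(x_query, x_train) @ np.linalg.solve(K, y_train)

def greedy_subset(x, y, tol=0.1):
    # Repeatedly add the left-out point with the largest absolute error,
    # stopping once every left-out point is fitted within tol.
    chosen = []
    while len(chosen) < len(x):
        idx = np.array(chosen, dtype=int)
        err = np.abs(gp_mean(x[idx], y[idx], x) - y)
        err[idx] = -np.inf  # never re-pick an already selected point
        worst = int(np.argmax(err))
        if err[worst] <= tol:
            break
        chosen.append(worst)
    return chosen

# Toy data (hypothetical): a noiseless sine curve on a regular grid.
x = np.linspace(0.0, 6.0, 60)
y = np.sin(x)
subset = greedy_subset(x, y, tol=0.1)
sel = np.array(subset, dtype=int)
residual = np.abs(gp_mean(x[sel], y[sel], x) - y)
print(f"selected {len(subset)} of {len(x)} points")
```

The size of the selected subset is the quantity of interest in this kind of scheme: the GP is trained on `subset` only, while the remaining points are merely checked against the tolerance.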

Given an intractable target density $p$, variational inference (VI) attempts to find the best approximation $q$ from a tractable family $\mathcal Q$. This is typically done by minimizing the exclusive Kullback-Leibler divergence, $\text{KL}(q||p)$. In practice, $\mathcal Q$ is not rich enough to contain $p$, and the approximation is misspecified even when it is a unique global minimizer of $\text{KL}(q||p)$. In this paper, we analyze the robustness of VI to these misspecifications when $p$ exhibits certain symmetries and $\mathcal Q$ is a location-scale family that shares these symmetries. We prove strong guarantees for VI not only under mild regularity conditions but also in the face of severe misspecifications. Namely, we show that (i) VI recovers the mean of $p$ when $p$ exhibits an even symmetry, and (ii) it recovers the correlation matrix of $p$ when in addition $p$ exhibits an elliptical symmetry. These guarantees hold for the mean even when $q$ is factorized and $p$ is not, and for the correlation matrix even when $q$ and $p$ behave differently in their tails. We analyze various regimes of Bayesian inference where these symmetries are useful idealizations, and we also investigate experimentally how VI behaves in their absence.
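Claim (i) can be checked numerically in one dimension. The sketch below is an illustration under assumptions, not the paper's experiment: the target is a logistic density centred at a hypothetical `M_TRUE`, $\mathcal Q$ is the Gaussian family, $\text{KL}(q||p)$ is evaluated up to an additive constant with Gauss-Hermite quadrature, and the minimizer is found by a plain grid search.

```python
import numpy as np

M_TRUE = 1.5  # centre of symmetry of the target (illustrative value)

def log_target(z):
    # Unnormalised log-density of a logistic distribution centred at M_TRUE.
    # It is even about M_TRUE but not Gaussian, so the Gaussian family is misspecified.
    u = 0.5 * (z - M_TRUE)
    return -2.0 * (np.logaddexp(u, -u) - np.log(2.0))

# Quadrature rule for expectations under a standard normal.
nodes, weights = np.polynomial.hermite_e.hermegauss(61)
weights = weights / weights.sum()

def kl_up_to_const(mu, sigma):
    # KL(q||p) for q = N(mu, sigma^2), dropping terms that do not depend on (mu, sigma):
    # minus the entropy of q (which is -log sigma plus a constant) minus E_q[log p].
    z = mu + sigma * nodes
    return -np.log(sigma) - np.sum(weights * log_target(z))

mus = np.linspace(0.0, 3.0, 201)
sigmas = np.linspace(0.5, 4.0, 141)
kl = np.array([[kl_up_to_const(m, s) for s in sigmas] for m in mus])
i, j = np.unravel_index(np.argmin(kl), kl.shape)
mu_hat, sigma_hat = mus[i], sigmas[j]
print(f"mu_hat = {mu_hat:.3f}, sigma_hat = {sigma_hat:.3f}")
```

In this toy setting the fitted location coincides with the centre of symmetry even though the Gaussian cannot match the logistic tails; the fitted scale need not equal the target's standard deviation.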


Test-of-Time Award
Deeply-Supervised Nets

Chen-Yu Lee · Saining Xie · Patrick Gallagher · Zhengyou Zhang · Zhuowen Tu

We propose deeply-supervised nets (DSN), a method that simultaneously minimizes classification error and improves the directness and transparency of the hidden layer learning process. We focus our attention on three aspects of traditional convolutional-neural-network-type (CNN-type) architectures: (1) transparency in the effect intermediate layers have on overall classification; (2) discriminativeness and robustness of learned features, especially in early layers; (3) training effectiveness in the face of “vanishing” gradients. To combat these issues, we introduce “companion” objective functions at each hidden layer, in addition to the overall objective function at the output layer (an integrated strategy distinct from layer-wise pre-training). We also analyze our algorithm using techniques extended from stochastic gradient methods. The advantages provided by our method are evident in our experimental results, showing state-of-the-art performance on MNIST, CIFAR-10, CIFAR-100, and SVHN.
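The idea of adding a companion objective at each hidden layer can be sketched as a combined loss. This is a simplified illustration under assumptions, not the paper's formulation: it uses small fully connected layers instead of convolutions, softmax cross-entropy for every objective, and a single hypothetical weight `alpha` on the companion terms.

```python
import numpy as np

def cross_entropy(logits, labels):
    # Mean softmax cross-entropy over the batch.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def deeply_supervised_loss(x, labels, layers, heads, out_head, alpha=0.3):
    # layers: hidden weight matrices; heads: one companion classifier per hidden layer.
    h, companion = x, 0.0
    for W, C in zip(layers, heads):
        h = np.maximum(h @ W, 0.0)                 # hidden layer with ReLU
        companion += cross_entropy(h @ C, labels)  # companion objective for this layer
    output = cross_entropy(h @ out_head, labels)   # overall objective at the output
    return output + alpha * companion, output, companion

# Random toy inputs and weights (hypothetical shapes, 3 classes).
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 5))
labels = rng.integers(0, 3, size=8)
layers = [rng.normal(size=(5, 6)), rng.normal(size=(6, 4))]
heads = [rng.normal(size=(6, 3)), rng.normal(size=(4, 3))]
out_head = rng.normal(size=(4, 3))
total, output, companion = deeply_supervised_loss(x, labels, layers, heads, out_head)
```

Because each companion term depends directly on its hidden layer's activations, gradients of `total` reach early layers without passing through the full depth of the network, which is the mechanism the abstract describes for countering vanishing gradients.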