Oral
Oral Session 8: Robustness, Calibration, Privacy & Evaluation (Applied)
Main Ballroom
Moderator: Aymeric Dieuleveut
Panprediction: Optimal Predictions for Any Downstream Task and Loss
Sivaraman Balakrishnan ⋅ Nika Haghtalab ⋅ Daniel Hsu ⋅ Brian Lee ⋅ Eric Zhao
Supervised learning is classically formulated as training a model to minimize a fixed loss function over a fixed distribution, or task. However, an emerging paradigm instead views model training as extracting enough information from data so that the model can be used to minimize many losses on many downstream tasks. We formalize a mathematical framework for this paradigm, which we call panprediction, and study its statistical complexity. Formally, panprediction generalizes omniprediction (Gopalan et al., 2021) and sits upstream from multi-group learning (Rothblum and Yona, 2021), which respectively focus on predictions that generalize to many downstream losses or many downstream tasks, but not both. Concretely, we design algorithms that learn deterministic and randomized panpredictors with $\tilde{O}(1/\varepsilon^3)$ and $\tilde{O}(1/\varepsilon^2)$ samples, respectively. Our results demonstrate that under mild assumptions, simultaneously minimizing infinitely many losses on infinitely many tasks can be as statistically easy as minimizing one loss on one task. Along the way, we improve the best known sample complexity guarantee of deterministic omniprediction by a factor of $1/\varepsilon$, and match all other known sample complexity guarantees of omniprediction and multi-group learning. Our key technical ingredient is a nearly lossless reduction from panprediction to a statistically efficient notion of calibration, called step calibration.
OEUVRE: OnlinE Unbiased Variance-Reduced loss Estimation
Kanad Pardeshi ⋅ Bryan Wilder ⋅ Aarti Singh
Online learning algorithms continually update their models as data arrive, making it essential to accurately estimate the expected loss at the current time step. The prequential method is an effective estimation approach which can be practically deployed in various ways. However, theoretical guarantees have previously been established under strong conditions on the algorithm, and practical algorithms have hyperparameters which require careful tuning. We introduce OEUVRE, an estimator that evaluates each incoming sample on the function learned at the current and previous time steps, recursively updating the loss estimate in constant time and memory. We use algorithmic stability, a property satisfied by many popular online learners, for optimal updates and prove consistency, convergence rates, and concentration bounds for our estimator. We design a method to adaptively tune OEUVRE's hyperparameters and test it across diverse online and stochastic tasks. We observe that OEUVRE matches or outperforms other estimators even when their hyperparameters are tuned with oracle access to ground truth.
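As a point of reference for the abstract above, the plain prequential ("test-then-train") estimator can be sketched as follows. This is not OEUVRE itself, only the baseline idea of scoring each sample before the learner updates on it. The names `prequential_loss`, `RunningMean`, and the `fading` parameter are illustrative assumptions and do not come from the paper.

```python
def prequential_loss(stream, learner, loss, fading=1.0):
    """Test-then-train: score each sample BEFORE the learner updates on it.

    fading=1.0 gives the plain running average of past losses;
    fading<1.0 down-weights old losses (a common practical variant).
    Constant time and memory per step.
    """
    num, den = 0.0, 0.0
    estimates = []
    for x, y in stream:
        num = fading * num + loss(learner.predict(x), y)
        den = fading * den + 1.0
        estimates.append(num / den)
        learner.update(x, y)  # the learner sees the sample only after scoring
    return estimates


class RunningMean:
    """Hypothetical toy online learner: predicts the mean of the labels seen so far."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def predict(self, x):
        return self.mean

    def update(self, x, y):
        self.n += 1
        self.mean += (y - self.mean) / self.n


def squared_loss(pred, y):
    return (pred - y) ** 2


stream = [(None, 1.0)] * 4
plain = prequential_loss(stream, RunningMean(), squared_loss)
faded = prequential_loss(stream, RunningMean(), squared_loss, fading=0.5)
```

On this toy stream the learner is wrong only on the first sample, so the plain average decays slowly while the fading variant tracks the current loss more closely. The fading factor is an example of the kind of hyperparameter the abstract says needs careful tuning.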
On the calibration of survival models with competing risks
Julie Alberge ⋅ Tristan Haugomat ⋅ Gaël Varoquaux ⋅ Judith Abécassis
In survival analysis, accurate probability estimates are essential for decision-making, particularly in the competing-risks setting where multiple events are considered. Recent work has focused on the calibration of these probabilities in the survival analysis setting. Yet calibration in the competing-risks setting is both under-explored and harder, because it applies to probabilities both across classes and across time. We show that existing calibration measures are not suited to the competing-risks setting and that recent models do not give well-behaved probabilities. Competing risks need a dedicated calibration framework. For this, we introduce two well-behaved calibration measures, and related methods to estimate, test, and correct calibration (recalibration). We show that these calibration scores lead to a principled statistical framework: they are minimized for oracle estimators (i.e., both measures are proper); they reveal calibration errors in modern models, corrected by our recalibration methods that yield good probabilities while preserving discrimination.
Patch2Loc: Learning to Localize Patches for Unsupervised Brain Lesion Detection
Hassan Baker ⋅ Austin J. Brockmeier
Detecting brain lesions as abnormalities observed in magnetic resonance imaging (MRI) is essential for diagnosis and treatment. In the search for abnormalities, such as tumors and malformations, radiologists may benefit from computer-aided diagnostics that use computer vision systems trained with machine learning to segment normal tissue from abnormal brain tissue. While supervised learning methods require annotated lesions, we propose a new unsupervised approach (Patch2Loc) that learns from normal patches taken from structural MRI. We train a neural network model to map a patch back to its spatial location within a slice of the brain volume. During inference, abnormal patches are detected by an anomaly score based on the error and variance of the location prediction. Applying the network in a convolutional manner generates a pixel-wise heatmap of anomalies, providing finer-grained segmentation. We demonstrate the ability of our model to segment abnormal brain tissues by applying our approach to the detection of tumor tissues in MRI on T2-weighted images from the BraTS2021 and MSLUB datasets and T1-weighted images from the ATLAS and WMH datasets. We show that it outperforms the state of the art in unsupervised segmentation.
AMRM-Pure: Semantic-Preserving Adversarial Purification
Zhihao Dou ⋅ Zhiqiang Gao ⋅ Dongfei Cui ⋅ Weida Wang ⋅ Qinjian Zhao ⋅ Zhang Dinggen ⋅ Jun Yan ⋅ Zeke Xie ⋅ Shufei Zhang
Adversarial purification is a defense technique that employs generative models to remove adversarial perturbations. Current methods often rely on powerful generators, typically diffusion models, and focus on reducing the gap between adversarial and clean samples in the feature space, while overlooking semantic correlation within a single sample. To address this issue, we explore adversarial purification from the perspective of preserving semantic relationships among image patches. We employ an Attentive Mask Reconstruction Model (AMRM), which shows superior performance. Our theoretical and experimental analysis reveals that AMRM is highly sensitive to adversarial noise, as such noise significantly distorts patch relationships. Based on this observation, we propose AMRM-Pure, a purification framework that denoises adversarial inputs by preserving patch-level semantics, and formulate this process as a tractable optimization problem with respect to the input. To further enhance robustness, we fine-tune AMRM-Pure with a classification loss to strengthen semantic consistency. We apply our insight to two AMRM architectures, Masked Autoencoder (MAE) and MaskDiT. Extensive experiments confirm the effectiveness of our method, establishing new state-of-the-art performance across multiple benchmarks.
Scalable Utility-Aware Multiclass Calibration
Mahmoud Hegazy ⋅ Michael Jordan ⋅ Aymeric Dieuleveut
Ensuring that classifiers are well-calibrated, i.e., their predictions align with observed frequencies, is a minimal and fundamental requirement for classifiers to be viewed as trustworthy. Existing methods for assessing multiclass calibration often focus on specific aspects associated with prediction (e.g., top-class confidence, class-wise calibration) or utilize computationally challenging variational formulations. In this work, we study scalable evaluation of multiclass calibration. To this end, we propose utility calibration, a general framework which measures the calibration error relative to a specific utility function that encapsulates the goals or decision criteria relevant to the end user. We demonstrate how this framework can unify and re-interpret several existing calibration metrics, particularly allowing for more robust versions of the top-class and class-wise calibration metrics, and going beyond such binarized approaches, towards assessing calibration for richer classes of downstream utilities.
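For readers unfamiliar with the top-class metric the abstract refers to, a minimal sketch of the standard binned top-class expected calibration error (ECE) is given below. It illustrates the existing binarized approach, not the utility calibration framework proposed in the paper. The function name `top_class_ece` and the equal-width binning with `n_bins` bins are assumptions made for illustration.

```python
def top_class_ece(probs, labels, n_bins=10):
    """Binned top-class ECE: sum over bins of (n_b / n) * |accuracy_b - confidence_b|.

    probs  : list of per-class probability lists, one per example
    labels : list of true class indices
    """
    # Per bin: [count, sum of confidences, number of correct predictions]
    bins = [[0, 0.0, 0.0] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        conf = max(p)          # top-class confidence
        pred = p.index(conf)   # predicted class (first maximum on ties)
        b = min(int(conf * n_bins), n_bins - 1)  # equal-width bin index
        bins[b][0] += 1
        bins[b][1] += conf
        bins[b][2] += 1.0 if pred == y else 0.0
    n = len(labels)
    # (n_b / n) * |acc_b - conf_b| simplifies to |sum_correct - sum_conf| / n
    return sum(abs(correct - conf_sum) / n
               for count, conf_sum, correct in bins if count > 0)


# Confidence 0.8 with 4 of 5 correct: the top class is well calibrated.
calibrated = top_class_ece([[0.8, 0.2]] * 5, [0, 0, 0, 0, 1])
# Full confidence, always wrong: maximal miscalibration.
overconfident = top_class_ece([[1.0, 0.0]] * 3, [1, 1, 1])
```

Because only the top-class confidence enters the score, a classifier can score well here while being miscalibrated on the remaining classes, which is one reason richer notions of calibration are of interest.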
Citizens' assemblies - small panels of citizens that convene to deliberate and make policy recommendations - often face the issue of panelists dropping out at the last minute. These dropouts undermine two key goals: that the panel is (a) of a desired size, and (b) descriptively representative of the population. This dropout problem motivates the question: how can we choose the panel - or add extra participants to an existing panel - to ensure that, after dropouts, the panel satisfies desiderata (a) and (b) to some guaranteed degree? The practical challenge is that panelists (or extras) must be selected before seeing who ultimately drops out. We model this problem as a minimax game: the minimizer chooses a panel (or extras); then, an adversary defines a randomization over dropouts from which the realized dropouts are drawn. The loss is then the deviation of the resulting panel from predefined descriptive representation targets. Our main contribution is an efficient loss-minimizing algorithm for selecting a panel (or extras), which achieves optimal expected loss even as we vary the adversary's power from worst case to average case. Our algorithm - which iteratively plays a projected gradient descent subroutine against a best-responder - also addresses a key issue left open by prior work on this problem: it allows us to control the selection probabilities with which we choose each potential panelist (or extra). We implement our algorithms and run them on datasets from real assemblies. We show robustness gains over previous algorithms, and we use our control over selection probabilities to offer the first exploration of trade-offs between randomness and representation in handling dropouts.
SetPINNs: Set-based Physics-informed Neural Networks
Mayank Nagda ⋅ Phil Ostheimer ⋅ Thomas Specht ⋅ Frank Rhein ⋅ Fabian Jirasek ⋅ Stephan Mandt ⋅ Marius Kloft ⋅ Sophie Fellenz
Physics-Informed Neural Networks (PINNs) solve partial differential equations using deep learning. However, conventional PINNs perform pointwise predictions that neglect dependencies within a domain, which may result in suboptimal solutions. We introduce SetPINNs, a framework that effectively captures local dependencies. With a finite element-inspired sampling scheme, we partition the domain into sets to model local dependencies while simultaneously enforcing physical laws. We provide a rigorous theoretical analysis showing that SetPINNs yield unbiased, lower-variance estimates of residual energy and its gradients, ensuring improved domain coverage and reduced residual error. Extensive experiments on synthetic and real-world tasks show improved accuracy, efficiency, and robustness.