Oral: Statistics
Auditorium 1
Transductive conformal inference with adaptive scores
Ulysse Gazin · Gilles Blanchard · Etienne Roquain
Conformal inference is a fundamental and versatile tool that provides distribution-free guarantees for many machine learning tasks. We consider the transductive setting, where decisions are made on a test sample of $m$ new points, giving rise to $m$ conformal $p$-values. While classical results only concern their marginal distribution, we show that their joint distribution follows a P\'olya urn model, and we establish a concentration inequality for their empirical distribution function. The results hold for arbitrary exchangeable scores, including adaptive ones that can use the covariates of the test${+}$calibration samples at the training stage for increased accuracy. We demonstrate the usefulness of these theoretical results through uniform, in-probability guarantees for two machine learning tasks of current interest: interval prediction for transductive transfer learning and novelty detection based on two-class classification.
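As a rough illustration of the setting (a minimal sketch under our own toy choices, not the authors' code), the snippet below computes the $m$ conformal $p$-values of a test sample from $n$ calibration scores; the function name and the Gaussian data are ours.

```python
# Minimal sketch (not the authors' code): the m conformal p-values of a test
# sample, computed from n exchangeable calibration scores.
import numpy as np

def conformal_p_values(cal_scores, test_scores):
    """p_j = (1 + #{i : cal_i >= test_j}) / (n + 1) for each test point j."""
    cal = np.asarray(cal_scores)
    test = np.asarray(test_scores)
    n = cal.size
    # For each test score, count calibration scores at least as large.
    counts = (cal[None, :] >= test[:, None]).sum(axis=1)
    return (1.0 + counts) / (n + 1.0)

rng = np.random.default_rng(0)
cal = rng.normal(size=100)    # n = 100 calibration scores
test = rng.normal(size=5)     # m = 5 test scores, exchangeable with calibration
print(conformal_p_values(cal, test))  # each p_j is marginally super-uniform
```

The paper's contribution concerns the joint law of these $m$ $p$-values (a P\'olya urn model) rather than this marginal computation.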
Approximate Leave-one-out Cross Validation for Regression with $\ell_1$ Regularizers
Arnab Auddy · Haolin Zou · Kamiar Rahnama Rad · Arian Maleki
The out-of-sample error (OO) is the main quantity of interest in risk estimation and model selection. Leave-one-out cross validation (LO) offers a (nearly) distribution-free yet computationally demanding method to estimate OO. Recent theoretical work showed that approximate leave-one-out cross validation (ALO) is a computationally efficient and statistically reliable estimate of LO (and OO) for generalized linear models with twice differentiable regularizers. For problems involving non-differentiable regularizers, despite significant empirical evidence, a theoretical understanding of ALO's error has remained elusive. In this paper, we present a novel theory for a wide class of problems in the generalized linear model family with the non-differentiable $\ell_1$ regularizer. We bound the error $|{\rm ALO}-{\rm LO}|$ in terms of intuitive metrics such as the size of leave-$i$-out perturbations in active sets, sample size $n$, number of features $p$, and signal-to-noise ratio (SNR). As a consequence, for $\ell_1$ regularized problems, we show that $|{\rm ALO}-{\rm LO}| \stackrel{p\rightarrow \infty}{\longrightarrow} 0$ while $n/p$ and SNR remain bounded.
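A minimal sketch of the ALO recipe for the lasso with squared loss (using sklearn's Lasso for convenience, on toy data of our own choosing; this illustrates the general leverage-correction idea, not the authors' implementation): the leave-$i$-out prediction is approximated from the active set of the single full-data fit.

```python
# Minimal sketch (assumption: squared loss; not the authors' code).
# ALO for the lasso via the leverage of the active set of the full-data fit.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
n, p = 200, 100
X = rng.normal(size=(n, p))
beta = np.zeros(p); beta[:10] = 1.0
y = X @ beta + rng.normal(size=n)

fit = Lasso(alpha=0.1, fit_intercept=False).fit(X, y)
yhat = X @ fit.coef_

# Active set of the full-data solution and its leverage scores.
A = np.flatnonzero(fit.coef_)
XA = X[:, A]
H = XA @ np.linalg.solve(XA.T @ XA, XA.T)
h = np.diag(H)

# ALO prediction for point i: correct the in-sample fit by its leverage.
alo_pred = yhat + h / (1.0 - h) * (yhat - y)
alo_risk = np.mean((y - alo_pred) ** 2)
print("ALO estimate of the out-of-sample error:", alo_risk)
```

Exact LO would refit the lasso $n$ times; ALO reuses the single full-data fit, which is why bounding $|{\rm ALO}-{\rm LO}|$ matters.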
Failures and Successes of Cross-Validation for Early-Stopped Gradient Descent
Pratik Patil · Yuchen Wu · Ryan Tibshirani
We analyze the statistical properties of generalized cross-validation (GCV) and leave-one-out cross-validation (LOOCV) applied to early-stopped gradient descent (GD) in high-dimensional least squares regression. We prove that GCV is generically inconsistent as an estimator of the prediction risk of early-stopped GD, even for a well-specified linear model with isotropic features. In contrast, we show that LOOCV converges uniformly along the GD trajectory to the prediction risk. Our theory requires only mild assumptions on the data distribution and does not require the underlying regression function to be linear. Furthermore, by leveraging the individual LOOCV errors, we construct consistent estimators for the entire prediction error distribution along the GD trajectory and consistent estimators for a wide class of error functionals. This in particular enables the construction of pathwise prediction intervals based on GD iterates that have asymptotically correct nominal coverage conditional on the training data.
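To make the compared objects concrete, here is a brute-force illustration (not the paper's estimator, which avoids refitting): exact LOOCV along the GD path, computed by rerunning GD once per left-out point, against held-out test risk. The dimensions, step size, and iteration count are arbitrary toy choices, and only the LOOCV side of the comparison is shown.

```python
# Minimal sketch (not the authors' estimator): brute-force LOOCV risk along the
# gradient-descent path for least squares, compared with held-out test risk.
import numpy as np

rng = np.random.default_rng(2)
n, p, T, lr = 60, 30, 100, 0.1
X = rng.normal(size=(n, p))
beta = rng.normal(size=p) / np.sqrt(p)
y = X @ beta + rng.normal(size=n)
X_test = rng.normal(size=(1000, p))
y_test = X_test @ beta + rng.normal(size=1000)

def gd_path(X, y):
    """Iterates of gradient descent on (1/2n)||y - Xb||^2, started at zero."""
    b = np.zeros(X.shape[1])
    path = [b.copy()]
    for _ in range(T):
        b = b - lr * X.T @ (X @ b - y) / X.shape[0]
        path.append(b.copy())
    return np.array(path)              # shape (T + 1, p)

# Leave-one-out: rerun the whole GD path without point i, predict on point i.
loo_err = np.zeros(T + 1)
for i in range(n):
    mask = np.arange(n) != i
    loo_err += (gd_path(X[mask], y[mask]) @ X[i] - y[i]) ** 2 / n

test_err = np.mean((gd_path(X, y) @ X_test.T - y_test) ** 2, axis=1)
print("LOOCV-selected stopping time:", loo_err.argmin(),
      "  test-risk-optimal stopping time:", test_err.argmin())
```

The paper shows that LOOCV (suitably leveraged) tracks the prediction risk uniformly along the GD trajectory, whereas GCV generically does not.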
In this paper, we address the problem of testing exchangeability of a sequence of random variables, $X_1, X_2, \cdots$. This problem has been studied under the recently popular framework of \emph{testing by betting}. But the mapping of testing problems to games is not one-to-one: many games can be designed for the same test. Past work established that it is futile to play a single game betting on every observation: test martingales in the data filtration are powerless. Two avenues have been explored to circumvent this impossibility: betting in a reduced filtration (the wealth is a test martingale in a coarsened filtration), or playing many games in parallel (the wealth is an e-process in the data filtration). The former has proved difficult to analyze theoretically, while the latter only works for binary or discrete observation spaces. Here, we introduce a different approach that circumvents both drawbacks. We design a new (yet simple) game in which we observe the data sequence in pairs. Even though betting on individual observations is futile, we show that betting on pairs of observations is not. To elaborate, we prove that our game leads to a nontrivial test martingale, which is interesting because it is obtained by shrinking the filtration only very slightly. We show that our test controls the type-1 error despite continuous monitoring, and is consistent for both binary and continuous observations, under a broad class of alternatives. Due to the shrunk filtration, optional stopping is only allowed at even stopping times: a relatively minor price. We provide a variety of simulations that align with our theoretical findings.
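One plausible instantiation of such a pairwise game (a sketch under our own choice of bet sizes, not necessarily the authors' construction): within each consecutive pair, bet on which element is larger. Under exchangeability the resulting sign is a fair coin given the past, so the wealth process is a nonnegative test martingale in the pair filtration, and Ville's inequality gives type-1 error control under continuous monitoring.

```python
# Minimal sketch (one possible pairwise betting game, not necessarily the
# authors' construction): reject exchangeability when the wealth exceeds 1/alpha.
import numpy as np

def pairwise_betting_test(x, alpha=0.05):
    x = np.asarray(x, dtype=float)
    wealth, lam, running = 1.0, 0.0, 0.0
    for t in range(len(x) // 2):
        a, b = x[2 * t], x[2 * t + 1]
        if a == b:
            continue                      # ties carry no information in this game
        s = 1.0 if b > a else -1.0        # outcome of the t-th pair
        wealth *= 1.0 + lam * s           # lam was fixed before seeing s
        if wealth >= 1.0 / alpha:
            return True, 2 * (t + 1)      # reject; report the (even) stopping time
        # Predictable update: bet a fraction of a running average of past signs.
        running = 0.9 * running + 0.1 * s
        lam = float(np.clip(0.5 * running, -0.5, 0.5))
    return False, len(x)

rng = np.random.default_rng(3)
print(pairwise_betting_test(rng.normal(size=2000)))               # i.i.d.: rarely rejects
print(pairwise_betting_test(np.cumsum(rng.normal(1, 1, 2000))))   # upward trend: rejects fast
```

The bet-sizing rule above is a hypothetical choice for illustration; the consistency claims in the abstract concern the authors' own game.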