

Invited Talks

May 2, 2025, 7 p.m.

We explore the predictive viewpoint of Bayesian inference, which bypasses the conventional reliance on priors and likelihoods. By treating the joint predictive distribution of observables as the fundamental element, we gain fresh insights into the nature of Bayesian reasoning and derive principled generalizations, including new methods such as martingale posteriors. The predictive perspective improves our understanding of uncertainty quantification and can facilitate more adaptable, data-driven approaches to probabilistic modelling.
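As a concrete illustration (ours, not from the talk), the simplest martingale posterior arises from predictive resampling with the Pólya-urn predictive, whose limit is the Bayesian bootstrap. The minimal stdlib-Python sketch below is hypothetical: the function name, horizon, and toy data are ours, chosen only to show the mechanics of extending the sample via the predictive and reading uncertainty off the resulting spread.

```python
import random
import statistics

def martingale_posterior_mean(data, horizon=2000, draws=200, seed=0):
    """Predictive resampling: repeatedly extend the observed sample by
    drawing from the one-step-ahead predictive -- here the Polya-urn
    predictive, i.e. a uniform draw from everything seen so far -- and
    record the mean of each completed sequence. The spread of those
    means quantifies uncertainty about the mean without ever writing
    down a prior or likelihood."""
    rng = random.Random(seed)
    posterior = []
    for _ in range(draws):
        seq = list(data)
        for _ in range(horizon):
            seq.append(rng.choice(seq))   # draw from the predictive
        posterior.append(statistics.fmean(seq))
    return posterior

observed = [4.1, 5.3, 4.8, 5.0, 4.6, 5.4, 4.9, 5.2]
post = martingale_posterior_mean(observed)
print(f"posterior mean ~ {statistics.fmean(post):.3f} "
      f"+/- {statistics.stdev(post):.3f}")
```

Each completed sequence's mean is one posterior draw; because the urn mean is a martingale, the draws center on the observed sample mean while their spread reflects sampling uncertainty.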


Chris Holmes

Chris Holmes is Programme Director for Health and Medical Sciences at The Alan Turing Institute.

He is Professor of Biostatistics at the University of Oxford with a joint appointment between the Department of Statistics and the Nuffield Department of Clinical Medicine through the Wellcome Trust Centre for Human Genetics and the Li Ka Shing Centre for Health Innovation and Discovery.

Before joining Oxford, Chris was based at Imperial College, London, and also worked in industry conducting research in scientific computing. He holds a Programme Leader’s award in Statistical Genomics from the Medical Research Council UK. In 2016, WIRED UK magazine named him one of the ‘Innovators of the year in AI’.

Chris has a broad interest in the theory, methods and applications of statistics and statistical modelling. He is particularly interested in pattern recognition and nonlinear, nonparametric statistical machine learning methods applied to the genomic sciences and genetic epidemiology.

May 3, 2025, 7 p.m.

A central problem in unsupervised deep learning is how to find useful representations of high-dimensional data, sometimes called "disentanglement". Most approaches are heuristic and lack a proper theoretical foundation. In linear representation learning, independent component analysis (ICA) has been successful in many application areas, and it is principled, i.e., based on a well-defined probabilistic model. However, extending ICA to the nonlinear case has been problematic due to the lack of identifiability, i.e., uniqueness of the representation. Recently, nonlinear extensions that utilize temporal structure or some auxiliary information have been proposed. Such models are in fact identifiable, and consequently, an increasing number of algorithms have been developed. In particular, some self-supervised algorithms can be shown to estimate nonlinear ICA, even though they were initially proposed from heuristic perspectives. This talk reviews the state of the art of nonlinear ICA theory and algorithms, based on a review paper available at https://arxiv.org/pdf/2303.16535.
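To make the identifiable linear case the abstract starts from concrete, here is a minimal sketch (ours, not code from the talk) of symmetric FastICA with a tanh nonlinearity, assuming NumPy: two independent non-Gaussian sources are mixed, the mixtures are whitened, and the estimated components are checked against the true sources, which the linear model recovers up to sign and permutation. The toy mixing matrix and all names are our own choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Two independent, unit-variance, non-Gaussian sources: uniform and Laplace.
S = np.vstack([rng.uniform(-np.sqrt(3), np.sqrt(3), n),
               rng.laplace(0.0, 1 / np.sqrt(2), n)])
A = np.array([[1.0, 0.6], [0.4, 1.0]])    # "unknown" mixing matrix
X = A @ S                                  # observed mixtures

# Whiten the mixtures: zero mean, identity covariance.
X = X - X.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(np.cov(X))
Z = (E / np.sqrt(d)) @ E.T @ X

# Symmetric FastICA: w+ = E[z g(w'z)] - E[g'(w'z)] w, then re-orthonormalize.
W = rng.standard_normal((2, 2))
for _ in range(100):
    G = np.tanh(W @ Z)
    W = G @ Z.T / n - np.diag((1 - G**2).mean(axis=1)) @ W
    U, _, Vt = np.linalg.svd(W)
    W = U @ Vt

S_hat = W @ Z
# Each estimate should match one true source up to sign and order.
corr = np.abs(np.corrcoef(np.vstack([S, S_hat]))[:2, 2:])
print(corr.round(2))
```

Without the temporal structure or auxiliary variables discussed in the talk, no analogous guarantee holds once the mixing `A @ S` is replaced by an arbitrary nonlinearity: that is exactly the identifiability gap the reviewed theory closes.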


Aapo Hyvarinen

Aapo Hyvarinen studied undergraduate mathematics at the universities of Helsinki (Finland), Vienna (Austria), and Paris (France), and obtained a Ph.D. degree in Information Science at the Helsinki University of Technology in 1997. After post-doctoral work at the Helsinki University of Technology, he moved to the University of Helsinki in 2003. In 2008, he was appointed Professor of Computational Data Analysis, and in 2013, Professor of Computer Science. From 2016 to 2019, he was on leave and in the position of Professor of Machine Learning at the Gatsby Computational Neuroscience Unit, University College London, UK.

May 4, 2025, 7 p.m.

Inference-time compute has emerged as a new axis for scaling large language models, leading to breakthroughs in AI reasoning. Broadly speaking, inference-time compute methods involve allowing the language model to interact with a verifier to search for desirable, high-quality, or correct responses. While recent breakthroughs involve using a ground-truth verifier of correctness, it is also possible to invoke the language model itself or an otherwise learned model as verifiers. These latter protocols raise the possibility of self-improvement, whereby the AI system evaluates and refines its own generations to achieve higher performance.

This talk presents new understanding of, and new algorithms for, language model self-improvement. The first part of the talk focuses on a new perspective on self-improvement that we refer to as sharpening, whereby we "sharpen" the model toward one that places large probability mass on high-quality sequences, as measured by the language model itself. We show how the sharpening process can be done purely at inference time or amortized into the model via post-training, thereby avoiding expensive inference-time computation. In the second part of the talk, we consider the more general setting of a learned reward model, show that the performance of naive-but-widely-used inference-time compute strategies does not improve monotonically with compute, and develop a new compute-monotone algorithm with optimal statistical performance.
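The non-monotonicity phenomenon can be illustrated with a toy simulation (ours, not from the talk) of the naive best-of-n strategy: candidates are scored by an imperfect learned verifier, and as n grows the search increasingly surfaces degenerate responses the verifier over-scores, so true performance rises and then falls. All distributions and parameters below are invented for illustration.

```python
import random
import statistics

def sample_response(rng):
    """One candidate: with prob 0.9 an ordinary answer whose verifier
    score tracks its true reward closely; with prob 0.1 a degenerate
    answer (true reward 0) whose score is heavy noise and can
    occasionally look excellent to the learned verifier."""
    if rng.random() < 0.9:
        reward = rng.uniform(0.4, 0.9)
        score = reward + rng.gauss(0.0, 0.05)
    else:
        reward = 0.0
        score = rng.gauss(0.0, 1.0)
    return reward, score

def best_of_n(n, trials=2000, seed=0):
    """Mean true reward of the candidate the verifier ranks highest."""
    rng = random.Random(seed)
    picked = []
    for _ in range(trials):
        cands = [sample_response(rng) for _ in range(n)]
        reward, _ = max(cands, key=lambda c: c[1])   # verifier's pick
        picked.append(reward)
    return statistics.fmean(picked)

perf = {n: best_of_n(n) for n in (1, 4, 16, 64, 512)}
for n, r in perf.items():
    print(f"n={n:4d}  mean true reward of selection = {r:.3f}")
```

Moderate n beats n=1, but for large n the argmax over verifier scores is dominated by the heavy-tailed over-scored responses and true reward collapses, which is the failure mode a compute-monotone algorithm is designed to avoid.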

Based on joint works with Audrey Huang, Dhruv Rohatgi, Adam Block, Qinghua Liu, Jordan T. Ash, Cyril Zhang, Max Simchowitz, Dylan J. Foster and Nan Jiang.


Akshay Krishnamurthy

Akshay is a senior principal research manager at Microsoft Research, New York City. Previously, he spent two years as an assistant professor in the College of Information and Computer Sciences at the University of Massachusetts, Amherst and a year as a postdoctoral researcher at Microsoft Research, NYC. Before that, he completed his PhD in the Computer Science Department at Carnegie Mellon University, advised by Aarti Singh. He received his undergraduate degree in EECS at UC Berkeley.