Oral Session 2: Distribution Learning and Causality
Additive models (AMs) have recently attracted considerable interest in machine learning, as they allow interpretable structures to be incorporated into a wide range of model classes. Many commonly used approaches for fitting potentially complex additive models build on the idea of boosting. While boosted additive models (BAMs) work well in practice, certain theoretical aspects are still poorly understood, including their general convergence behavior and the optimization problem being solved when the implicit regularizing nature of boosting is taken into account. In this work, we study the solution paths of BAMs and establish connections with other approaches for certain classes of problems. Along these lines, we derive novel convergence results for BAMs, which yield crucial insights into the inner workings of the method. While our results generally provide reassuring theoretical evidence for the practical use of BAMs, they also uncover convergence "pathologies" of boosting for certain additive model classes that require caution in practice. We empirically validate our theoretical findings through several numerical experiments.
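As a rough, self-contained illustration of the kind of procedure this abstract studies, the sketch below implements componentwise L2-boosting with univariate linear base learners, a minimal BAM instance. The function name, step size, and toy setup are ours, not the paper's:

```python
import numpy as np

def boost_additive_model(X, y, n_iter=200, lr=0.1):
    """Componentwise L2-boosting with univariate linear base learners.

    A minimal sketch of a boosted additive model, not the paper's algorithm:
    at each iteration, fit every feature to the current residual and take a
    small (shrunken) step on the single best-fitting component.
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)        # center features so slopes need no intercept
    intercept = y.mean()           # start from the mean response
    coefs = np.zeros(p)            # accumulated per-feature coefficients
    resid = y - intercept
    for _ in range(n_iter):
        best_j, best_sse, best_b = 0, np.inf, 0.0
        for j in range(p):
            xj = Xc[:, j]
            b = xj @ resid / (xj @ xj + 1e-12)    # least-squares slope on residual
            sse = np.sum((resid - b * xj) ** 2)
            if sse < best_sse:
                best_j, best_sse, best_b = j, sse, b
        coefs[best_j] += lr * best_b              # shrunken boosting step
        resid -= lr * best_b * Xc[:, best_j]
    return intercept, coefs
```

For linear base learners of this kind, the boosting solution path is known to be closely related to regularization paths such as the lasso path (cf. Efron et al., 2004), which is the flavor of connection the abstract alludes to.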
Uncovering causal relationships in datasets that include both categorical and continuous variables is a challenging problem. The overwhelming majority of existing methods are restricted to a single variable type. Our contribution is a structural causal model designed to handle mixed-type data through a general function class. We present a theoretical foundation that specifies the conditions under which the directed acyclic graph underlying the causal model can be identified from observed data. In addition, we propose Mixed-type data Extension for Regression and Independence Testing (MERIT), enabling the discovery of causal connections in real-world classification settings. Our empirical studies demonstrate that MERIT outperforms its state-of-the-art competitor in causal discovery on relatively low-dimensional data.
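For intuition, a common building block in this style of causal discovery is the additive-noise direction test: regress each variable on the other and check which residual is more independent of its regressor. The sketch below is a crude continuous-only version under our own simplifications (linear regressors and an ad hoc dependence score); MERIT's general function class and mixed-type handling are not reproduced here:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from scipy.stats import spearmanr

def anm_direction(x, y):
    """Toy additive-noise-model direction test for two continuous variables.

    Regress each variable on the other and prefer the direction whose
    residual looks less dependent on the regressor. The dependence score
    (rank correlation of the regressor with absolute residuals) is a
    crude stand-in for a proper independence test.
    """
    def dependence(cause, effect):
        model = LinearRegression().fit(cause.reshape(-1, 1), effect)
        resid = effect - model.predict(cause.reshape(-1, 1))
        rho, _ = spearmanr(cause, np.abs(resid))
        return abs(rho)
    return "x->y" if dependence(x, y) < dependence(y, x) else "y->x"
```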
Distributional Counterfactual Explanations With Optimal Transport
Lei You · Lele Cao · Mattias Nilsson · Bo Zhao · Lei Lei
Counterfactual explanations (CE) are the de facto method for providing insight and interpretability in black-box decision-making models by identifying alternative input instances that lead to different outcomes. This paper extends the concept of CE to a distributional context, broadening the scope from individual data points to entire input and output distributions; we term this distributional counterfactual explanation (DCE). In DCE, we take the stakeholder's perspective and shift the focus to analyzing the distributional properties of the factual and the counterfactual, drawing parallels to the classical approach of assessing individual instances and their resulting decisions. We leverage optimal transport (OT) to frame a chance-constrained optimization problem, aiming to derive a counterfactual distribution that closely aligns with its factual counterpart, substantiated by statistical confidence. Our proposed optimization method, Discount, strategically balances this confidence across the input and output distributions, and we accompany the algorithm with an analysis of its convergence rate. The efficacy of our proposed method is substantiated through a series of quantitative and qualitative experiments, highlighting its potential to provide deep insights into decision-making models.
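A minimal sketch of the distributional flavor of this problem, assuming a hypothetical `black_box` model, a made-up `target` output distribution, and per-coordinate 1-D Wasserstein distances as a cheap OT proxy; the paper's chance-constrained formulation and the Discount algorithm are not reproduced:

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)

def black_box(X):
    # Hypothetical stand-in for an opaque decision model.
    return 1.0 / (1.0 + np.exp(-(X @ np.array([1.5, -2.0]))))

X_fact = rng.normal(size=(500, 2))         # factual inputs
target = rng.uniform(0.6, 0.9, size=500)   # desired output distribution

def objective(X_cf):
    # Trade off closeness to the target in output space against
    # closeness to the factual in input space, both measured with
    # 1-D Wasserstein distances (a crude proxy for full OT).
    w_out = wasserstein_distance(black_box(X_cf), target)
    w_in = sum(wasserstein_distance(X_fact[:, j], X_cf[:, j])
               for j in range(X_fact.shape[1]))
    return w_out + 0.1 * w_in

X_cf, best = X_fact.copy(), objective(X_fact)
for _ in range(300):                       # naive random-search descent
    proposal = X_cf + rng.normal(scale=0.05, size=X_cf.shape)
    val = objective(proposal)
    if val < best:
        X_cf, best = proposal, val

print(f"objective after search: {best:.4f}")
```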
Importance-weighted Positive-unlabeled Learning for Distribution Shift Adaptation
Atsutoshi Kumagai · Tomoharu Iwata · Hiroshi Takahashi · Taishi Nishiyama · Yasuhiro Fujiwara
Positive and unlabeled (PU) learning, which trains a binary classifier from only PU data, is a fundamental task in many applications. Existing PU learning methods typically assume that the training and test distributions are identical. However, this assumption is often violated by distribution shifts, and identifying the shift type, such as a covariate or concept shift, is generally difficult. In this paper, we propose a distribution shift adaptation method for PU learning that does not assume a shift type, using a few PU data from the test distribution together with PU data from the training distribution. Our method is based on importance weighting: it learns the classifier in a principled manner by minimizing the importance-weighted training risk, which approximates the test risk. Although existing importance-weighting methods that make no assumption on the shift type require positive and negative data in both distributions, we theoretically show that the weighting can be performed with only PU data in both distributions. Based on this finding, our neural network-based classifiers can be effectively trained by alternating between importance weight estimation and classifier learning. We show that our method outperforms various existing methods on seven real-world datasets.
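The weighting step itself can be illustrated with the classical density-ratio trick: train a probabilistic classifier to discriminate test from training inputs and convert its probabilities into importance weights. The sketch below shows this standard recipe over fully observed inputs; the paper's contribution, performing the weighting from PU data alone, is not reproduced here:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def estimate_importance_weights(X_train, X_test):
    """Estimate w(x) = p_test(x) / p_train(x) via a domain classifier."""
    X = np.vstack([X_train, X_test])
    d = np.concatenate([np.zeros(len(X_train)), np.ones(len(X_test))])
    clf = LogisticRegression(max_iter=1000).fit(X, d)
    p = np.clip(clf.predict_proba(X_train)[:, 1], 1e-6, 1 - 1e-6)
    # By Bayes' rule: p_test(x)/p_train(x) = P(test|x)/P(train|x) * n_train/n_test.
    return (p / (1 - p)) * (len(X_train) / len(X_test))
```

These weights then multiply the per-example terms of the training risk, so that minimizing the weighted training risk approximates minimizing the test risk.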
On Distributional Discrepancy for Experimental Design with General Assignment Probabilities
Anup Rao · Peng Zhang
We investigate experimental design for randomized controlled trials (RCTs) with both equal and unequal treatment-control assignment probabilities. Our work makes progress on the connection between the distributional discrepancy minimization (DDM) problem introduced by Harshaw et al. (2024) and the design of RCTs. We make two main contributions: First, we prove that approximating the optimal solution of the DDM problem within a certain constant error is NP-hard. Second, we introduce a new Multiplicative Weights Update (MWU) algorithm for the DDM problem, which improves on the Gram-Schmidt walk algorithm used by Harshaw et al. (2024) when assignment probabilities are unequal. Building on the framework of Harshaw et al. (2024) and our MWU algorithm, we then develop the MWU design, which reduces the worst-case mean-squared error in estimating the average treatment effect. Finally, we present a comprehensive simulation study comparing our design with commonly used designs.
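To convey the MWU template in this setting, the toy below assigns units to treatment (+1) or control (-1) while multiplicatively upweighting covariate directions whose running imbalance grows. It is a generic balancing heuristic under our own simplifications (equal assignment probabilities, ad hoc learning rate), not the paper's MWU design:

```python
import numpy as np

def mwu_balanced_assignment(X, eta=0.5, seed=0):
    """Toy MWU-style covariate balancing for treatment/control assignment.

    Experts are the +/- directions of each covariate; each unit gets the
    sign that the weighted experts prefer, and experts whose imbalance
    grows are multiplicatively upweighted.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.ones(2 * d)             # one expert per signed covariate direction
    imbalance = np.zeros(d)        # running sum of z_i * x_i
    z = np.empty(n, dtype=int)
    for i in rng.permutation(n):
        signed = np.concatenate([X[i], -X[i]])   # per-expert effect of z = +1
        score = w @ signed
        z[i] = -1 if score > 0 else 1            # pick the sign that shrinks
        imbalance += z[i] * X[i]                 # the weighted imbalance
        # Upweight experts whose imbalance grew under this assignment.
        w *= np.exp(eta * z[i] * signed / (np.abs(signed).max() + 1e-12))
        w /= w.sum()
    return z, imbalance
```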