Session

Oral 3: Causality / Trustworthy ML

Moderators: Ricardo Silva · Yingzhen Li



Mon 28 Mar 6 a.m. PDT — 7 a.m. PDT

Abstract:

Chat is not available.

Mon 28 March 6:00 - 6:15 PDT

(Oral)
Almost Optimal Universal Lower Bound for Learning Causal DAGs with Atomic Interventions

Vibhor Porwal · Piyush Srivastava · Gaurav Sinha

A well-studied challenge that arises in the structure learning problem of causal directed acyclic graphs (DAG) is that using observational data, one can only learn the graph up to a "Markov equivalence class" (MEC). The remaining undirected edges have to be oriented using interventions, which can be very expensive to perform in applications. Thus, the problem of minimizing the number of interventions needed to fully orient the MEC has received a lot of recent attention, and is also the focus of this work. We prove two main results. The first is a new universal lower bound on the number of atomic interventions that any algorithm (whether active or passive) would need to perform in order to orient a given MEC. Our second result shows that this bound is, in fact, within a factor of two of the size of the smallest set of atomic interventions that can orient the MEC. Our lower bound is provably better than previously known lower bounds. The proof of our lower bound is based on the new notion of clique-block shared-parents (CBSP) orderings, which are topological orderings of DAGs without v-structures and satisfy certain special properties. Further, using simulations on synthetic graphs and by giving examples of special graph families, we show that our bound is often significantly better.

Mon 28 March 6:15 - 6:30 PDT

(Oral)
Variance Minimization in the Wasserstein Space for Invariant Causal Prediction

Guillaume Martinet · Alexander Strzalkowski · Barbara Engelhardt

Selecting powerful predictors for an outcome is a cornerstone task for machine learning. However, some types of questions can only be answered by identifying the predictors that causally affect the outcome. A recent approach to this causal inference problem leverages the invariance property of a causal mechanism across differing experimental environments (Peters et al., 2016; Heinze-Deml et al., 2018). This method, invariant causal prediction (ICP), has a substantial computational defect -- the runtime scales exponentially with the number of possible causal variables. In this work, we show that the approach taken in ICP may be reformulated as a series of nonparametric tests that scales linearly in the number of predictors. Each of these tests relies on the minimization of a novel loss function -- the Wasserstein variance -- that is derived from tools in optimal transport theory and is used to quantify distributional variability across environments. We prove under mild assumptions that our method is able to recover the set of identifiable direct causes, and we demonstrate in our experiments that it is competitive with other benchmark causal discovery algorithms.

Mon 28 March 6:30 - 6:45 PDT

(Oral)
On the Assumptions of Synthetic Control Methods

Claudia Shi · Dhanya Sridhar · Vishal Misra · David Blei

Synthetic control (SC) methods have been widely applied to estimate the causal effect of large-scale interventions, e.g., the state-wide effect of a change in policy.The idea of synthetic controls is to approximate one unit's counterfactual outcomes using a weighted combination of some other units' observed outcomes.The motivating question of this paper is: how does the SC strategy lead to valid causal inferences?We address this question by re-formulating the causal inference problem targeted by SC with a more fine-grained model, where we change the unit of analysis from large units" (e.g., states) tosmall units" (e.g., individuals in states).Under the re-formulation, we derive sufficient conditions for the non-parametric causal identification of the causal effect.We show that, in some settings, existing linear SC estimators are valid even when the data generating process is non-linear.We highlight two implications of the reformulation: 1) it clarifies where ``linearity" comes from, and how it falls naturally out of the more fine-grained and flexible model; 2) it suggests new ways of using available data with SC methods for valid causal inference, in particular, new ways of selecting observations from which to estimate the counterfactual.

Mon 28 March 6:45 - 7:00 PDT

(Oral)
Differentially Private Densest Subgraph

Alireza Farhadi · MohammadTaghi Hajiaghai · Elaine Shi

Given a graph, the densest subgraph problem asks for a set of vertices such that the average degree among these vertices is maximized. Densest subgraph has numerous applications in learning, e.g., community detection in social networks, link spam detection, correlation mining, bioinformatics, and so on. Although there are efficient algorithms that output either exact or approximate solutions to the densest subgraph problem, existing algorithms may violate the privacy of the individuals in the network, e.g., leaking the existence/non-existence of edges.In this paper, we study the densest subgraph problem in the framework of the differential privacy, and we derive the upper and lower bounds for this problem. We show that there exists a linear-time \epsilon-differentially private algorithm that finds a 2-approximation of the densest subgraph with an extra poly-logarithmic additive error. Our algorithm not only reports the approximate density of the densest subgraph, but also reports the vertices that form the densesubgraph.Our upper bound almost matches the famous 2-approximation by Charikar both in performance and in approximation ratio, but we additionally achieve differential privacy. In comparison with Charikar’s algorithm, our algorithm has an extra poly logarithmic additive error. We partly justify the additive error with a new lower bound, showing that for any differentially private algorithm that provides a constant-factor approximation, a sub-logarithmic additive erroris inherent.We also practically study our differentially private algorithm on real-world graphs, and we show that in practice the algorithm finds a solution which is very close to the optimal.