

Poster

β-th order Acyclicity Derivatives for DAG Learning

Haowei Chen · Adam Elmachtoub


Abstract: We consider a non-convex optimization formulation for learning the weighted adjacency matrix W of a directed acyclic graph (DAG) that uses acyclicity constraints that are functions of |W_ij|^β, for β ∈ ℕ. State-of-the-art algorithms for this problem use gradient-based Karush-Kuhn-Tucker (KKT) optimality conditions, which only yield useful search directions for β = 1. Therefore, constraints with β ≥ 2 have been ignored in the literature, and their empirical performance remains unknown. We introduce β-th Order Taylor Series Expansion Based Local Search (β-LS), which yields actionable descent directions for any β ∈ ℕ. Our empirical experiments show that 2-LS obtains solutions of higher quality than 1-LS, 3-LS, and 4-LS. 2-LSopt, an optimized version of 2-LS, obtains high-quality solutions significantly faster than the state of the art, which uses β = 1. Moreover, 2-LSopt does not need any graph-size-specific hyperparameter tuning. We prove that β-LSopt is guaranteed to converge to a Coordinate-Wise Local Stationary Point (Cst) for any β ∈ ℕ. If the objective function is convex, β-LSopt converges to a local minimum.
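As a rough illustration of the kind of acyclicity constraint the abstract describes, the sketch below uses the standard NOTEARS-style characterization h(W) = tr(exp(A)) − d with A_ij = |W_ij|^β, which is zero exactly when W encodes a DAG; β = 2 recovers the usual W∘W form. The function name and the exact penalty form are assumptions for illustration, not the paper's β-LS algorithm itself.

```python
import numpy as np
from scipy.linalg import expm

def acyclicity_penalty(W, beta=2):
    """Assumed NOTEARS-style acyclicity measure with |W_ij|**beta entries.

    h(W) = tr(exp(|W|^beta)) - d is nonnegative and equals zero iff W is
    the weighted adjacency matrix of a DAG, since the trace of the matrix
    exponential accumulates weighted cycle contributions of every length.
    The constraint family used by beta-LS may differ in detail.
    """
    A = np.abs(W) ** beta                 # element-wise |W_ij|^beta
    return np.trace(expm(A)) - W.shape[0]

# Example: a 3-node graph with the cycle 1 -> 2 -> 3 -> 1 is penalized,
# while the acyclic version (back edge removed) is not.
W_cyclic = np.array([[0.0, 0.8, 0.0],
                     [0.0, 0.0, 0.5],
                     [0.3, 0.0, 0.0]])
W_dag = np.triu(np.abs(W_cyclic), k=1)

print(acyclicity_penalty(W_cyclic, beta=2) > 1e-8)    # True
print(abs(acyclicity_penalty(W_dag, beta=2)) < 1e-8)  # True
```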
