

DAGnosis: Localized Identification of Data Inconsistencies using Structures

Nicolas HUYNH · Jeroen Berrevoets · Nabeel Seedat · Jonathan Crabbé · Zhaozhi Qian · Mihaela van der Schaar

MR1 & MR2 - Number 92
Thu 2 May 8 a.m. PDT — 8:30 a.m. PDT


Identifying and appropriately handling inconsistencies in data at deployment time is crucial for using machine learning models reliably. While recent data-centric methods can identify such inconsistencies with respect to the training set, they suffer from two key limitations: (1) suboptimality in settings where features exhibit statistical independencies, due to their use of compressive representations, and (2) a lack of localization to pinpoint why a sample might be flagged as inconsistent, which is important for guiding future data collection. We solve these two fundamental limitations by using directed acyclic graphs (DAGs) to encode the probability distribution of the training set's features, together with their independencies, as a structure. Our method, called DAGnosis, leverages these structural interactions to draw valuable and insightful data-centric conclusions. DAGnosis enables localizing the causes of inconsistencies on a DAG, an aspect overlooked by previous approaches. Moreover, we show empirically that leveraging these interactions (1) leads to more accurate conclusions when detecting inconsistencies and (2) provides more detailed insights into why some samples are flagged.
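The abstract does not spell out how a DAG yields localized flags, so the following is a minimal sketch under stated assumptions, not the paper's implementation. It assumes the DAG is already known (given as a parent list per feature). Two further choices are assumptions made for illustration only: each feature is checked against its parents with a least-squares regression, and the check uses a split-conformal interval. A sample is flagged if any feature falls outside its interval. The violating features localize the inconsistency. The names `fit_node_models` and `localize` are hypothetical.

```python
import numpy as np


def fit_node_models(X, parents, alpha=0.1, calib_frac=0.5, seed=0):
    """Hypothetical helper: for each feature j, regress X[:, j] on its DAG parents
    (ordinary least squares with intercept) and compute a split-conformal
    half-width q_j from absolute residuals on a held-out calibration split."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    idx = rng.permutation(n)
    n_fit = int(n * (1 - calib_frac))
    fit, cal = idx[:n_fit], idx[n_fit:]
    models = {}
    for j in range(d):
        pa = parents.get(j, [])
        # Design matrices: intercept column plus one column per parent feature.
        A_fit = np.column_stack([np.ones(len(fit))] + [X[fit, p] for p in pa])
        coef, *_ = np.linalg.lstsq(A_fit, X[fit, j], rcond=None)
        A_cal = np.column_stack([np.ones(len(cal))] + [X[cal, p] for p in pa])
        resid = np.abs(X[cal, j] - A_cal @ coef)
        # Finite-sample corrected quantile level for split conformal prediction.
        m = len(cal)
        level = min(1.0, np.ceil((m + 1) * (1 - alpha)) / m)
        q = np.quantile(resid, level, method="higher")
        models[j] = (pa, coef, q)
    return models


def localize(x, models):
    """Hypothetical helper: return the indices of features whose observed value
    lies outside the interval predicted from their parents (empty list = consistent)."""
    flagged = []
    for j, (pa, coef, q) in models.items():
        a = np.concatenate(([1.0], x[pa]))
        if abs(x[j] - a @ coef) > q:
            flagged.append(j)
    return flagged


# Usage on a toy chain DAG X0 -> X1 -> X2 (synthetic data, illustration only).
rng = np.random.default_rng(1)
n = 2000
x0 = rng.normal(size=n)
x1 = 2.0 * x0 + 0.1 * rng.normal(size=n)
x2 = -1.0 * x1 + 0.1 * rng.normal(size=n)
X = np.column_stack([x0, x1, x2])
parents = {0: [], 1: [0], 2: [1]}
models = fit_node_models(X, parents, alpha=0.01)

print(localize(np.array([1.0, 2.0, -2.0]), models))  # consistent sample
print(localize(np.array([1.0, 2.0, 5.0]), models))   # X2 corrupted
```

Because each feature is checked only against its parents, corrupting X2 flags feature 2 alone rather than the whole sample. Corrupting X1 would flag both X1 and its child X2. A real pipeline would still need to trace such downstream flags back through the graph.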
