Data Appraisal Without Data Sharing

Xinlei Xu · Awni Hannun · Laurens van der Maaten

Tue 29 Mar 1 a.m. PDT — 2:30 a.m. PDT


One of the most effective approaches to improving the performance of a machine learning model is to procure additional training data. A model owner seeking relevant training data from a data owner needs to appraise the data before acquiring it. However, without a formal agreement, the data owner does not want to share data. The resulting Catch-22 prevents efficient data markets from forming. This paper proposes adding a data appraisal stage that requires no data sharing between data owners and model owners. Specifically, we use multi-party computation to implement an appraisal function computed on private data. The appraised value serves as a guide to facilitate data selection and transaction. We propose an efficient data appraisal method based on forward influence functions that approximates data value through its first-order loss reduction on the current model. The method requires no additional hyper-parameters or re-training. We show that in private, forward influence functions provide an appealing trade-off between high quality appraisal and required computation, in spite of label noise, class imbalance, and missing data. Our work seeks to inspire an open market that incentivizes efficient, equitable exchange of domain-specific training data.
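
To make the phrase "first-order loss reduction on the current model" concrete, here is a minimal plaintext sketch. It is not the paper's appraisal function and involves no multi-party computation. It assumes a logistic-regression model and scores a candidate dataset by the first-order Taylor estimate of how much one gradient step on that data would lower a validation loss. All names (logistic_loss, logistic_grad, first_order_appraisal, the step size lr) are hypothetical and chosen only for illustration.

```python
import numpy as np


def logistic_loss(theta, X, y):
    """Mean logistic loss for labels y in {0, 1}."""
    z = X @ theta
    # log(1 + exp(z)) - y * z, computed stably with logaddexp
    return float(np.mean(np.logaddexp(0.0, z) - y * z))


def logistic_grad(theta, X, y):
    """Gradient of the mean logistic loss with respect to theta."""
    p = 1.0 / (1.0 + np.exp(-(X @ theta)))
    return X.T @ (p - y) / len(y)


def first_order_appraisal(theta, X_cand, y_cand, X_val, y_val, lr=0.1):
    """Illustrative score (an assumption, not the paper's formula).

    One gradient step on the candidate data moves theta to
    theta - lr * g_cand. A first-order Taylor expansion of the
    validation loss gives an estimated reduction of
    lr * <g_val, g_cand>. No re-training is performed.
    """
    g_cand = logistic_grad(theta, X_cand, y_cand)
    g_val = logistic_grad(theta, X_val, y_val)
    return lr * float(g_val @ g_cand)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w_true = rng.normal(size=5)
    X_val = rng.normal(size=(200, 5))
    y_val = (X_val @ w_true > 0).astype(float)
    X_cand = rng.normal(size=(200, 5))
    y_clean = (X_cand @ w_true > 0).astype(float)
    y_flipped = 1.0 - y_clean  # fully mislabeled candidate data

    theta = np.zeros(5)  # the model owner's current parameters
    print("clean  :", first_order_appraisal(theta, X_cand, y_clean, X_val, y_val))
    print("flipped:", first_order_appraisal(theta, X_cand, y_flipped, X_val, y_val))
```

In this toy setting, candidate data whose gradient aligns with the validation gradient receives a positive score, while fully mislabeled data receives a negative one. In the setting the abstract describes, a computation of this kind would be carried out under multi-party computation so that neither party sees the other's data or model.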
