Poster
Monitoring machine learning-based risk prediction algorithms in the presence of performativity
Jean Feng · Alexej Gossmann · Gene Pennello · Nicholas Petrick · Berkman Sahiner · Romain Pirracchio
MR1 & MR2 - Number 72
Performance monitoring of machine learning (ML)-based risk prediction models in healthcare is complicated by performativity: when an algorithm predicts a patient to be at high risk for an adverse event, clinicians are more likely to administer prophylactic treatment, altering the very outcome the algorithm aims to predict. A simple approach is to ignore performativity and monitor only the untreated patients, whose outcomes remain unaltered. In general, however, ignoring performativity can inflate the Type I error rate because (i) untreated patients disproportionately represent those with low predicted risk, and (ii) changes in clinicians' trust in the ML algorithm, and in the algorithm itself, can induce complex dependencies that violate standard assumptions. Nevertheless, we show that valid inference is still possible when monitoring conditional rather than marginal performance measures, under either an assumption of conditional exchangeability or one of time-constant selection bias. Moreover, performativity can vary over time and induce nonstationarity in the data, which poses further challenges for monitoring. To address this, we introduce a new score-based cumulative sum (CUSUM) monitoring procedure with dynamic control limits. Through extensive simulation studies, we examine applications of the score-based CUSUM and how it is affected by various factors, including the efficiency of model updating procedures and the level of clinician trust. Finally, we apply the procedure to detect calibration decay of a risk model during the COVID-19 pandemic.
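
The abstract does not spell out the exact statistic, so the following is only a rough sketch of the general idea of a one-sided score-based CUSUM for calibration decay in a binary-outcome risk model: each increment is the Bernoulli score (observed outcome minus predicted risk), and "dynamic" control limits are approximated by simulating null paths in which outcomes are drawn from the model's own predicted risks. The function name, the allowance parameter, and the null-simulation scheme are illustrative assumptions, not the authors' procedure.

```python
import numpy as np

def score_cusum(y, p_hat, allowance=0.01, n_null_sims=1000, alpha=0.05, seed=0):
    """Illustrative one-sided score-based CUSUM for detecting calibration decay.

    y          : observed binary outcomes, in time order
    p_hat      : predicted risks from the ML model for the same patients
    allowance  : drift allowance subtracted from each increment (tuning choice)
    Control limits are estimated pointwise from simulated null paths, so they
    vary over time rather than being a single fixed threshold.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    p_hat = np.asarray(p_hat, dtype=float)
    T = len(y)

    def cusum_path(outcomes):
        # Increment is the Bernoulli score (observed minus predicted risk),
        # so the chart accumulates evidence that events occur more often
        # than the model predicts (upward calibration decay).
        increments = (outcomes - p_hat) - allowance
        path = np.empty(T)
        c = 0.0
        for t in range(T):
            c = max(0.0, c + increments[t])
            path[t] = c
        return path

    # Approximate dynamic control limits: pointwise (1 - alpha) quantiles of
    # null paths, where the null assumes the predicted risks are well calibrated.
    null_paths = np.stack([cusum_path(rng.binomial(1, p_hat).astype(float))
                           for _ in range(n_null_sims)])
    limits = np.quantile(null_paths, 1.0 - alpha, axis=0)

    observed = cusum_path(y)
    alarm_times = np.flatnonzero(observed > limits)
    first_alarm = int(alarm_times[0]) if alarm_times.size else None
    return observed, limits, first_alarm
```

In this sketch, `first_alarm` is the first time index at which the chart crosses its simulated limit; under performativity, the inputs would first need to be restricted or weighted (e.g., to untreated patients under one of the identifying assumptions above) before such a chart is applied.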