

On the Effect of Key Factors in Spurious Correlation: A Theoretical Perspective

Yipei Wang · Xiaoqian Wang

MR1 & MR2 - Number 77
Sat 4 May 6 a.m. PDT — 8:30 a.m. PDT


Spurious correlations arise when irrelevant patterns in input data are mistakenly associated with labels, compromising the generalizability of machine learning models. While these models may be confident during training, they often falter in real-world testing when these misleading correlations shift. Current solutions typically involve altering the correlations or regularizing latent representations. Although these methods show promise in experiments, a rigorous theoretical understanding of their effectiveness, and of the factors underlying spurious correlations, is lacking. In this work, we provide a theoretical analysis, supported by empirical evidence, of spurious correlations. Drawing on our proposed theorems, we investigate the behavior of classifiers when confronted with spurious features, and present our findings on how various factors influence these correlations and their impact on model performance, including the Mahalanobis distance of groups and the training/testing spurious correlation ratios. Additionally, by aligning empirical outcomes with our theoretical results, we highlight the feasibility of assessing the degree of separability of intertwined real-world features. This work lays theoretical groundwork for understanding spurious correlations and for designing more effective mitigation techniques.
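
The abstract names two quantities, the Mahalanobis distance of groups and the training/testing spurious correlation ratios, without defining them here. As a rough illustration only, the sketch below computes one common reading of each: the Mahalanobis distance between two group means under a shared covariance, and the fraction of samples whose spurious attribute agrees with the label. The function names, the two-dimensional features, the shared covariance, and the binary label/attribute encoding are all assumptions made for this example, not the paper's definitions.

```python
import math


def mahalanobis_2d(mu_a, mu_b, cov):
    """Mahalanobis distance between two 2-D group means.

    Assumes (hypothetically) that both groups share the 2x2 covariance `cov`.
    """
    (a, b), (c, d) = cov
    det = a * d - b * c
    if a <= 0 or det <= 0:
        raise ValueError("covariance must be positive definite")
    # Inverse of a 2x2 matrix, written out explicitly.
    inv = ((d / det, -b / det), (-c / det, a / det))
    dx = mu_a[0] - mu_b[0]
    dy = mu_a[1] - mu_b[1]
    # Quadratic form (mu_a - mu_b)^T cov^{-1} (mu_a - mu_b).
    quad = (dx * (inv[0][0] * dx + inv[0][1] * dy)
            + dy * (inv[1][0] * dx + inv[1][1] * dy))
    return math.sqrt(quad)


def spurious_ratio(labels, spurious):
    """Fraction of samples whose spurious attribute equals the label.

    Assumes (hypothetically) that labels and attributes share one encoding,
    e.g. both are 0/1.
    """
    if not labels or len(labels) != len(spurious):
        raise ValueError("inputs must be non-empty and of equal length")
    agree = sum(1 for y, s in zip(labels, spurious) if y == s)
    return agree / len(labels)


# Toy data: the spurious attribute matches the label in every training
# sample but only in half of the test samples, i.e. the correlation shifts.
train_ratio = spurious_ratio([1, 1, 0, 0], [1, 1, 0, 0])
test_ratio = spurious_ratio([1, 1, 0, 0], [0, 1, 0, 1])
group_distance = mahalanobis_2d((0.0, 0.0), (3.0, 4.0), ((1.0, 0.0), (0.0, 1.0)))
```

With an identity covariance the Mahalanobis distance reduces to the Euclidean distance, which makes the toy case easy to check by hand; a gap between `train_ratio` and `test_ratio` is the kind of shift the abstract describes.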
