Understanding the Benefits of SimCLR Pre-Training in Two-Layer Convolutional Neural Networks
Abstract
SimCLR is a popular contrastive learning method for vision tasks, renowned for its ability to pre-train neural networks to learn efficient representations. Despite its empirical effectiveness, the theoretical understanding of SimCLR remains very limited, even in the simplest learning scenarios. In this paper, we present a theoretical case study of SimCLR. Specifically, we consider training a two-layer convolutional neural network (CNN) to learn a toy image data model that has been considered in a series of recent works. For this particular learning task, we precisely characterize the label complexity under which SimCLR pre-training followed by supervised fine-tuning achieves approximately zero training loss and near-optimal test loss. Notably, the label complexity of SimCLR pre-training is far less demanding than that of direct supervised training, especially when the signal-to-noise ratio in the data is low. Our analysis sheds light on the benefits of SimCLR in learning with fewer labels.