PolarQuant: Vector Quantization with Polar Transformation
Abstract
Vector quantization is a prevalent technique for reducing the memory footprint in a wide range of computational problems, such as the training and deployment of deep learning models and vector search systems. We introduce PolarQuant, a novel online vector quantization method that leverages random preconditioning and polar transformation. Our approach efficiently transforms Euclidean vectors into polar coordinates using a recursive algorithm, and then quantizes the resulting angles. A key insight is that, following random preconditioning, the angles in the polar representation exhibit a tightly bounded, highly concentrated, and analytically computable distribution, independent of the input data. This well-behaved distribution eliminates the need for explicit normalization and learned data-dependent quantization codebooks, steps that introduce significant memory and runtime overhead in traditional product and scalar quantization methods. By circumventing this data-dependent step, PolarQuant achieves substantial memory and runtime savings, making it highly suitable for online scenarios such as KV cache compression. Our long-context evaluation demonstrates that PolarQuant compresses the KV cache by over 4.2x while achieving the best quality scores among state-of-the-art baselines.
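To make the idea concrete, the following is a minimal sketch of the angle-quantization principle described above, not the paper's actual algorithm: it preconditions a vector with a random orthogonal rotation, converts consecutive coordinate pairs to polar form, and quantizes each angle on a fixed uniform grid (possible precisely because the post-rotation angle distribution is data-independent, so no learned codebook is needed). The pairing scheme, bit width, and function names here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_rotation(d):
    # Random orthogonal preconditioner via QR decomposition.
    q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    return q

def polar_quantize(x, Q, bits=4):
    # Precondition: the rotation spreads energy evenly across coordinates.
    y = Q @ x
    # Illustrative pairing: convert each consecutive coordinate pair to polar form.
    pairs = y.reshape(-1, 2)
    r = np.linalg.norm(pairs, axis=1)
    theta = np.arctan2(pairs[:, 1], pairs[:, 0])  # angle in (-pi, pi]
    # Fixed uniform angular grid: no data-dependent codebook to learn or store.
    levels = 2 ** bits
    codes = np.round((theta + np.pi) / (2 * np.pi) * (levels - 1)).astype(int)
    return r, codes

def polar_dequantize(r, codes, Q, bits=4):
    levels = 2 ** bits
    theta = codes / (levels - 1) * 2 * np.pi - np.pi
    pairs = np.stack([r * np.cos(theta), r * np.sin(theta)], axis=1)
    # Invert the orthogonal preconditioner (Q^T = Q^-1).
    return Q.T @ pairs.reshape(-1)

d = 8
Q = random_rotation(d)
x = rng.standard_normal(d)
r, codes = polar_quantize(x, Q)
x_hat = polar_dequantize(r, codes, Q)
print(np.linalg.norm(x - x_hat) / np.linalg.norm(x))  # small relative error
```

In this toy version the radii are kept in full precision and only the angles are coded; a practical scheme would also compress the radii, and the paper's recursive transform organizes the angles differently than this flat pairing.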