

Clustering High-dimensional Data with Ordered Weighted L1 Regularization

Chandramauli Chakraborty · Sayan Paul · Saptarshi Chakraborty · Swagatam Das

Auditorium 1 Foyer 73

Abstract: Clustering complex high-dimensional data is particularly challenging because the signal-to-noise ratio in such data is significantly lower than in classical low-dimensional data. This is mainly because most of the features describing a data point carry little to no information about the natural grouping of the data. Filtering out such features is, thus, critical for extracting meaningful information from large-scale data. Many recent methods have attempted to estimate feature importance in a centroid-based clustering setting. Though empirically successful in classical low-dimensional settings, most of them underperform in high dimensions, especially on microarray and single-cell RNA-seq data. This paper extends the merits of weighted center-based clustering through the Ordered Weighted $\ell_1$ (OWL) norm for better feature selection. Building on the properties of block coordinate descent and the Frank-Wolfe algorithm, we maintain computational efficiency while outperforming the state of the art in high-dimensional settings. The proposal also comes with finite-sample theoretical guarantees, including a rate of $\mathcal{O}\left(\sqrt{k \log p/n}\right)$ under model sparsity, bridging the gap between the theory and practice of weighted clustering.
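For readers unfamiliar with the OWL norm, the following minimal NumPy sketch shows how it is computed, $\Omega_w(x) = \sum_i w_i |x|_{[i]}$ with $|x|_{[1]} \ge |x|_{[2]} \ge \dots$, and how a Frank-Wolfe method could query it through a linear minimization oracle (LMO) over the OWL-norm ball. It is a generic illustration, not the authors' clustering algorithm: the function names `owl_norm` and `owl_lmo` are hypothetical, and the sketch assumes the weights are nonnegative and nonincreasing with $w_1 > 0$. The LMO relies on the fact that the extreme points of the OWL ball place equal magnitude $r / (w_1 + \dots + w_k)$ on $k$ coordinates.

```python
import numpy as np

def owl_norm(x, w):
    """OWL norm: sum_i w_i * |x|_[i], where |x| is sorted in decreasing order.
    Assumes (hypothetically) that w is nonnegative, nonincreasing, same length as x."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    abs_sorted = np.sort(np.abs(x))[::-1]  # |x|_[1] >= |x|_[2] >= ...
    return float(np.dot(w, abs_sorted))

def owl_lmo(g, w, radius=1.0):
    """Linear minimization oracle: argmin <g, s> over {s : owl_norm(s, w) <= radius}.
    Picks the support size k maximizing (sum of k largest |g_j|) / (w_1 + ... + w_k)
    and returns the corresponding extreme point of the OWL ball. Assumes w_1 > 0."""
    g = np.asarray(g, dtype=float)
    w = np.asarray(w, dtype=float)
    order = np.argsort(-np.abs(g))           # indices of |g| in decreasing order
    top_sums = np.cumsum(np.abs(g)[order])   # sums of the k largest |g_j|
    w_sums = np.cumsum(w)                    # partial sums w_1 + ... + w_k
    k = int(np.argmax(top_sums / w_sums)) + 1
    s = np.zeros_like(g)
    idx = order[:k]
    s[idx] = -np.sign(g[idx]) * radius / w_sums[k - 1]  # equal-magnitude atom
    return s

x = np.array([3.0, -1.0, 2.0])
print(owl_norm(x, [3, 2, 1]))   # decreasing weights
print(owl_norm(x, [1, 1, 1]))   # reduces to the l1 norm
print(owl_norm(x, [1, 0, 0]))   # reduces to the l_inf norm
print(owl_lmo([0.5, -2.0, 1.0], [3, 2, 1]))
```

Equal weights recover the $\ell_1$ ball (the LMO returns a single signed coordinate vertex), while $w = (1, 0, \dots, 0)$ recovers the $\ell_\infty$ ball (the LMO returns a full sign vector); intermediate weight profiles interpolate between the two, which is what makes OWL attractive for grouping correlated features.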
