Hyperbolic Part-Whole Image Segmentation
Abstract
Semantic segmentation typically performs pixel-level classification at the object level. Yet objects naturally decompose into parts and subparts, a hierarchy that mirrors human visual perception. In this work, we introduce a hyperbolic prototypical segmentation framework that represents multiple granularity levels simultaneously within a unified embedding space. Because hyperbolic geometry is well suited to modeling hierarchies, we embed class prototypes within the Poincaré ball. We introduce a tree-aware prototype initialization strategy and a distortion-p loss that together yield improved hierarchical embeddings. Furthermore, we derive an optimized formulation of the hyperbolic distance function that makes inference tractable for dense prediction tasks. A shared transformer encoder paired with separate hyperbolic heads enables efficient multi-level segmentation with a single model. Experiments on the recently introduced SubPartImageNet show that our approach (i) improves over the state of the art, especially at the subpart and part levels, with a fraction of the parameters, (ii) enables zero-shot generalization, and (iii) transfers from part-level to object-level predictions without object-level supervision. All code will be made publicly available.
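To make the notion of prototypes in the Poincaré ball concrete, the following minimal sketch assigns a single pixel embedding to its nearest class prototype under the standard closed-form Poincaré distance for curvature -1. It is an illustration only, not the optimized distance formulation derived in this work; the function names, the 2-D prototype coordinates, and the class names are hypothetical and chosen purely for readability.

```python
import math


def poincare_distance(x, y):
    """Standard geodesic distance in the unit Poincaré ball (curvature -1).

    d(x, y) = arcosh(1 + 2 * ||x - y||^2 / ((1 - ||x||^2) * (1 - ||y||^2)))
    """
    sq_x = sum(a * a for a in x)
    sq_y = sum(b * b for b in y)
    if sq_x >= 1.0 or sq_y >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_x) * (1.0 - sq_y)))


def nearest_prototype(embedding, prototypes):
    """Return the name of the prototype closest to `embedding`."""
    return min(
        prototypes,
        key=lambda name: poincare_distance(embedding, prototypes[name]),
    )


# Hypothetical 2-D prototypes: the coarse class sits near the origin,
# finer classes sit closer to the boundary of the ball.
prototypes = {
    "animal": (0.0, 0.1),
    "head": (0.0, 0.6),
    "leg": (0.6, 0.0),
}

# Hypothetical embedding of one pixel.
pixel = (0.05, 0.55)
label = nearest_prototype(pixel, prototypes)
print(label)
```

In a dense prediction setting the same nearest-prototype rule would be applied to every pixel embedding, which is why the cost of the distance computation matters at inference time.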