T$_k$CP: Context-Aware Pooling via Top-k% Activation Selection
Seo-Yeon Choi · Kyungsu Lee
Abstract
Pooling is a fundamental operation in convolutional neural networks (CNNs), providing spatial reduction and hierarchical abstraction. However, conventional pooling methods such as max and average pooling operate locally and can under- or over-estimate the underlying activations, often failing to capture semantically meaningful features across the broader context of an image. This inherent limitation hampers performance in vision tasks that demand both precise localization and global interpretation. To alleviate this, we introduce Top-$k$\% Contextual Pooling (T$_k$CP), a novel pooling framework designed to retain the most informative activations based on their contextual importance. T$_k$CP has two variants: (1) Sparse Contextual Pooling, which performs top-$k$\% selection within local kernel windows, and (2) Global Contextual Pooling, which identifies the top-$k$\% activations across the entire feature map. Given a kernel size and a target output resolution, T$_k$CP automatically determines the stride values and reconstructs the output through a deterministic process that preserves spatial coherence without additional learnable parameters. We evaluate T$_k$CP on a wide range of computer vision tasks, including image classification, object detection, object tracking, semantic segmentation, and generation. Experimental results demonstrate consistent improvements in accuracy and robustness across these tasks. Beyond performance gains, T$_k$CP provides a mechanism for interpreting model behavior by revealing how high-activation regions evolve across layers in the CNN pyramid. This hierarchical interpretation supports efficient representation while enabling layer-wise insight into the model's attention and decision patterns.
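As a rough illustration of the two variants described above, the following NumPy sketch applies top-$k$\% selection to a single-channel feature map. It is not the authors' implementation. The abstract does not state how the selected activations are aggregated, how the stride is derived, or how the output is reconstructed, so the sketch assumes that the top-$k$\% values in each window are averaged and that the stride is the largest integer for which the requested number of windows fits. All function names (`infer_stride`, `topk_percent_mean`, `sparse_topk_pool`, `global_topk_mask`) are hypothetical.

```python
import math

import numpy as np


def infer_stride(in_size: int, kernel: int, out_size: int) -> int:
    """Assumed rule (not from the paper): the largest stride such that
    out_size windows of width kernel fit inside in_size."""
    if kernel > in_size:
        raise ValueError("kernel must not exceed the input size")
    if out_size == 1:
        return 1
    return max(1, (in_size - kernel) // (out_size - 1))


def topk_percent_mean(values: np.ndarray, k_percent: float) -> float:
    """Mean of the largest k% of entries; at least one entry is kept.
    Averaging is an assumption made for this sketch."""
    flat = values.ravel()
    n_keep = max(1, math.ceil(flat.size * k_percent / 100.0))
    kth = flat.size - n_keep
    # After partitioning, the last n_keep slots hold the largest values.
    top = np.partition(flat, kth)[kth:]
    return float(top.mean())


def sparse_topk_pool(x: np.ndarray, kernel: int, out_size: int,
                     k_percent: float) -> np.ndarray:
    """Local variant: top-k% selection inside each kernel window."""
    h, w = x.shape
    stride_h = infer_stride(h, kernel, out_size)
    stride_w = infer_stride(w, kernel, out_size)
    out = np.empty((out_size, out_size), dtype=float)
    for i in range(out_size):
        for j in range(out_size):
            r, c = i * stride_h, j * stride_w
            out[i, j] = topk_percent_mean(x[r:r + kernel, c:c + kernel],
                                          k_percent)
    return out


def global_topk_mask(x: np.ndarray, k_percent: float) -> np.ndarray:
    """Global variant: boolean mask of the top-k% activations over the
    whole map. Ties at the threshold may keep slightly more than k%."""
    flat = x.ravel()
    n_keep = max(1, math.ceil(flat.size * k_percent / 100.0))
    threshold = np.partition(flat, flat.size - n_keep)[flat.size - n_keep]
    return x >= threshold


feature_map = np.arange(16, dtype=float).reshape(4, 4)
pooled = sparse_topk_pool(feature_map, kernel=2, out_size=2, k_percent=50)
mask = global_topk_mask(feature_map, k_percent=25)
```

Under these assumptions, choosing $k$ so that a single activation is kept per window reduces the local variant to max pooling, and $k = 100$ reduces it to average pooling, which matches the abstract's framing of T$_k$CP as sitting between the two conventional operators.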