

Towards a Complete Benchmark on Video Moment Localization

Jinyeong Chae · Donghwa Kim · Kwanseok Kim · Doyeon Lee · Sangho Lee · Seongsu Ha · Jonghwan Mun · Wooyoung Kang · Byungseok Roh · Joonseok Lee

MR1 & MR2 - Number 137
Thu 2 May 8 a.m. PDT — 8:30 a.m. PDT


In this paper, we propose and conduct a comprehensive benchmark on the moment localization task, which aims to retrieve the segment of a single untrimmed video that corresponds to a text query. Our study starts from the observation that most moment localization papers report experimental results on only a few datasets, despite the availability of many more benchmarks. We therefore conduct an extensive benchmark study that measures the performance of representative methods on 7 widely used datasets. Looking further into the details, we pose additional research questions and investigate them empirically: whether the models rely on unintended biases introduced by specific training data, whether advanced visual features trained on classification tasks transfer well to this task, and whether the computational cost of each model pays off. Through this series of experiments, we provide a multi-faceted evaluation of state-of-the-art moment localization models. Code is available at \url{}.
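
To make the task concrete, the sketch below shows one common way a retrieved segment can be scored against an annotated one: temporal intersection-over-union (IoU), aggregated as recall at rank 1 under an IoU threshold. The abstract does not state which metrics the benchmark uses, so the metric choice, the function names `temporal_iou` and `recall_at_1`, and the default threshold of 0.5 are illustrative assumptions rather than the authors' evaluation protocol.

```python
from typing import List, Tuple

# A segment is (start_sec, end_sec) within one untrimmed video.
Segment = Tuple[float, float]


def temporal_iou(pred: Segment, gt: Segment) -> float:
    """Intersection-over-union of two time intervals (hypothetical helper)."""
    # Overlap is clamped at zero for disjoint segments.
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_1(preds: List[Segment], gts: List[Segment],
                threshold: float = 0.5) -> float:
    """Fraction of queries whose top-1 prediction reaches the IoU threshold.

    The threshold value is an assumption chosen for illustration.
    """
    if not gts:
        return 0.0
    hits = sum(temporal_iou(p, g) >= threshold for p, g in zip(preds, gts))
    return hits / len(gts)


if __name__ == "__main__":
    predictions = [(0.0, 10.0), (0.0, 5.0)]
    annotations = [(0.0, 10.0), (10.0, 15.0)]
    print(recall_at_1(predictions, annotations))
```

Here each query contributes a single top-ranked segment; evaluating at several thresholds only requires calling `recall_at_1` with different `threshold` values.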
