Mingfei Gao, Yingbo Zhou, Ran Xu, Richard Socher, Caiming Xiong

Abstract
Online action detection in untrimmed videos aims to identify an action as it happens, which makes it very important for real-time applications. Previous methods rely on tedious annotations of temporal action boundaries for training, which hinders the scalability of online action detection systems. We propose WOAD, a weakly supervised framework that can be trained using only video-class labels. WOAD contains two jointly trained modules: a temporal proposal generator (TPG) and an online action recognizer (OAR). Supervised by video-class labels, TPG works offline and aims to accurately mine pseudo frame-level labels for OAR. With the supervisory signals from TPG, OAR learns to conduct action detection in an online fashion. Experimental results on THUMOS'14, ActivityNet1.2 and ActivityNet1.3 show that our weakly supervised method largely outperforms weakly supervised baselines and achieves performance comparable to previous strongly supervised methods. Beyond that, WOAD can flexibly leverage strong supervision when it is available. When strongly supervised, our method obtains state-of-the-art results on both online per-frame action recognition and online detection of action start.
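To make the idea of mining pseudo frame-level labels from video-class labels concrete, here is a minimal sketch. It is not the TPG described in the abstract, which is a learned module trained jointly with OAR. It only illustrates one simple way such labels could be derived, assuming per-frame class scores are already available. The function name `pseudo_frame_labels`, the `BACKGROUND` id, and the fixed `threshold` are all hypothetical choices made for this example.

```python
from typing import List, Set

BACKGROUND = -1  # hypothetical id meaning "no action in this frame"


def pseudo_frame_labels(
    frame_scores: List[List[float]],
    video_classes: Set[int],
    threshold: float = 0.5,
) -> List[int]:
    """Assign one pseudo label per frame (illustrative assumption only).

    frame_scores[t][c] is an assumed score for class c at frame t.
    Only classes named in the video-level label are candidates; a frame
    whose candidate scores all fall below the threshold is background.
    """
    labels = []
    for scores in frame_scores:
        # Keep only video-level classes that clear the threshold.
        candidates = [(scores[c], c) for c in video_classes if scores[c] >= threshold]
        # Pick the highest-scoring candidate, or background if none qualify.
        labels.append(max(candidates)[1] if candidates else BACKGROUND)
    return labels
```

Under these assumptions, the resulting per-frame labels could then serve as training targets for an online, frame-by-frame recognizer, which is the role the abstract assigns to OAR.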
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| online-action-detection-on-thumos-14 | WOAD | mAP: 67.1 |