HyperAI超神经

Abstract

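Weakly supervised temporal action localization (WS-TAL) is a challenging task that aims to localize when action instances occur in a video using only video-level category labels. Previous work typically uses both appearance and motion features, but mostly through simple feature concatenation or score-level fusion, which fails to fully exploit the synergy between the two kinds of features. This paper argues that features obtained from pretrained extractors such as I3D are not designed for WS-TAL and contain a large amount of task-irrelevant redundancy, so the features need to be recalibrated. To this end, we propose the Cross-modal Consensus Network (CO₂-Net). In CO₂-Net, we introduce two structurally identical Cross-modal Consensus Modules (CCM). Each module applies a cross-modal attention mechanism that uses global information from the main modality and local cross-modal information from the auxiliary modality to filter out task-irrelevant redundancy. In addition, we treat the attention weights produced by each CCM as pseudo targets for the other CCM, which forces the predictions of the two modules to stay consistent and forms a mutual learning scheme. To validate the method, we conduct extensive experiments on two widely used temporal action localization datasets, THUMOS14 and ActivityNet1.2. The results show that the proposed method achieves state-of-the-art performance on multiple evaluation metrics, and that the cross-modal consensus module produces more representative features that significantly improve the accuracy and robustness of weakly supervised temporal action localization.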
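The recalibration and consistency ideas in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`ccm`, `temporal_attention`, `consensus_loss`), the random projection matrices, the sigmoid gating, and the mean-squared-error consistency term are all assumptions chosen for illustration. The abstract only states that each module combines global information from the main modality with local information from the auxiliary modality, and that each module's attention weights serve as pseudo targets for the other.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ccm(main, aux, w_global, w_local):
    """Hypothetical cross-modal consensus module.

    main, aux: (T, D) snippet features of the main and auxiliary modality.
    Returns the main features re-weighted channel by channel.
    """
    # Global information of the main modality: temporal mean, then a projection.
    global_desc = main.mean(axis=0) @ w_global           # shape (D,)
    # Local cross-modal information: per-snippet projection of the auxiliary modality.
    local_desc = aux @ w_local                            # shape (T, D)
    # Channel-wise gate in (0, 1); the product form is an assumption.
    gate = sigmoid(global_desc[None, :] * local_desc)    # shape (T, D)
    return main * gate

def temporal_attention(feat, w_attn):
    """One attention weight per snippet, in (0, 1)."""
    return sigmoid(feat @ w_attn)                         # shape (T,)

def consensus_loss(attn_a, attn_b):
    """Assumed consistency term: mean squared error between the two attention sequences.

    In an autograd framework each side would be compared against a
    gradient-free copy of the other, so that each acts as a pseudo target.
    """
    return float(np.mean((attn_a - attn_b) ** 2))

rng = np.random.default_rng(0)
T, D = 8, 16                                              # snippets, feature channels
rgb = rng.normal(size=(T, D))                             # stand-in appearance features
flow = rng.normal(size=(T, D))                            # stand-in motion features

# Two structurally identical modules with separate (random) parameters.
params = {name: 0.1 * rng.normal(size=(D, D))
          for name in ("rgb_global", "rgb_local", "flow_global", "flow_local")}
recal_rgb = ccm(rgb, flow, params["rgb_global"], params["rgb_local"])
recal_flow = ccm(flow, rgb, params["flow_global"], params["flow_local"])

attn_rgb = temporal_attention(recal_rgb, 0.1 * rng.normal(size=D))
attn_flow = temporal_attention(recal_flow, 0.1 * rng.normal(size=D))
loss = consensus_loss(attn_rgb, attn_flow)
```

The two modules swap the roles of main and auxiliary modality, which is what lets each one's attention serve as a target for the other.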

Benchmark	Method	Metrics
weakly-supervised-action-localization-on	CO2-Net	mAP@0.1:0.5: 54.4; mAP@0.1:0.7: 44.6; mAP@0.5: 38.3
weakly-supervised-temporal-action	CO2-Net	mAP IOU@0.1: 70.1; mAP IOU@0.2: 63.6; mAP IOU@0.3: 54.5; mAP IOU@0.4: 45.7; mAP IOU@0.5: 38.3; mAP IOU@0.6: 26.4; mAP IOU@0.7: 13.4; mAP IOU@0.8: 6.9; mAP IOU@0.9: 2.0; mAP@AVG(0.1:0.9): 35.7
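The averaged figures in the table can be cross-checked against the per-threshold values. The short script below assumes that the nine per-threshold mAP values listed for CO2-Net correspond to IoU thresholds 0.1 through 0.9 in order. The helper name `average_map` is hypothetical.

```python
# Per-threshold mAP values listed for CO2-Net.
# Assumption: they map to IoU thresholds 0.1, 0.2, ..., 0.9 in order.
per_iou_map = {
    0.1: 70.1, 0.2: 63.6, 0.3: 54.5, 0.4: 45.7, 0.5: 38.3,
    0.6: 26.4, 0.7: 13.4, 0.8: 6.9, 0.9: 2.0,
}

def average_map(scores, low, high):
    """Mean of the mAP values whose IoU threshold lies in [low, high]."""
    selected = [value for iou, value in scores.items() if low <= iou <= high]
    return sum(selected) / len(selected)

avg_01_05 = round(average_map(per_iou_map, 0.1, 0.5), 1)  # 54.4
avg_01_07 = round(average_map(per_iou_map, 0.1, 0.7), 1)  # 44.6
avg_01_09 = round(average_map(per_iou_map, 0.1, 0.9), 1)  # 35.7
```

Under that assumption, the three averages agree with the 54.4, 44.6, and 35.7 reported in the table, and the value at threshold 0.5 matches the separately listed 38.3.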
