6 months ago

Towards End-to-End Weakly Supervised Learning for Long-Video Action Recognition

Zhou Jiaming ; Li Hanjun ; Lin Kun-Yu ; Liang Junwei

Abstract

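Developing end-to-end action recognition models on long videos is fundamental to long-video action understanding. Because end-to-end training on entire long videos is prohibitively expensive, existing work typically trains models on short clips trimmed from the long videos. However, this "trim-then-train" practice requires action interval annotations to provide clip-level supervision, that is, knowing which actions fall inside each trimmed clip. Collecting such annotations is very expensive, which prevents model training at scale. To address this, this work builds a weakly supervised end-to-end framework that trains recognition models on long videos using only video-level action category labels. Without knowing the exact temporal locations of actions in a long video, the proposed weakly supervised framework, AdaFocus, estimates where actions are likely to occur and how likely they are, and so adaptively focuses on informative action clips for end-to-end training. The effectiveness of AdaFocus is validated on three long-video datasets. In addition, for downstream long-video tasks, AdaFocus provides a weakly supervised feature extraction pipeline that yields more robust long-video features and significantly improves state-of-the-art methods on those tasks. The code and models will be released.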
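The idea of focusing on informative clips can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the max-over-labels relevance score, and the temperature softmax are all assumptions chosen for illustration. It assumes some estimator has already produced, for each clip, a probability per action class, and that only the set of video-level labels is known.

```python
import math
import random


def clip_sampling_probs(clip_scores, video_labels, temperature=1.0):
    """Turn per-clip class scores into a sampling distribution over clips.

    clip_scores[t][c] is a hypothetical estimate of the probability that
    class c occurs in clip t. video_labels is the set of class indices
    known (from the video-level annotation) to occur somewhere in the video.
    """
    if not video_labels:
        raise ValueError("at least one video-level label is required")
    # Assumed relevance: a clip's highest score among the video-level labels.
    relevance = [max(scores[c] for c in video_labels) for scores in clip_scores]
    # Numerically stable softmax over clips; a lower temperature gives a sharper focus.
    top = max(relevance)
    weights = [math.exp((r - top) / temperature) for r in relevance]
    total = sum(weights)
    return [w / total for w in weights]


def sample_clips(probs, num_clips, rng):
    """Draw clip indices (with replacement) in proportion to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=num_clips)


# Toy video with four clips and two classes, both present at video level.
scores = [
    [0.05, 0.10],  # mostly background
    [0.90, 0.20],  # class 0 likely here
    [0.10, 0.80],  # class 1 likely here
    [0.05, 0.05],  # background
]
probs = clip_sampling_probs(scores, video_labels={0, 1}, temperature=0.2)
picked = sample_clips(probs, num_clips=2, rng=random.Random(0))
```

In this toy setup most of the sampling mass falls on the second and third clips, so a training loop that draws clips from `probs` would spend most of its budget on segments likely to contain the labeled actions while still occasionally visiting the others.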

Benchmarks

Benchmark	Method	Metrics
action-classification-on-charades	AdaFocus (weak supervision, MViT-B-K400-pretrain, 16x4)	MAP: 41.4
action-classification-on-charades	AdaFocus (weak supervision, MViT-B-24, 32x3)	MAP: 47.8
action-classification-on-charades	AdaFocus (weak supervision, Slowfast-R50, 16x8)	MAP: 39.3
action-classification-on-charades	AdaFocus (weak supervision, X3D-L, 32x3)	MAP: 41.2
action-segmentation-on-breakfast-1	AdaFocus (newly extracted I3D-features, LT-Context model)	Acc: 78.0 Average F1: 76.2 Edit: 78.3 F1@10%: 82.1 F1@25%: 79.0 F1@50%: 67.5
long-video-activity-recognition-on-breakfast	AdaFocus (I3D-Breakfast-Pretrain-feature, GHRM)	mAP: 69.6
long-video-activity-recognition-on-breakfast	AdaFocus (I3D-Breakfast-Pretrain-feature, Timeception)	mAP: 70.4
long-video-activity-recognition-on-breakfast	AdaFocus (MViT-Breakfast-Pretrain-feature, Timeception)	mAP: 79.2
long-video-activity-recognition-on-breakfast	AdaFocus (MViT-Breakfast-Pretrain-feature, GHRM)	mAP: 79.5
temporal-sentence-grounding-on-charades-sta	AdaFocus (Semi-weak, I3D-Charades-Pretrain-feature, D3G model)	R@1,IoU=0.5: 46.9 R@1,IoU=0.7: 21.1 R@5,IoU=0.5: 79.3 R@5,IoU=0.7: 49.2
temporal-sentence-grounding-on-charades-sta	AdaFocus (Weak, I3D-Charades-Pretrain-feature, CPL model)	R@1,IoU=0.5: 49.1 R@1,IoU=0.7: 22.4 R@5,IoU=0.5: 84.2 R@5,IoU=0.7: 51.8
temporal-sentence-grounding-on-charades-sta	AdaFocus (Full, I3D-Charades-Pretrain-feature, MMN model)	R@1,IoU=0.5: 56.7 R@1,IoU=0.7: 35.6 R@5,IoU=0.5: 87.9 R@5,IoU=0.7: 65.0
temporal-sentence-grounding-on-charades-sta	AdaFocus (Full, MViT-Charades-Pretrain-feature, MMN model)	R@1,IoU=0.5: 62.4 R@1,IoU=0.7: 38.6 R@5,IoU=0.5: 89.4 R@5,IoU=0.7: 66.4
temporal-sentence-grounding-on-charades-sta	AdaFocus (Semi-weak, MViT-Charades-Pretrain-feature, D3G model)	R@1,IoU=0.5: 50.1 R@1,IoU=0.7: 21.8 R@5,IoU=0.5: 86.1 R@5,IoU=0.7: 54.6
temporal-sentence-grounding-on-charades-sta	AdaFocus (Weak, MViT-Charades-Pretrain-feature, CPL model)	R@1,IoU=0.5: 51.7 R@1,IoU=0.7: 23.2 R@5,IoU=0.5: 85.2 R@5,IoU=0.7: 52.6
weakly-supervised-action-segmentation-action	AdaFocus (newly extracted I3D-features, POC model)	Acc: 49.6
