Action Classification On Kinetics 400
Evaluation metric: Acc@1
Results
Performance of each model on this benchmark.
| Model | Acc@1 | Paper Title |
| --- | --- | --- |
| OmniVec2 | 93.6 | OmniVec2 - A Novel Transformer based Network for Large Scale Multimodal and Multitask Learning |
| FTP-UniFormerV2-L/14 | 93.4 | Enhancing Video Transformers for Action Understanding with VLM-aided Training |
| InternVideo2-6B | 92.1 | InternVideo2: Scaling Foundation Models for Multimodal Video Understanding |
| InternVideo2-1B | 91.6 | InternVideo2: Scaling Foundation Models for Multimodal Video Understanding |
| OmniVec | 91.1 | OmniVec: Learning robust representations with cross modal sharing |
| InternVideo | 91.1 | InternVideo: General Video Foundation Models via Generative and Discriminative Learning |
| TubeViT-H (ImageNet-1k) | 90.9 | Rethinking Video ViTs: Sparse Video Tubes for Joint Image and Video Learning |
| UMT-L (ViT-L/16) | 90.6 | Unmasked Teacher: Towards Training-Efficient Video Foundation Models |
| Unmasked Teacher (ViT-L) | 90.6 | Unmasked Teacher: Towards Training-Efficient Video Foundation Models |
| TubeVit-L (ImageNet-1k) | 90.2 | Rethinking Video ViTs: Sparse Video Tubes for Joint Image and Video Learning |
| UniFormerV2-L (ViT-L, 336) | 90.0 | UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer |
| VideoMAE V2-g (64x266x266) | 90.0 | VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking |
| MTV-H (WTS 60M) | 89.9 | Multiview Transformers for Video Recognition |
| TAdaFormer-L/14 | 89.9 | Temporally-Adaptive Models for Efficient Video Understanding |
| EVA | 89.7 | EVA: Exploring the Limits of Masked Visual Representation Learning at Scale |
| AM/12 ViT-B Dinov2 | 89.6 | AM Flow: Adapters for Temporal Processing in Action Recognition |
| ATM | 89.4 | What Can Simple Arithmetic Operations Do for Temporal Modeling? |
| CoCa (finetuned) | 88.9 | CoCa: Contrastive Captioners are Image-Text Foundation Models |
| ILA (ViT-L/14) | 88.7 | Implicit Temporal Modeling with Learnable Alignment for Video Recognition |
| BIKE (CLIP ViT-L/14) | 88.7 | Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models |