HyperAI
Action Recognition On Diving 48
Metric: Accuracy (%)

Results: performance of various models on this benchmark, sorted by accuracy (highest first).
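As a minimal sketch, assuming the leaderboard's Accuracy is standard top-1 classification accuracy over Diving48's 48 dive classes, reported as a percentage, the score is the share of test clips whose highest-scoring predicted class matches the ground-truth label. The helper names `top1_predictions` and `accuracy_percent` are hypothetical and chosen for illustration. This is not any listed paper's evaluation code; protocols such as the number of clips or views per video (e.g. the "single clip" note on RSANet-R50) vary between entries.

```python
from typing import Sequence


def top1_predictions(scores: Sequence[Sequence[float]]) -> list[int]:
    """Return the index of the highest score per clip (argmax over classes)."""
    return [max(range(len(row)), key=row.__getitem__) for row in scores]


def accuracy_percent(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Top-1 accuracy in percent: share of clips whose predicted class equals the label."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("need at least one labelled clip")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return 100.0 * correct / len(labels)


# Toy example with hypothetical class ids in [0, 48): 4 of 5 clips are correct.
preds = [3, 17, 17, 40, 5]
labels = [3, 17, 22, 40, 5]
print(accuracy_percent(preds, labels))  # 80.0

# Turning per-class scores into predictions first (2 clips, 2 classes for brevity).
print(top1_predictions([[0.1, 0.9], [0.7, 0.3]]))  # [1, 0]
```

Under this reading, a gap of one percentage point on the table corresponds to one percent of the test clips being classified differently.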
| Model | Accuracy (%) | Paper Title |
|---|---|---|
| LVMAE | 94.9 | Extending Video Masked Autoencoders to 128 frames |
| Video-FocalNet-B | 90.8 | Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition |
| AIM (CLIP ViT-L/14, 32x224) | 90.6 | AIM: Adapting Image Models for Efficient Video Action Recognition |
| DUALPATH | 88.7 | Dual-path Adaptation from Image to Video Transformers |
| StructVit-B-4-1 | 88.3 | Learning Correlation Structures for Vision Transformers |
| TFCNet | 88.3 | TFCNet: Temporal Fully Connected Networks for Static Unbiased Temporal Reasoning |
| ORViT TimeSformer | 88.0 | Object-Region Video Transformers |
| GC-TDN | 87.6 | Group Contextualization for Video Recognition |
| BEVT | 86.7 | BEVT: BERT Pretraining of Video Transformers |
| PSB | 86 | Spatiotemporal Self-attention Modeling with Temporal Patch Shift for Action Recognition |
| VIMPAC | 85.5 | VIMPAC: Video Pre-Training via Masked Token Prediction and Contrastive Learning |
| RSANet-R50 (16 frames, ImageNet pretrained, a single clip) | 84.2 | Relational Self-Attention: What's Missing in Attention for Video Understanding |
| TQN | 81.8 | Temporal Query Networks for Fine-grained Video Understanding |
| PMI Sampler | 81.3 | PMI Sampler: Patch Similarity Guided Frame Selection for Aerial Action Recognition |
| TimeSformer-L | 81 | Is Space-Time Attention All You Need for Video Understanding? |
| TimeSformer-HR | 78 | Is Space-Time Attention All You Need for Video Understanding? |
| SlowFast | 77.6 | SlowFast Networks for Video Recognition |
| TimeSformer | 75 | Is Space-Time Attention All You Need for Video Understanding? |