HyperAI
Action Recognition on AVA v2.2
Metrics
mAP
Results
Performance results of various models on this benchmark
| Model Name | mAP | Paper Title |
| --- | --- | --- |
| LART (Hiera-H, K700 PT+FT) | 45.1 | On the Benefits of 3D Pose and Tracking for Human Action Recognition |
| Hiera-H (K700 PT+FT) | 43.3 | Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles |
| VideoMAE V2-g | 42.6 | VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking |
| STAR/L | 41.7 | End-to-End Spatio-Temporal Action Localisation with Video Transformers |
| MVD (Kinetics400 pretrain+finetune, ViT-H, 16x4) | 41.1 | Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning |
| InternVideo | 41.01 | InternVideo: General Video Foundation Models via Generative and Discriminative Learning |
| MVD (Kinetics400 pretrain, ViT-H, 16x4) | 40.1 | Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning |
| MaskFeat (Kinetics-600 pretrain, MViT-L) | 39.8 | Masked Feature Prediction for Self-Supervised Visual Pre-Training |
| UMT-L (ViT-L/16) | 39.8 | Unmasked Teacher: Towards Training-Efficient Video Foundation Models |
| VideoMAE (K400 pretrain+finetune, ViT-H, 16x4) | 39.5 | VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training |
| VideoMAE (K700 pretrain+finetune, ViT-L, 16x4) | 39.3 | VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training |
| MVD (Kinetics400 pretrain+finetune, ViT-L, 16x4) | 38.7 | Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning |
| VideoMAE (K400 pretrain+finetune, ViT-L, 16x4) | 37.8 | VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training |
| MVD (Kinetics400 pretrain, ViT-L, 16x4) | 37.7 | Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning |
| VideoMAE (K400 pretrain, ViT-H, 16x4) | 36.5 | VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training |
| VideoMAE (K700 pretrain, ViT-L, 16x4) | 36.1 | VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training |
| MeMViT-24 | 35.4 | MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition |
| MViTv2-L (IN21k, K700) | 34.4 | MViTv2: Improved Multiscale Vision Transformers for Classification and Detection |
| VideoMAE (K400 pretrain, ViT-L, 16x4) | 34.3 | VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training |
| MVD (Kinetics400 pretrain+finetune, ViT-B, 16x4) | 34.2 | Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning |
Showing 20 of 38 entries, ranked by mAP.